[gmx-users] Dual Xeon workstation - starting one job/cpu

Mark Abraham mark.j.abraham at gmail.com
Wed Mar 23 10:25:08 CET 2016


Hi,

You should take a look at
http://onlinelibrary.wiley.com/doi/10.1002/jcc.24030/abstract for
background information.

On Wed, Mar 23, 2016 at 6:28 AM Jernej Zidar <jernej.zidar at gmail.com> wrote:

> Hi,
>   Recently I received a dual Xeon ( 2x E5-2630v3) workstation with two
> Tesla K40c cards.
>
>   I'm now trying to figure out how to start one job/cpu.
>

The easy answer if you're running multiple identical simulations is to use
gmx mdrun -multi or -multidir (see
http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-features.html#running-multi-simulations
)
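
As a rough sketch of what a multi-simulation launch could look like: this assumes an MPI-enabled build installed as gmx_mpi (the multi-simulation feature needs a real MPI library, not thread-MPI), and the directory names sim_a and sim_b are hypothetical, each holding its own topol.tpr. The snippet only prints the command so it can be checked before anything is run.

```shell
#!/bin/sh
# Hypothetical run directories, one per simulation.
set -- sim_a sim_b

# One MPI rank per simulation in this sketch; $# is the directory count.
# gmx_mpi is an assumed binary name for an MPI-enabled GROMACS build.
cmd="mpirun -np $# gmx_mpi mdrun -multidir $*"

# Print rather than execute, so the command line can be inspected first.
echo "$cmd"
```

With more ranks than directories, mdrun divides the ranks evenly among the simulations.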

>   For CPU1, this is not a problem:
> gmx mdrun -v -deffnm test -gpu_id 0000 -ntomp 4 -pin on -maxh 0.15 -nsteps -1
> #above starts a job on CPU1 with 4 MPI tasks and 4 OpenMP threads
>

How are you observing that? By default mdrun will spread those 16 threads
over your 16 real cores.

>   For CPU2, this is a bit more problematic because neither "-pinoffset" nor
> "-pinstride" makes any difference. The second job appears to always start on
> the CPU2.
>

I don't understand that, unless your machine is somehow observing that
there are idle cores and placing the threads there.


>   I had some success running a job like this:
> gmx mdrun -v -deffnm test2 -gpu_id 1111 -ntomp 4 -maxh 0.15 -nsteps -1
> #above starts a job on CPU2 but without CPU pinning performance is poor
>
>   How can I overcome this? From the tests I've done so far it is actually
> better to run two jobs that occupy one CPU each rather than a job that would
> occupy both CPUs at the same time.
>   Is this even possible without resorting to OpenMPI?
>

Sure.

gmx mdrun -gpu_id 0000 -pin on -pinoffset 0 -pinstride 1 -ntomp 4
gmx mdrun -gpu_id 1111 -pin on -pinoffset 16 -pinstride 1 -ntomp 4

but this guarantees the GPUs lie idle during the update and constraint
phases. I expect that

gmx mdrun -gpu_id 0011 -pin on -pinoffset 0 -pinstride 1 -ntomp 4
gmx mdrun -gpu_id 0011 -pin on -pinoffset 16 -pinstride 1 -ntomp 4

will give you better throughput, because the GPU tasks in the two mdruns
will naturally run out of phase with each other, leading to higher
utilization of each individual GPU.
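
The offsets above follow from the thread count per job: 4 thread-MPI ranks times 4 OpenMP threads is 16 pinned threads, so the second job starts at hardware thread 16. A small sketch of that arithmetic, which only prints the commands, is below. The -deffnm names job0 and job1 are hypothetical, and -ntmpi 4 is written out explicitly on the assumption that four ranks per job is what the four-digit -gpu_id strings imply.

```shell
#!/bin/sh
# Threads per job: 4 thread-MPI ranks x 4 OpenMP threads each.
ranks=4
omp=4
threads_per_job=$(( ranks * omp ))

# Print the mdrun command for job number $1 (0-based). Each job is offset
# by one job's worth of threads so the pinned hardware threads do not overlap.
job_cmd() {
    offset=$(( $1 * threads_per_job ))
    echo "gmx mdrun -deffnm job$1 -ntmpi $ranks -ntomp $omp -gpu_id 0011 -pin on -pinoffset $offset -pinstride 1"
}

# Dry run: show both command lines instead of starting the simulations.
job_cmd 0
job_cmd 1
```

To launch for real, run each printed command in the background (append " &") and finish the script with "wait".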

Mark

> Thanks,
> Jernej Zidar

