[gmx-users] Dual Xeon workstation - starting one job/cpu

Mark Abraham mark.j.abraham at gmail.com
Wed Mar 23 10:25:08 CET 2016


You should take a look at
http://onlinelibrary.wiley.com/doi/10.1002/jcc.24030/abstract for
background information.

On Wed, Mar 23, 2016 at 6:28 AM Jernej Zidar <jernej.zidar at gmail.com> wrote:

> Hi,
>   Recently I received a dual Xeon ( 2x E5-2630v3) workstation with two
> Tesla K40c cards.
>   I'm now trying to figure out how to start one job/cpu.

The easy answer if you're running multiple identical simulations is to use
gmx mdrun -multidir

> For CPU1, this is not a problem:
> gmx mdrun -v -deffnm test -gpu_id 0000 -ntomp 4 -pin on -maxh 0.15 -nsteps -1
> #above starts a job on CPU1 with 4 MPI tasks and 4 OpenMP threads

How are you observing that? By default mdrun will spread those 16 threads
over your 16 real cores.

> For CPU2, this is a bit more problematic because neither "-pinoffset" nor
> "-pinstride" makes any difference. The second job appears to always start on
> CPU2.

I don't understand that, unless your machine is somehow observing that
there are idle cores and placing the threads there.

>   I had some success running a job like this:
> gmx mdrun -v -deffnm test2 -gpu_id 1111 -ntomp 4 -maxh 0.15 -nsteps -1
> #above starts a job on CPU2 but without CPU pinning performance is poor
>   How to overcome this? From the tests I've done so far it is actually better
> to run two jobs that occupy one CPU each rather than a job that would occupy
> both CPUs at the same time.
>   Is this even possible without resorting to OpenMPI?


gmx mdrun -gpu_id 0000 -pin on -pinoffset 0 -pinstride 1 -ntomp 4
gmx mdrun -gpu_id 1111 -pin on -pinoffset 16 -pinstride 1 -ntomp 4

but this guarantees that each GPU sits idle while its mdrun does the
update and constraint phases on the CPU. I expect that

gmx mdrun -gpu_id 0011 -pin on -pinoffset 0 -pinstride 1 -ntomp 4
gmx mdrun -gpu_id 0011 -pin on -pinoffset 16 -pinstride 1 -ntomp 4

will give you better throughput, because the GPU tasks in the two mdruns
will naturally run out of phase with each other, leading to higher
utilization of each individual GPU.
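For anyone scripting this layout, here is a minimal POSIX shell sketch of the second pair of commands. It rests on assumptions that are not stated in the post: 2 sockets x 8 cores x 2 hyperthreads = 32 hardware threads, with the first 16 in mdrun's numbering belonging to the first socket; and the GROMACS 5.x/2016-era `-gpu_id` format of one digit per PP rank on the node. The helper names `gpu_id_string` and `launch_job` are hypothetical, and the only mdrun flags used are ones already shown in this thread. Setting `DRY_RUN=1` prints the commands instead of starting them.

```shell
#!/bin/sh
# Sketch: two mdrun jobs side by side, one pinned per socket, each sharing
# both GPUs. ASSUMPTIONS (not from the post): 32 hardware threads, the
# first 16 of which mdrun numbers on socket 0; -gpu_id has one digit per
# PP rank (GROMACS 5.x/2016 style). Helper names are hypothetical.
# Set DRY_RUN=1 to print the commands instead of running them.

# gpu_id_string NRANKS NGPUS: assign PP ranks to GPUs in contiguous
# blocks, so rank r gets GPU r*NGPUS/NRANKS (4 ranks on 2 GPUs -> "0011").
gpu_id_string() {
    nranks=$1 ngpus=$2 s="" r=0
    while [ "$r" -lt "$nranks" ]; do
        s="$s$(( r * ngpus / nranks ))"
        r=$(( r + 1 ))
    done
    echo "$s"
}

# launch_job INDEX DEFFNM GPU_ID NTOMP: start job INDEX pinned just past
# the hardware threads used by jobs 0..INDEX-1. Each job uses
# (number of digits in GPU_ID) ranks x NTOMP threads.
launch_job() {
    idx=$1 deffnm=$2 gpu_id=$3 ntomp=$4
    offset=$(( idx * ${#gpu_id} * ntomp ))
    cmd="gmx mdrun -deffnm $deffnm -gpu_id $gpu_id -ntomp $ntomp -pin on -pinoffset $offset -pinstride 1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        # Unquoted $cmd is split into words on purpose; no argument has spaces.
        $cmd > "$deffnm.stdout" 2>&1 &
    fi
}

# Usage (hypothetical run names test and test2):
#   ids=$(gpu_id_string 4 2)      # "0011"
#   launch_job 0 test  "$ids" 4   # -pinoffset 0
#   launch_job 1 test2 "$ids" 4   # -pinoffset 16
#   wait
```

The offset is simply the number of threads already claimed by earlier jobs (4 ranks x 4 OpenMP threads = 16), which is where the `-pinoffset 16` above comes from. Calling `gpu_id_string 4 1` gives `0000`, the single-GPU mapping from the first pair of commands.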


