[gmx-users] CPU running doesn't match command line

Szilárd Páll pall.szilard at gmail.com
Wed Aug 24 01:03:06 CEST 2016


On Mon, Aug 22, 2016 at 5:36 PM, Albert <mailmd2011 at gmail.com> wrote:
> Hello Mark:
>
> I've recompiled Gromacs without MPI. I submitted the job with the command
> lines you suggested.
>
> gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 0 -gpu_id 0 -s
> test.tpr >& test.info
> gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 10 -gpu_id 1 -s
> test.tpr >& test.info

You need to pass "-pinstride 1" there -- at least to the first launch,
but it is best to do it for both (full explanation below).
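As a sketch (I've given the two runs different log names here so they don't
overwrite each other's files; otherwise the options match yours), the two
launches would then look like:

gmx mdrun -ntomp 10 -v -g test1.log -pin on -pinoffset 0  -pinstride 1 -gpu_id 0 -s test.tpr
gmx mdrun -ntomp 10 -v -g test2.log -pin on -pinoffset 10 -pinstride 1 -gpu_id 1 -s test.tpr

With -pinstride 1 the first run stays on the hardware threads of cores 0-4 and
the second on those of cores 5-9, so the two jobs no longer overlap.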

You may want to consider using mdrun -multi, which will automatically
set up the correct pinning for all of its sub-simulations (it can even
be used across multiple nodes with any network to run a set of
simulations); note that with 5.x I recommend it only if the two runs
are expected to run at the same speed (they are synced up even though
they don't communicate), but this should be greatly improved in the
'16 release.
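A rough sketch of that alternative, assuming an MPI-enabled build (gmx_mpi)
and inputs named test0.tpr and test1.tpr (-multi appends the simulation index
to the -s name):

mpirun -np 2 gmx_mpi mdrun -multi 2 -ntomp 10 -pin on -gpu_id 01 -s test.tpr

Here -gpu_id 01 assigns GPU 0 to the first simulation and GPU 1 to the second,
and mdrun works out the pin offsets for both runs itself.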


The reason: the first mdrun assumes it will run alone on the node, so
it will not make use of Hyperthreading but rather use all physical
cores first. Hence, when started with only 10 threads on a 10-core /
20-thread CPU, it binds its threads to 10 different cores rather than
to the 10 hardware threads of the first 5 cores. The second run no
longer makes this assumption (a non-zero offset is correctly
interpreted as a hint that multiple jobs are running side by side);
hence, this run uses the last 5 cores with 2 threads each.
Consequently, you will have 20 threads running in total, 5 of them
pinned to the same 5 hardware threads as 5 threads of the other run
=> 15 hardware threads used => top shows 7.5 "CPUs" used.
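To make the numbers concrete (assuming the hardware-thread ordering GROMACS
uses for pinning, where core i owns hardware threads 2i and 2i+1):

run 1, -pinoffset 0,  default stride 2:  hw threads 0,2,4,...,18  (one per core 0-9)
run 2, -pinoffset 10, stride 1:          hw threads 10,11,...,19  (cores 5-9, both threads)

Hardware threads 10, 12, 14, 16 and 18 are claimed by both runs, so the 20
software threads land on only 15 distinct hardware threads.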

Cheers
--
Szilárd

PS: In this discussion thread alone I've asked for *complete* log
files twice. I'd really appreciate it if you actually provided them.

>
> I specified 20 CPU cores in all, but I noticed that only 15 cores were
> actually being used. I am pretty confused about that.
>
>
> Here is my log file:
>
>
> GROMACS:      gmx mdrun, VERSION 5.1.3
> Executable:   /soft/gromacs/5.1.3_intel-thread/bin/gmx
> Data prefix:  /soft/gromacs/5.1.3_intel-thread
> Command line:
>   gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 0 -gpu_id 0 -s
> test.tpr
> Hardware detected:
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
>     SIMD instructions most likely to fit this hardware: AVX_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
>   GPU info:
>     Number of GPUs detected: 2
>     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat:
> compatible
>     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat:
> compatible
>
> Reading file test.tpr, VERSION 5.1.3 (single precision)
> Using 1 MPI thread
> Using 10 OpenMP threads
>
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 0
>
> starting mdrun 'Title'
> 5000000 steps,  10000.0 ps.
> step   80: timed with pme grid 60 60 96, coulomb cutoff 1.000: 1634.1
> M-cycles
> step  160: timed with pme grid 56 56 84, coulomb cutoff 1.047: 1175.4
> M-cycles
>
>
>
>
> GROMACS:      gmx mdrun, VERSION 5.1.3
> Executable:   /soft/gromacs/5.1.3_intel-thread/bin/gmx
> Data prefix:  /soft/gromacs/5.1.3_intel-thread
> Command line:
>   gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 10 -gpu_id 1 -s
> test.tpr
>
> Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs
> Hardware detected:
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
>     SIMD instructions most likely to fit this hardware: AVX_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
>   GPU info:
>     Number of GPUs detected: 2
>     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat:
> compatible
>     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat:
> compatible
>
> Reading file test.tpr, VERSION 5.1.3 (single precision)
> Using 1 MPI thread
> Using 10 OpenMP threads
>
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 1
>
> Applying core pinning offset 10
> starting mdrun 'Title'
> 5000000 steps,  10000.0 ps.
> step   80: timed with pme grid 60 60 84, coulomb cutoff 1.000: 657.2
> M-cycles
> step  160: timed with pme grid 52 52 80, coulomb cutoff 1.096: 622.8
> M-cycles
> step  240: timed with pme grid 48 48 72, coulomb cutoff 1.187: 593.9
> M-cycles
>
>
>
>
>
> On 08/18/2016 02:13 PM, Mark Abraham wrote:
>>
>> Hi,
>>
>> It's a bit curious to want to run two 8-thread jobs on a machine with 10
>> physical cores, because you'll get a lot of performance imbalance when
>> some threads must share a physical core while others do not, but I guess
>> it's a free world. As I suggested the other day,
>>
>> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
>> has
>> some examples. The fact you've compiled and linked with an MPI library
>> means it may be involving itself in the thread-affinity management, but
>> whether it is doing that is something between you, it, the docs and the
>> cluster admins. If you just want to run on a single node, do yourself
>> a favour and build the thread-MPI flavour.
>>
>> If so, you probably want more like
>> gmx mdrun -ntomp 10 -pin on -pinoffset 0 -gpu_id 0 -s run1
>> gmx mdrun -ntomp 10 -pin on -pinoffset 10 -gpu_id 1 -s run2
>>
>> If you want to use the MPI build, then I suggest you read up on how its
>> mpirun lets you keep the threads of different processes where you want
>> them (i.e. apart).
>>
>> Mark
>
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
> mail to gmx-users-request at gromacs.org.

