[gmx-users] CPU running doesn't match command line

Mark Abraham mark.j.abraham at gmail.com
Tue Aug 23 18:38:57 CEST 2016


How did you decide that only 15 cores were being used? What performance did
you observe with only one of the jobs running, vs the performance of both
of them while both are running? Please share log files via links to files
on a file sharing service - it's quite tedious and inefficient if we have
to guess based on incomplete information.
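
One way to check where the threads are actually allowed to run, rather than estimating from a system monitor, is to read the per-thread affinity from /proc. Below is a minimal Python sketch, assuming Linux. The helper names parse_cpu_list and thread_affinities are made up for illustration and are not part of GROMACS.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def thread_affinities(pid):
    """Map each thread ID of `pid` to the set of CPUs it may run on.

    Reads the Cpus_allowed_list field from /proc/<pid>/task/<tid>/status,
    so this only works on Linux.
    """
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        for line in (task / "status").read_text().splitlines():
            if line.startswith("Cpus_allowed_list:"):
                result[int(task.name)] = parse_cpu_list(line.split(":", 1)[1])
    return result
```

Calling thread_affinities() with the PID of each running mdrun process and comparing the two results shows whether the jobs share any logical CPUs. With -pin on I would expect each compute thread to list a single CPU, while helper threads of the process may list more; that is an expectation, not something verified in this thread.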


On Mon, Aug 22, 2016 at 5:37 PM Albert <mailmd2011 at gmail.com> wrote:

> Hello Mark:
> I've recompiled Gromacs without MPI. I submitted the jobs with the
> command lines you suggested:
> gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 0 -gpu_id 0 -s test.tpr >& test.info
> gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 10 -gpu_id 1 -s test.tpr >& test.info
> I specified 20 CPU cores in all, but I noticed that only 15 cores were
> actually being used. I am pretty confused by that.
> Here is my log file:
> GROMACS:      gmx mdrun, VERSION 5.1.3
> Executable:   /soft/gromacs/5.1.3_intel-thread/bin/gmx
> Data prefix:  /soft/gromacs/5.1.3_intel-thread
> Command line:
>    gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 0 -gpu_id 0 -s test.tpr
> Hardware detected:
>    CPU info:
>      Vendor: GenuineIntel
>      Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
>      SIMD instructions most likely to fit this hardware: AVX_256
>      SIMD instructions selected at GROMACS compile time: AVX_256
>    GPU info:
>      Number of GPUs detected: 2
>      #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
>      #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
> Reading file test.tpr, VERSION 5.1.3 (single precision)
> Using 1 MPI thread
> Using 10 OpenMP threads
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 0
> starting mdrun 'Title'
> 5000000 steps,  10000.0 ps.
> step   80: timed with pme grid 60 60 96, coulomb cutoff 1.000: 1634.1 M-cycles
> step  160: timed with pme grid 56 56 84, coulomb cutoff 1.047: 1175.4 M-cycles
> GROMACS:      gmx mdrun, VERSION 5.1.3
> Executable:   /soft/gromacs/5.1.3_intel-thread/bin/gmx
> Data prefix:  /soft/gromacs/5.1.3_intel-thread
> Command line:
>    gmx mdrun -ntomp 10 -v -g test.log -pin on -pinoffset 10 -gpu_id 1 -s test.tpr
> Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs
> Hardware detected:
>    CPU info:
>      Vendor: GenuineIntel
>      Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
>      SIMD instructions most likely to fit this hardware: AVX_256
>      SIMD instructions selected at GROMACS compile time: AVX_256
>    GPU info:
>      Number of GPUs detected: 2
>      #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
>      #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
> Reading file test.tpr, VERSION 5.1.3 (single precision)
> Using 1 MPI thread
> Using 10 OpenMP threads
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 1
> Applying core pinning offset 10
> starting mdrun 'Title'
> 5000000 steps,  10000.0 ps.
> step   80: timed with pme grid 60 60 84, coulomb cutoff 1.000: 657.2 M-cycles
> step  160: timed with pme grid 52 52 80, coulomb cutoff 1.096: 622.8 M-cycles
> step  240: timed with pme grid 48 48 72, coulomb cutoff 1.187: 593.9 M-cycles
> On 08/18/2016 02:13 PM, Mark Abraham wrote:
> > Hi,
> >
> > It's a bit curious to want to run two 8-thread jobs on a machine with 10
> > physical cores, because you'll get lots of performance imbalance when
> > some threads must share the same physical core, but I guess it's a free
> > world. As I suggested the other day,
> > http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
> > has some examples. The fact you've compiled and linked with an MPI library
> > means it may be involving itself in the thread-affinity management, but
> > whether it is doing that is something between you, it, the docs and the
> > cluster admins. If you're just wanting to run on a single node, do yourself
> > a favour and build the thread-MPI flavour.
> >
> > If so, you probably want more like
> > gmx mdrun -ntomp 10 -pin on -pinoffset 0 -gpu_id 0 -s run1
> > gmx mdrun -ntomp 10 -pin on -pinoffset 10 -gpu_id 1 -s run2
> >
> > If you want to use the MPI build, then I suggest you read up on how its
> > mpirun will let you manage keeping the threads of processes where you want
> > them (i.e. apart).
> >
> > Mark
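
The arithmetic behind keeping the two jobs apart can be sketched in a few lines of Python. This is only a simplified model and rests on a loud assumption: that thread i of a job is pinned to logical CPU pinoffset + i * pinstride, with the stride given explicitly. mdrun's real mapping depends on the hardware topology it detects and on how it picks a stride when -pinstride is not given, neither of which is modeled here. The names planned_cpus and check_jobs are hypothetical.

```python
def planned_cpus(ntomp, pinoffset, pinstride=1):
    """Logical CPUs one job would occupy under the simplified model.

    ASSUMPTION: thread i lands on logical CPU pinoffset + i * pinstride.
    This is not taken from the GROMACS source.
    """
    if pinstride < 1:
        raise ValueError("this sketch needs an explicit stride >= 1")
    return {pinoffset + i * pinstride for i in range(ntomp)}


def check_jobs(jobs, logical_cpus):
    """Return (overlap, out_of_range) for jobs given as
    (ntomp, pinoffset, pinstride) tuples on a node with `logical_cpus` CPUs."""
    seen, overlap, out_of_range = set(), set(), set()
    for ntomp, pinoffset, pinstride in jobs:
        cpus = planned_cpus(ntomp, pinoffset, pinstride)
        overlap |= cpus & seen                      # CPUs claimed twice
        out_of_range |= {c for c in cpus if c >= logical_cpus}
        seen |= cpus
    return overlap, out_of_range


# The two jobs from this thread on the 20-logical-core node, stride 1 assumed.
overlap, out_of_range = check_jobs([(10, 0, 1), (10, 10, 1)], logical_cpus=20)
print(overlap, out_of_range)  # prints: set() set()
```

Under this model, offsets 0 and 10 with a stride of 1 give two disjoint sets of ten logical CPUs. With a stride of 2 the same offsets would collide and run past the last logical CPU, so the stride that is actually in effect matters as much as the offset.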