[gmx-users] Running Gromacs in parallel

Szilárd Páll pall.szilard at gmail.com
Wed Sep 21 14:31:14 CEST 2016


Performance tuning is highly dependent on the simulation system and
the hardware you're running on. Questions like the ones you pose are
impossible to answer meaningfully without *full* log files (and
hardware specs including network).

Have you checked the performance checklist I linked above?
--
Szilárd


On Wed, Sep 21, 2016 at 11:36 AM,  <jkrieger at mrc-lmb.cam.ac.uk> wrote:
> I wonder whether my finding that -np 108 with -ntomp 2 works best comes
> from using -multi 6 on 8-CPU nodes. That level of parallelism may be
> necessary to trigger automatic separation of PP and PME ranks. I'm not
> sure whether I tried -np 54 with -ntomp 4, which would probably also do it.
> I compared mostly on 196 CPUs, then found that going up to 216 was better
> than 196 with -ntomp 2, and that pure MPI (-ntomp 1) was considerably worse
> for both. Would people recommend going back to 196, which allows 4 whole
> nodes per replica, and experimenting with -npme and -ntomp_pme?
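The per-replica arithmetic behind these settings can be checked with a short script. This is only a sketch under the assumptions stated in the message (-multi 6, 8-CPU nodes); `replica_layout` is a hypothetical helper name, and the script does integer arithmetic only, it does not run mdrun.

```shell
#!/bin/sh
# Per-replica resources for an "mdrun -multi" run.
# replica_layout is a hypothetical helper, not part of GROMACS.
# Arguments: total MPI ranks, OpenMP threads per rank, replicas, CPUs per node.
replica_layout() {
    total_ranks=$1
    ntomp=$2
    replicas=$3
    cpus_per_node=$4
    ranks_per_replica=$((total_ranks / replicas))
    cpus_per_replica=$((ranks_per_replica * ntomp))
    # Whole nodes a replica fills, and the CPUs that spill onto a shared node.
    whole_nodes=$((cpus_per_replica / cpus_per_node))
    leftover_cpus=$((cpus_per_replica % cpus_per_node))
    echo "ranks/replica=$ranks_per_replica cpus/replica=$cpus_per_replica nodes=$whole_nodes leftover_cpus=$leftover_cpus"
}

replica_layout 108 2 6 8   # -np 108 -ntomp 2
replica_layout 54 4 6 8    # -np 54 -ntomp 4
```

Both settings give 36 CPUs per replica, which is four and a half 8-CPU nodes, so neighbouring replicas would share a node. Four whole nodes per replica would correspond to 32 CPUs per replica.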
>
>> Hi Thanh Le,
>>
>> Assuming all the nodes are the same (9 nodes with 12 CPUs each), you could
>> try the following:
>>
>> mpirun -np 9 --map-by node mdrun -ntomp 12 ...
>> mpirun -np 18 mdrun -ntomp 6 ...
>> mpirun -np 54 mdrun -ntomp 2 ...
>>
>> Which of these works best will depend on your setup.
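The commands above are all ways of splitting 9 x 12 = 108 CPUs into MPI ranks times OpenMP threads. A minimal sketch that lists every such split, assuming 9 identical nodes with 12 CPUs each; `list_layouts` is a hypothetical helper name, and the script only prints candidate command lines rather than running mdrun.

```shell
#!/bin/sh
# Print every (ranks, threads) split in which the OpenMP thread count divides
# the CPUs per node, so that no rank has to straddle two nodes.
# list_layouts is a hypothetical helper, not part of GROMACS or MPI.
# Arguments: number of nodes, CPUs per node.
list_layouts() {
    nodes=$1
    cpus_per_node=$2
    ntomp=$cpus_per_node
    while [ "$ntomp" -ge 1 ]; do
        if [ $((cpus_per_node % ntomp)) -eq 0 ]; then
            ranks=$((nodes * cpus_per_node / ntomp))
            echo "mpirun -np $ranks mdrun -ntomp $ntomp ..."
        fi
        ntomp=$((ntomp - 1))
    done
}

list_layouts 9 12
```

Each printed line is a candidate to benchmark, not a recommendation. Note that the `lscpu` output quoted further down reports 2 threads per core, so the 12 CPUs per node are logical CPUs.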
>>
>> Using the whole cluster for one job may not be the most efficient approach.
>> On our cluster, once I reach 216 CPUs (queuing-system settings equivalent
>> to -np 108 and -ntomp 2), adding more nodes doesn't improve performance,
>> presumably because communication becomes the bottleneck. Besides running
>> -multi or -multidir jobs, which take some of the load off communication,
>> it may also be worth running separate jobs and using -pin on and
>> -pinoffset.
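As an illustration of the separate-jobs idea, the sketch below prints one command per job so that jobs sharing a node are pinned to disjoint blocks of logical CPUs. It assumes a 12-CPU node split evenly between the jobs and that consecutive logical CPU indices can be given to one job; how hardware threads map to those indices depends on the machine. `pin_commands` is a hypothetical helper name, and the commands are printed, not executed.

```shell
#!/bin/sh
# Print one mdrun command per job, giving each job its own block of logical
# CPUs via -pinoffset. pin_commands is a hypothetical helper, not part of
# GROMACS. Arguments: CPUs per node, number of jobs sharing the node.
pin_commands() {
    cpus_per_node=$1
    njobs=$2
    threads=$((cpus_per_node / njobs))
    job=0
    while [ "$job" -lt "$njobs" ]; do
        # Job k starts at logical CPU k * threads.
        offset=$((job * threads))
        echo "mdrun -ntomp $threads -pin on -pinoffset $offset ..."
        job=$((job + 1))
    done
}

pin_commands 12 2
```

For two jobs on a 12-CPU node this gives offsets 0 and 6, so the jobs do not compete for the same logical CPUs.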
>>
>> Best wishes
>> James
>>
>>> Hi everyone,
>>> I have a question about running GROMACS in parallel. I have read
>>> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
>>> but I still don't quite understand how to run it efficiently.
>>> My GROMACS version is 4.5.4.
>>> The cluster I am using reports CPUs total: 108 and 4 hosts up.
>>> The node I am using:
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> Byte Order:            Little Endian
>>> CPU(s):                12
>>> On-line CPU(s) list:   0-11
>>> Thread(s) per core:    2
>>> Core(s) per socket:    6
>>> Socket(s):             1
>>> NUMA node(s):          1
>>> Vendor ID:             AuthenticAMD
>>> CPU family:            21
>>> Model:                 2
>>> Stepping:              0
>>> CPU MHz:               1400.000
>>> BogoMIPS:              5200.57
>>> Virtualization:        AMD-V
>>> L1d cache:             16K
>>> L1i cache:             64K
>>> L2 cache:              2048K
>>> L3 cache:              6144K
>>> NUMA node0 CPU(s):     0-11
>>> MPI is already installed. I also have permission to use the cluster as
>>> much as I can.
>>> My question is: how should I write my mdrun command to use all the
>>> available cores and nodes?
>>> Thanks,
>>> Thanh Le
>>> --
>>> Gromacs Users mailing list
>>>
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>> posting!
>>>
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send
>>> a mail to gmx-users-request at gromacs.org.
>>>
>>
>
>

