[gmx-users] Running Gromacs in parallel

jkrieger at mrc-lmb.cam.ac.uk
Wed Sep 21 11:36:03 CEST 2016


I wonder whether the fact that -np 108 and -ntomp 2 works best for me comes
from using -multi 6 with 8-CPU nodes. That level of parallelism may then be
necessary to trigger automatic segregation of PP and PME ranks. I'm not
sure if I tried -np 54 and -ntomp 4, which would probably also do it. I
compared mostly on 196 CPUs, then found that going up to 216 with -ntomp 2
was better than 196, and that pure MPI (-ntomp 1) was considerably worse in
both cases. Would people recommend going back to 196, which allows 4 whole
nodes per replica, and playing with -npme and -ntomp_pme?
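
To make that last bit concrete, here is the kind of thing I have in mind for
a single replica on 4 of our 8-CPU nodes (the rank/thread counts and the
-npme value are only a hypothetical starting point, not something I have
benchmarked):

mpirun -np 16 mdrun -ntomp 2 -npme 4 -ntomp_pme 2 ...

That is 16 ranks x 2 OpenMP threads = 32 cores, with 4 of the 16 ranks set
aside for PME rather than letting mdrun guess the split; the PP/PME load
balance reported in the log would be what I tune against.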

> Hi Thanh Le,
>
> Assuming all the nodes are the same (9 nodes with 12 CPUs each), you could
> try the following:
>
> mpirun -np 9 --map-by node mdrun -ntomp 12 ...
> mpirun -np 18 mdrun -ntomp 6 ...
> mpirun -np 54 mdrun -ntomp 2 ...
>
> Each of these uses all 108 cores (9 x 12 = 18 x 6 = 54 x 2 = 108); which
> one works best will depend on your setup.
>
> Using the whole cluster for one job may not be the most efficient approach.
> I found on our cluster that once I reach 216 CPUs (settings from the
> queuing system equivalent to -np 108 and -ntomp 2), I can't do better by
> adding more nodes, presumably because communication becomes the bottleneck.
> In addition to running -multi or -multidir jobs, which take some of the
> load off communication, it may also be worth running separate jobs and
> using -pin on and -pinoffset, as sketched below.
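>
> As a rough sketch of the pinning idea on one of your 12-CPU nodes (the
> core counts and the -deffnm names are only an illustration, not something
> I have tested on that hardware):
>
> mpirun -np 6 mdrun -ntomp 1 -pin on -pinoffset 0 -deffnm jobA &
> mpirun -np 6 mdrun -ntomp 1 -pin on -pinoffset 6 -deffnm jobB &
>
> so the two jobs are pinned to disjoint sets of cores rather than competing
> for the same ones.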
>
> Best wishes
> James
>
>> Hi everyone,
>> I have a question concerning running GROMACS in parallel. I have read
>> over
>> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
>> but I still don't quite understand how to run it efficiently.
>> My GROMACS version is 4.5.4.
>> The cluster I am using has 108 CPUs in total and 4 hosts up.
>> The node I am using:
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                12
>> On-line CPU(s) list:   0-11
>> Thread(s) per core:    2
>> Core(s) per socket:    6
>> Socket(s):             1
>> NUMA node(s):          1
>> Vendor ID:             AuthenticAMD
>> CPU family:            21
>> Model:                 2
>> Stepping:              0
>> CPU MHz:               1400.000
>> BogoMIPS:              5200.57
>> Virtualization:        AMD-V
>> L1d cache:             16K
>> L1i cache:             64K
>> L2 cache:              2048K
>> L3 cache:              6144K
>> NUMA node0 CPU(s):     0-11
>> MPI is already installed. I also have permission to use the cluster as
>> much as I can.
>> My question is: how should I write my mdrun command to utilize all the
>> possible cores and nodes?
>> Thanks,
>> Thanh Le
>



