[gmx-users] replica exchange simulations performance issues.

Miro Astore miro.astore at gmail.com
Tue Mar 31 01:44:38 CEST 2020


I got up to 25-26 ns/day with my 4-replica system (the same logic scaled
up to 73 replicas), which I think is reasonable. Could I do better?

mpirun -np 48 gmx_mpi mdrun  -ntomp 1 -v -deffnm memb_prod1 -multidir
1 2 3 4 -replex 1000
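
For comparison, here is the same launch with mdrun-managed thread pinning
(the "-pin on" suggestion from Szilárd's reply below). The added flag is the
only change; this is a sketch I have not benchmarked on this machine:

mpirun -np 48 gmx_mpi mdrun -ntomp 1 -pin on -v -deffnm memb_prod1 -multidir
1 2 3 4 -replex 1000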

I have tried following the manual, but I don't think I'm doing it
right; I keep getting errors. If you have a minute to suggest how I
could do this, I would appreciate it.

log file accounting:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 12 MPI ranks

 Computing:            Num    Num      Call    Wall time    Giga-Cycles
                       Ranks  Threads  Count      (s)       total sum     %
-----------------------------------------------------------------------------
 Domain decomp.         12     1        26702     251.490      8731.137   1.5
 DD comm. load          12     1        25740       1.210        42.003   0.0
 DD comm. bounds        12     1        26396       9.627       334.238   0.1
 Neighbor search        12     1        25862     283.564      9844.652   1.7
 Launch GPU ops.        12     1      5004002     343.309     11918.867   2.0
 Comm. coord.           12     1      2476139     508.526     17654.811   3.0
 Force                  12     1      2502001     419.341     14558.495   2.5
 Wait + Comm. F         12     1      2502001     347.752     12073.100   2.1
 PME mesh               12     1      2502001   11721.893    406955.915  69.2
 Wait Bonded GPU        12     1         2503       0.008         0.285   0.0
 Wait GPU NB nonloc.    12     1      2502001      48.918      1698.317   0.3
 Wait GPU NB local      12     1      2502001      19.475       676.141   0.1
 NB X/F buffer ops.     12     1      9956280     753.489     26159.337   4.5
 Write traj.            12     1          519       1.078        37.427   0.0
 Update                 12     1      2502001     434.272     15076.886   2.6
 Constraints            12     1      2502001     701.800     24364.800   4.1
 Comm. energies         12     1       125942      36.574      1269.776   0.2
 Rest                                            1047.855     36378.988   6.2
-----------------------------------------------------------------------------
 Total                                          16930.182    587775.176 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        12     1      5004002    1650.247     57292.604   9.7
 PME spread             12     1      2502001    4133.126    143492.183  24.4
 PME gather             12     1      2502001    2303.327     79965.968  13.6
 PME 3D-FFT             12     1      5004002    2119.410     73580.828  12.5
 PME 3D-FFT Comm.       12     1      5004002     918.318     31881.804   5.4
 PME solve Elec         12     1      2502001     584.446     20290.548   3.5
-----------------------------------------------------------------------------
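
Since PME mesh accounts for 69.2% of the wall time, CPU-side PME looks like
the dominant cost in this run. Purely as a sketch (not something tested on
this machine, and assuming the build supports GPU PME and each replica can be
given its own GPU), one rank per replica with PME offloaded to the GPU
alongside the non-bonded work would look something like:

mpirun -np 4 gmx_mpi mdrun -ntomp 12 -pin on -nb gpu -pme gpu -deffnm
memb_prod1 -multidir 1 2 3 4 -replex 1000

With a single MPI rank per simulation, no separate PME rank should be needed
for GPU PME.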

Best, Miro

On Tue, Mar 31, 2020 at 9:58 AM Szilárd Páll <pall.szilard at gmail.com> wrote:
>
> On Sun, Mar 29, 2020 at 3:56 AM Miro Astore <miro.astore at gmail.com> wrote:
>
> > Hi everybody. I've been experimenting with REMD for my system running
> > on 48 cores with 4 GPUs (I will need to scale up to 73 replicas
> > because this is a complicated system with many DOF; I'm open to being
> > told this is all a silly idea).
> >
>
> It is a bad idea; you should have at least 1 physical core per replica, and
> with a large system ideally more.
> However, if you are going for high efficiency (aggregate ns/day per physical
> node), always put at least 2 replicas per GPU.
>
>
> >
> > My run configuration is
> > mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11
> > -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
> >
> > The best I can squeeze out of this is 9 ns/day. In a non-replica
> > simulation I can hit 50 ns/day with a single GPU and 12 cores.
> >
>
> That is abnormal and indicates that either:
> - something is wrong with the hardware mapping / assignment in your
> run: simply use "-pin on" and let mdrun manage thread pinning (that
> map-by-numa is certainly not optimal); I also advise against tweaking the
> thread count to odd numbers like 11 (just use a quarter of the cores); or
> - your exchange overhead is very high (check the communication cost in the
> log).
>
> If you share some log files of a standalone and a replex run, we can advise
> where the performance loss comes from.
>
> Cheers,
> --
> Szilárd
>
> > Looking at my accounting, for a single replica 52% of the time is being
> > spent on the "Force" category, with 92% of my Mflops going into NxN
> > Ewald Elec. + LJ [F].
> >
> > I'm wondering what I could do to reduce this bottleneck, if anything.
> >
> > Thank you.
> > --
> > Miro A. Astore   (he/him)
> > PhD Candidate | Computational Biophysics
> > Office 434 A28 School of Physics
> > University of Sydney



-- 
Miro A. Astore   (he/him)
PhD Candidate | Computational Biophysics
Office 434 A28 School of Physics
University of Sydney

