[gmx-users] replica exchange simulations performance issues.

Szilárd Páll pall.szilard at gmail.com
Tue Mar 31 13:54:08 CEST 2020


On Tue, Mar 31, 2020 at 1:45 AM Miro Astore <miro.astore at gmail.com> wrote:

> I got up to 25-26 ns/day with my 4-replica system (same logic scaled
> up to 73 replicas), which I think is reasonable. Could I do better?
>

Hard to say without a complete log file. Please share the single-run and
multi-run log files.


>
> mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
>
>  I have tried following the manual but I don't think I'm doing it
> right; I keep getting errors. If you have a minute to suggest how I
> could do this, I would appreciate it.
>

Again, the exact error messages and the associated command line/log are
needed before I can give further suggestions.
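
For reference, mdrun requires the number of MPI ranks to be a multiple of the
number of -multidir directories, so scaling the same logic to 73 replicas means
-np 73 (or a multiple of it). A minimal sketch, assuming a bash shell, one rank
per replica, and replica directories simply named 1 through 73 (those names and
counts are assumptions, not taken from your setup):

# 73 replicas, one MPI rank each; brace expansion generates the directory list
mpirun -np 73 gmx_mpi mdrun -ntomp 1 -pin on -v -deffnm memb_prod1 \
    -multidir {1..73} -replex 1000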

--
Szilárd


>
> log file accounting:
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 12 MPI ranks
>
> Computing:             Num   Num      Call    Wall time     Giga-Cycles
>                        Ranks Threads  Count      (s)        total sum     %
> -----------------------------------------------------------------------------
> Domain decomp.           12    1      26702     251.490       8731.137   1.5
> DD comm. load            12    1      25740       1.210         42.003   0.0
> DD comm. bounds          12    1      26396       9.627        334.238   0.1
> Neighbor search          12    1      25862     283.564       9844.652   1.7
> Launch GPU ops.          12    1    5004002     343.309      11918.867   2.0
> Comm. coord.             12    1    2476139     508.526      17654.811   3.0
> Force                    12    1    2502001     419.341      14558.495   2.5
> Wait + Comm. F           12    1    2502001     347.752      12073.100   2.1
> PME mesh                 12    1    2502001   11721.893     406955.915  69.2
> Wait Bonded GPU          12    1       2503       0.008          0.285   0.0
> Wait GPU NB nonloc.      12    1    2502001      48.918       1698.317   0.3
> Wait GPU NB local        12    1    2502001      19.475        676.141   0.1
> NB X/F buffer ops.       12    1    9956280     753.489      26159.337   4.5
> Write traj.              12    1        519       1.078         37.427   0.0
> Update                   12    1    2502001     434.272      15076.886   2.6
> Constraints              12    1    2502001     701.800      24364.800   4.1
> Comm. energies           12    1     125942      36.574       1269.776   0.2
> Rest                                            1047.855      36378.988   6.2
> -----------------------------------------------------------------------------
> Total                                          16930.182     587775.176 100.0
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
> PME redist. X/F          12    1    5004002    1650.247      57292.604   9.7
> PME spread               12    1    2502001    4133.126     143492.183  24.4
> PME gather               12    1    2502001    2303.327      79965.968  13.6
> PME 3D-FFT               12    1    5004002    2119.410      73580.828  12.5
> PME 3D-FFT Comm.         12    1    5004002     918.318      31881.804   5.4
> PME solve Elec           12    1    2502001     584.446      20290.548   3.5
> -----------------------------------------------------------------------------
>
> Best, Miro
>
> On Tue, Mar 31, 2020 at 9:58 AM Szilárd Páll <pall.szilard at gmail.com>
> wrote:
> >
> > On Sun, Mar 29, 2020 at 3:56 AM Miro Astore <miro.astore at gmail.com> wrote:
> >
> > > Hi everybody. I've been experimenting with REMD for my system running
> > > on 48 cores with 4 GPUs (I will need to scale up to 73 replicas
> > > because this is a complicated system with many DOF; I'm open to being
> > > told this is all a silly idea).
> > >
> >
> > It is a bad idea: you should have at least 1 physical core per replica,
> > and with a large system ideally more. However, if you are going for high
> > efficiency (aggregate ns/day per physical node), always put at least 2
> > replicas per GPU.
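> >
> > As an illustration, on a 48-core / 4-GPU node that guidance would translate
> > into something like 8 replicas with 6 cores each, i.e. 2 replicas per GPU.
> > A minimal sketch (the replica count, directory names, and file names here
> > are assumptions for the example only):
> >
> > # 8 ranks across 4 GPUs -> 2 replicas per GPU; mdrun assigns ranks to GPUs
> > mpirun -np 8 gmx_mpi mdrun -ntomp 6 -pin on -deffnm memb_prod1 \
> >     -multidir 1 2 3 4 5 6 7 8 -replex 1000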
> >
> >
> > >
> > > My run configuration is
> > > mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11
> > > -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
> > >
> > > the best I can squeeze out of this is 9 ns/day. In a non-replica
> > > simulation I can hit 50 ns/day with a single GPU and 12 cores.
> > >
> >
> > That is abnormal and indicates that either:
> > - something is wrong with the hardware mapping / thread assignment in your
> > run: simply use "-pin on" and let mdrun manage thread pinning (that
> > --map-by numa is certainly not optimal), and I advise against odd thread
> > counts like 11 (just use a quarter of the node, i.e. 12; see the sketch
> > below); or
> > - your exchange overhead is very high (check the communication cost in the
> > log).
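> >
> > A sketch with both changes applied to your current 4-replica run (same
> > binary and file names as in your command; only the pinning and the thread
> > count differ, so treat it as a starting point rather than a tuned setup):
> >
> > # 4 ranks, 12 OpenMP threads each, mdrun-managed pinning and GPU assignment
> > mpirun -np 4 gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 12 -pin on -v \
> >     -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000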
> >
> > If you share some log files of a standalone and a replex run, we can
> > advise where the performance loss comes from.
> >
> > Cheers,
> > --
> > Szilárd
> >
> > > Looking at my accounting, for a single replica 52% of time is being
> > > spent in the "Force" category, with 92% of my Mflops going into NxN
> > > Ewald Elec. + LJ [F].
> > >
> > > I'm wondering what I could do to reduce this bottleneck, if anything.
> > >
> > > Thank you.
> > > --
> > > Miro A. Astore   (he/him)
> > > PhD Candidate | Computational Biophysics
> > > Office 434 A28 School of Physics
> > > University of Sydney
>
>
>
> --
> Miro A. Astore   (he/him)
> PhD Candidate | Computational Biophysics
> Office 434 A28 School of Physics
> University of Sydney

