[gmx-users] REMD slows down drastically

Christopher Neale chris.neale at alum.utoronto.ca
Tue Feb 25 03:40:23 CET 2014


Thank you, Mark, for the tips about -pinoffset. I'll try it and see whether it affects the speed at all.

Regarding the utility of hyperthreading: running on a cluster in which each node has 8 Nehalem cores, I have seen a 5-15% speedup from using hyperthreading via 16 threads versus only 8 threads (in non-MPI GROMACS). This holds across about 10 simulation systems that I have worked on in the last four years, in all cases using -npme 0. However, once multiple nodes and the IB fabric get involved, hyperthreading gives no benefit and generally degrades performance. Perhaps other factors are at play here, but the only change I make is mdrun -nt 8 versus mdrun -nt 16, and I see a speedup from -nt 16. System sizes range from 10K to 250K atoms. Note that I have never tried hyperthreading with REMD or any other fancy setup.
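For reference, the comparison is nothing more elaborate than the two invocations below (a sketch: the -deffnm names are illustrative, and this assumes a single node with the thread-MPI build); I then compare the ns/day reported at the end of each log file:

mdrun -nt 8  -pin on -deffnm bench_8t     # one thread per physical core
mdrun -nt 16 -pin on -deffnm bench_16t    # one thread per logical (hyperthreaded) core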

Chris.
________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark Abraham <mark.j.abraham at gmail.com>
Sent: 24 February 2014 21:26
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] REMD slows down drastically

On Feb 24, 2014 11:01 PM, "Christopher Neale" <chris.neale at alum.utoronto.ca> wrote:
>
> Presuming that you have indeed set up the number of processors correctly (you should be running on a different number of cores for each different number of replicas to make a fair comparison), could it be a thread-pinning issue?

Yes, but it is part of the larger problem of over-loading the physical cores.
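For an REMD run, each replica needs at least one rank of its own, so the MPI rank count has to scale with the replica count. A minimal sketch (assuming an MPI-enabled GROMACS 4.6 mdrun and tpr files named remd_0.tpr ... remd_124.tpr; the binary name, file names and exchange interval are illustrative):

mpirun -np 125 mdrun_mpi -multi 125 -replex 1000 -s remd_.tpr

With 125 replicas squeezed onto far fewer physical cores, every replica competes for the same hardware, which is exactly the over-loading described above.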

> I run on a Nehalem system with 8 cores/node but, because of the Nehalem hyperthreading (I think), GROMACS always complains if I run "mpirun -np $N mdrun" where $N is the number of cores:
>
> NOTE: The number of threads is not equal to the number of (logical) cores
>       and the -pin option is set to auto: will not pin thread to cores.
>       This can lead to significant performance degradation.
>       Consider using -pin on (and -pinoffset in case you run multiple jobs).
>
> However, if I use $N = 2 times the number of cores, then I don't get that note, instead getting:
>
> "Pinning threads with a logical core stride of 1"
>
> As an aside, if anybody has a suggestion about how I should handle the thread pinning in my case, or whether it matters at all, I would be happy to hear it (my throughput seems to be good, though).

Hyper-threading is good for applications that are memory-bound or waiting on the user (which is why it is enabled by default on consumer machines), because a second hardware thread can use the CPU's instruction-issue slots while the first is stalled. GROMACS kernels are already CPU-bound, so there is little to gain and the overhead generally does not pay off. As a rule, one should not use HT; turning it off can be emulated with the right use of -pinoffset and half the number of threads.
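For example, on an 8-core node exposing 16 logical CPUs, something along these lines keeps one thread per physical core (a sketch for a thread-MPI GROMACS 4.6 mdrun; the logical-CPU ordering is the one GROMACS detects, so check the pinning report in md.log):

mdrun -nt 8 -pin on -pinstride 2

Two such jobs can share a node without stepping on each other by giving the second one a non-zero -pinoffset so that their core ranges do not overlap.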

> Finally, this comment is off topic, but you might want to reconsider having the Cl ions in a separate temperature-coupling group.
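> For example, a common alternative (just a sketch; the exact group names depend on the index groups available for your system) is to couple the solvent and ions together:
>
> tc-grps = Protein  Non-Protein
> tau_t   = 0.1      0.1
> ref_t   = XXXXX    XXXXX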

Indeed.

Mark

> Chris.
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Singam Karthick <sikart21 at yahoo.in>
> Sent: 24 February 2014 02:32
> To: gromacs.org_gmx-users at maillist.sys.kth.se
> Subject: [gmx-users] REMD slows down drastically
>
> Dear members,
> I am trying to run an REMD simulation of a poly-alanine (12-residue) system. I used an REMD temperature generator to obtain a range of temperatures with an exchange probability of 0.3, which gave me 125 replicas.
> When I try to simulate all 125 replicas, the run slows down drastically (around 17 hours for 70 picoseconds). Could anyone please tell me how to solve this issue?
>
> Following is the MDP file
>
> title           = G4Ga3a4a5 production.
> ;define         = ;-DPOSRES     ; position restrain the protein
> ; Run parameters
> integrator      = md            ; leap-frog integrator
> nsteps          = 12500000      ; 2 fs * 12,500,000 = 25 ns
> dt              = 0.002         ; 2 fs
> ; Output control
> nstxout         = 0             ; do not save full-precision (.trr) coordinates
> nstvout         = 10000         ; save velocities every 20 ps
> nstxtcout       = 500           ; save xtc coordinates every 1 ps
> nstenergy       = 500           ; save energies every 1 ps
> nstlog          = 100           ; update log file every 0.2 ps
> ; Bond parameters
> continuation    = yes           ; Restarting after NVT
> constraint_algorithm = lincs    ; holonomic constraints
> constraints     = hbonds        ; only bonds involving hydrogen are constrained
> lincs_iter      = 1             ; accuracy of LINCS
> lincs_order     = 4             ; also related to accuracy
> morse           = no
> ; Neighborsearching
> ns_type         = grid          ; search neighboring grid cells
> nstlist         = 5             ; 10 fs
> rlist           = 1.0           ; short-range neighborlist cutoff (in nm)
> rcoulomb        = 1.0           ; short-range electrostatic cutoff (in nm)
> rvdw            = 1.0           ; short-range van der Waals cutoff (in nm)
> ; Electrostatics
> coulombtype     = PME           ; Particle Mesh Ewald for long-range electrostatics
> pme_order       = 4             ; cubic interpolation
> fourierspacing  = 0.16          ; grid spacing for FFT
> ; Temperature coupling is on
> tcoupl          = V-rescale     ; modified Berendsen thermostat
> tc-grps         = protein SOL Cl        ; three coupling groups - more accurate
> tau_t                 = 0.1 0.1  0.1 ; time constant, in ps
> ref_t           = XXXXX  XXXXX  XXXXX   ; reference temperature, one for each group, in K
> ; Pressure coupling is on
> pcoupl          = Parrinello-Rahman     ; Pressure coupling on in NPT
> pcoupltype      = isotropic     ; uniform scaling of box vectors
> tau_p           = 2.0           ; time constant, in ps
> ref_p           = 1.0           ; reference pressure, in bar
> compressibility = 4.5e-5        ; isothermal compressibility of water, bar^-1
> ; Periodic boundary conditions
> pbc             = xyz           ; 3-D PBC
> ; Dispersion correction
> DispCorr        = EnerPres      ; account for cut-off vdW scheme
>
>
> regards
> singam
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.

