[gmx-users] Testing the performance of
Mark Abraham
mark.j.abraham at gmail.com
Wed Jul 3 10:39:14 CEST 2013
On Wed, Jul 3, 2013 at 10:17 AM, Richa Singh
<richa.s.rathorerr at gmail.com> wrote:
> Hello,
>
> I would like some input on the observed performance of protein-in-water
> simulations run on a single server with two Intel Xeon E5-2660
> processors. Each processor has 8 cores/16 threads, so 16 cores/32 threads
> in total. My system consists of 1643 beta-microglobulin atoms and 24699
> water atoms.
>
>
> I ran several 500 ps simulations, with either 1, 2, or 4 simulations
> running simultaneously on one server. Performance with the Verlet and
> group cutoff-schemes is reported below:
>
>
> For Verlet:
>
> mdrun -ntomp N -deffnm file where N is the no. of threads
>
> For a single simulation: 32 ns/day (N = 32 threads)
> Two simultaneous simulations: 4 + 0.25 = 4.25 ns/day (N = 16 threads each)
> Four simultaneous simulations: 3.36 + 0.11 + 0.11 + 0.10 = 3.68 ns/day
> (N = 8 threads each)
Very likely there are some pinning problems here. The output should be
warning you that mdrun has noticed you are using only a subset of the
machine for each simulation, and so it makes no assumptions about how
you'd like to use the rest. You either need to pin manually (see mdrun -h),
or consider using mdrun -multi, so that mdrun knows it is reasonable to
pin across the whole node.
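As a rough sketch (the file names are just placeholders, and the exact
invocation depends on your build - check mdrun -h), two such runs started
as one multi-simulation with an MPI-enabled mdrun would look something like

  mpirun -np 2 mdrun_mpi -multi 2 -ntomp 16 -deffnm sim

which expects per-simulation numbered inputs (e.g. sim0.tpr and sim1.tpr)
and lets mdrun lay out and pin threads over the whole node by itself.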
> mdrun -nt N -deffnm file where N is the no. of threads
>
> This gave me different results!
>
> For a single simulation: 28 ns/day (N = 32 threads)
> Two simulations: 13.8 + 13.5 = 27.3 ns/day (N = 16 threads each)
I'd still expect you to have pinning issues here, but now you are also
forcing a different domain decomposition (DD), so the total performance
will probably be lower than the above.
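You can check what each run actually chose from the .log files; something
like

  grep "Domain decomposition grid" *.log

(the exact wording may differ a little between versions) shows the DD grid
used by each run, so you can compare the -nt and -ntomp cases directly.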
> ----------------------------------------------------------------------
> For Group:
> The -ntomp option is not available with the group cutoff-scheme.
>
> mdrun -nt N -deffnm file where N is the no. of threads
>
> For a single simulation: 38.5 ns/day (N = 32 threads)
> Two simulations: 22.1 + 22.1 = 44.2 ns/day (N = 16 threads each)
> Four simulations: 12 + 11.9 + 12.2 + 11.8 ≈ 48 ns/day (N = 8 threads each)
Total throughput is higher, which is unsurprising for a system heavily
dominated by water. You're probably seeing gains from having larger
domains when running with lower N, but you'd have to look at the
timing reports in the .log files to know.
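For example, the final performance line and the cycle/time accounting
table near the end of each .log can be pulled out with something like

  grep "Performance:" *.log
  grep -A 25 "R E A L   C Y C L E" run1.log

(the headings may vary slightly between versions). Comparing where the
wall time goes - neighbour search, force, PME, constraints - for the
N = 32, 16 and 8 runs will tell you where the smaller runs gain.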
> Also, using -nt gives a warning to use the -pin on and -pinoffset options,
> but if I use them I see a reduction in performance again! The commands
> that I used are:
> For the group cutoff-scheme and two simultaneous simulations --
> mdrun -nt 16 -pin on -pinoffset 0 -deffnm file
> mdrun -nt 16 -pin on -pinoffset 8 -deffnm file
> The performance was 13.063 + 12.250 = 25.313 ns/day
There's some cunning remapping of logical cores going on, which I think
means that pinning combination is not optimal - see mdrun -h.
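As a sketch (this assumes mdrun's remapped logical-core numbering keeps
the two hardware threads of a physical core adjacent, which is what the
remapping is meant to do - verify against mdrun -h and the log output),
two 16-thread runs on a 16-core/32-thread node want non-overlapping
offsets, e.g.

  mdrun -nt 16 -pin on -pinstride 1 -pinoffset 0 -deffnm run1
  mdrun -nt 16 -pin on -pinstride 1 -pinoffset 16 -deffnm run2

With offsets 0 and 8, the two 16-thread runs get overlapping ranges of
hardware threads, so they end up fighting over the same cores.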
Mark
> It is not clear to me why the performance decreases so drastically when
> running simultaneous simulations with the Verlet scheme. What is the
> correct way to use the '-pin' option when running simultaneous simulations
> on a single server with multiple processors?
>
> The .mdp file that I used is below (for the group scheme, the 'cutoff-scheme' line was omitted):
>
> ; Run parameters
> integrator = md ; leap-frog integrator
> nsteps = 250000 ; 2 * 250000 = 500 ps, 0.5 ns
> dt = 0.002 ; 2 fs
> ; Output control
> nstxout = 100 ; save coordinates every 0.2 ps
> nstvout = 100 ; save velocities every 0.2 ps
> nstxtcout = 100 ; xtc compressed trajectory output every 0.2 ps
> nstenergy = 100 ; save energies every 0.2 ps
> nstlog = 100 ; update log file every 0.2 ps
> ; Bond parameters
> continuation = yes ; Restarting after NPT
> constraint_algorithm = lincs ; holonomic constraints
> constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
> lincs_iter = 1 ; accuracy of LINCS
> lincs_order = 4 ; also related to accuracy
> ; Neighborsearching
> ns_type = grid ; search neighboring grid cells
> nstlist = 5 ; 10 fs
> rlist = 1.0 ; short-range neighborlist cutoff (in nm)
> rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
> rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
> ; Electrostatics
> coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
> pme_order = 4 ; cubic interpolation
> fourierspacing = 0.16 ; grid spacing for FFT
> ; Temperature coupling is on
> tcoupl = V-rescale ; modified Berendsen thermostat
> tc-grps = Protein Non-Protein ; two coupling groups - more accurate
> tau_t = 0.1 0.1 ; time constant, in ps
> ref_t = 300 300 ; reference temperature, one for each group, in K
> ; Pressure coupling is on
> pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
> pcoupltype = isotropic ; uniform scaling of box vectors
> tau_p = 2.0 ; time constant, in ps
> ref_p = 1.0 ; reference pressure, in bar
> compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
> ; Periodic boundary conditions
> pbc = xyz ; 3-D PBC
> ; Dispersion correction
> DispCorr = EnerPres ; account for cut-off vdW scheme
> ; Velocity generation
> gen_vel = no ; Velocity generation is off
> ; CUTOFF SCHEME
> cutoff-scheme = Verlet
>
>
> Thanks for your time.