[gmx-users] gpu cluster explanation
Francesco
fracarb at myopera.com
Tue Jul 23 12:02:58 CEST 2013
Hi Richard,
Thank you for the help, and sorry for the delay in my reply.
I tried some test runs changing a few parameters (e.g. removing PME) and
was able to reach 20 ns/day, so I think that 9-11 ns/day is the maximum
I can obtain with my setup.
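(By "removing PME" I mean switching the electrostatics in the mdp away from
PME to a cut-off / reaction-field treatment; a minimal sketch of that kind of
change, not necessarily my exact settings, is:

coulombtype   = Reaction-Field   ; GPU-friendly alternative to PME
rcoulomb      = 1.2              ; [nm] cut-off, to be checked against the force field
epsilon_rf    = 0                ; 0 = infinite reaction-field dielectric

so that no PME mesh work is left on the CPU.)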
Thank you again for your help.
cheers,
Fra
On Fri, 12 Jul 2013, at 03:41 PM, Richard Broadbent wrote:
>
>
> On 12/07/13 13:26, Francesco wrote:
> > Hi all,
> > I'm working with a 200K-atom system (protein + explicit water) and
> > after a while using a CPU cluster I had to switch to a GPU cluster.
> > I read both the Acceleration and parallelization and the GROMACS-GPU
> > documentation pages
> > (http://www.gromacs.org/Documentation/Acceleration_and_parallelization
> > and
> > http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM)
> > but it's a bit confusing and I need help to check whether I have
> > understood it correctly. :)
> > I have 2 types of nodes:
> > 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon
> > E5649 @ 2.53 GHz)
> > 8 GPUs and 2 CPUs (6 cores each)
> >
> > 1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can
> > have 3 MPI ranks at most.
> > 2) because I have 12 cores I can run 4 OpenMP threads per MPI rank,
> > because 4x3 = 12
> >
> > Now, if I have a node with 8 GPUs, I can only use 4 of them:
> > 4 MPI ranks with 3 OpenMP threads each.
> > Is that right?
> > Is it possible to use 8 GPUs and only 8 cores?
>
> You could set -ntomp 0 and set up MPI/thread-MPI to use 8 cores.
> However, a system that unbalanced (a huge amount of GPU power against
> comparatively little CPU power) is unlikely to get great performance.
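> For instance, on the 8-GPU node something along these lines would start 8
> thread-MPI ranks with one OpenMP thread and one GPU each (a sketch only,
> assuming the devices are numbered 0-7 and reusing your tpr/deffnm names):
>
> mdrun -ntmpi 8 -ntomp 1 -gpu_id 01234567 -s input_50.tpr -deffnm 306s_50 -v
>
> but with only 12 cores feeding 8 M2090s most of that GPU power will sit idle.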
> >
> > Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3
> > GPUs and 12 cores I get 9-11 ns/day.
> >
> That slowdown is in line with what I got when I tried a similar CPU-GPU
> setup. That said, others might have some advice that will improve your
> performance.
>
> > the command that I use is:
> > mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
> > with the number of GPUs set via the job script:
> > #BSUB -n 3
> >
> > I also tried setting -npme / -nt / -ntmpi / -ntomp, but nothing changes.
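> > (to give an idea, the kind of combination I tried looks like this, with
> > illustrative values only:
> > mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v
> > i.e. one rank per GPU and four OpenMP threads per rank on the 12 cores)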
> >
> > The mdp file and some statistics follow:
> >
> > -------- START MDP --------
> >
> > title = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD
> >
> > ; Run parameters
> > integrator = md ; Algorithm options
> > nsteps = 25000000 ; maximum number of steps to perform [50 ns]
> > dt = 0.002 ; 2 fs = 0.002 ps
> >
> > ; Output control
> > nstxout = 10000 ; [steps] freq to write coordinates to trajectory, the last coordinates are always written
> > nstvout = 10000 ; [steps] freq to write velocities to trajectory, the last velocities are always written
> > nstlog = 10000 ; [steps] freq to write energies to log file, the last energies are always written
> > nstenergy = 10000 ; [steps] write energies to disk every nstenergy steps
> > nstxtcout = 10000 ; [steps] freq to write coordinates to xtc trajectory
> > xtc_precision = 1000 ; precision to write to xtc trajectory (1000 = default)
> > xtc_grps = system ; which coordinate group(s) to write to disk
> > energygrps = system ; or System / which energy group(s) to write
> >
> > ; Bond parameters
> > continuation = yes ; restarting from npt
> > constraints = all-bonds ; Bond types to replace by constraints
> > constraint_algorithm = lincs ; holonomic constraints
> > lincs_iter = 1 ; accuracy of LINCS
> > lincs_order = 4 ; also related to accuracy
> > lincs_warnangle = 30 ; [degrees] maximum angle that a bond can rotate before LINCS will complain
> >
>
> That seems a little loose for the constraints, but tightening that up and
> checking that it conserves energy and preserves bond lengths is something
> you'll have to do yourself.
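> As a quick sanity check (a sketch; the energy-file name comes from your
> -deffnm), something like
>
> g_energy -f 306s_50.edr -o conserved.xvg
>
> and selecting the "Conserved En." term will show the drift, and g_bond on a
> few constrained bonds will tell you whether the bond lengths are holding.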
>
> Richard
> > ; Neighborsearching
> > ns_type = grid ; method of updating neighbor list
> > cutoff-scheme = Verlet
> > nstlist = 10 ; [steps] frequency to update neighbor list (10)
> > rlist = 1.0 ; [nm] cut-off distance for the short-range neighbor list (1 default)
> > rcoulomb = 1.0 ; [nm] long range electrostatic cut-off
> > rvdw = 1.0 ; [nm] long range Van der Waals cut-off
> >
> > ; Electrostatics
> > coulombtype = PME ; treatment of long range electrostatic interactions
> > vdwtype = cut-off ; treatment of Van der Waals interactions
> >
> > ; Periodic boundary conditions
> > pbc = xyz
> >
> > ; Dispersion correction
> > DispCorr = EnerPres ; applying long range dispersion corrections
> >
> > ; Ewald
> > fourierspacing = 0.12 ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
> > pme_order = 4 ; interpolation order for PME, 4 = cubic
> > ewald_rtol = 1e-5 ; relative strength of Ewald-shifted potential at rcoulomb
> >
> > ; Temperature coupling
> > tcoupl = nose-hoover ; temperature coupling with the Nose-Hoover thermostat
> > tc_grps = Protein Non-Protein
> > tau_t = 0.4 0.4 ; [ps] time constant
> > ref_t = 310 310 ; [K] reference temperature for coupling [310 K = 37 °C]
> >
> > ; Pressure coupling
> > pcoupl = parrinello-rahman
> > pcoupltype = isotropic ; uniform scaling of box vectors
> > tau_p = 2.0 ; [ps] time constant
> > ref_p = 1.0 ; [bar] reference pressure for coupling
> > compressibility = 4.5e-5 ; [bar^-1] isothermal compressibility of water
> > refcoord_scaling = com ; have a look at the GROMACS documentation (chapter 7)
> >
> > ; Velocity generation
> > gen_vel = no ; generate velocities in grompp according to a Maxwell distribution
> >
> > -------- END MDP --------
> >
> > -------- START STATISTICS --------
> >
> > P P - P M E L O A D B A L A N C I N G
> >
> > PP/PME load balancing changed the cut-off and PME settings:
> > particle-particle PME
> > rcoulomb rlist grid spacing 1/beta
> > initial 1.000 nm 1.155 nm 100 128 96 0.120 nm 0.320 nm
> > final 1.201 nm 1.356 nm 96 100 80 0.144 nm 0.385 nm
> > cost-ratio 1.62 0.62
> > (note that these numbers concern only part of the total PP and PME
> > load)
> >
> > M E G A - F L O P S A C C O U N T I N G
> >
> > NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> > RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> > W3=SPC/TIP3p W4=TIP4p (single or pairs)
> > V&F=Potential and force V=Potential only F=Force only
> >
> > D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
> >
> > av. #atoms communicated per step for force: 2 x 54749.0
> > av. #atoms communicated per step for LINCS: 2 x 5418.4
> >
> > Average load imbalance: 12.8 %
> > Part of the total run time spent waiting due to load imbalance: 1.4 %
> > Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 0 %
> >
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> >  Computing:          Nodes  Th.     Count   Wall t (s)     G-Cycles      %
> > -----------------------------------------------------------------------------
> >  Domain decomp.          3   4     625000    10388.307   315806.805    2.3
> >  DD comm. load           3   4     625000      129.908     3949.232    0.0
> >  DD comm. bounds         3   4     625001      267.204     8123.069    0.1
> >  Neighbor search         3   4     625001     7756.651   235803.900    1.7
> >  Launch GPU ops.         3   4   50000002     3376.764   102654.354    0.8
> >  Comm. coord.            3   4   24375000    10651.213   323799.209    2.4
> >  Force                   3   4   25000001    35732.146  1086265.102    8.0
> >  Wait + Comm. F          3   4   25000001    19866.403   603943.033    4.5
> >  PME mesh                3   4   25000001   235964.754  7173380.387   53.0
> >  Wait GPU nonlocal       3   4   25000001    12055.970   366504.140    2.7
> >  Wait GPU local          3   4   25000001      106.179     3227.866    0.0
> >  NB X/F buffer ops.      3   4   98750002    10256.750   311807.459    2.3
> >  Write traj.             3   4       2994      249.770     7593.073    0.1
> >  Update                  3   4   25000001    33108.852  1006516.379    7.4
> >  Constraints             3   4   25000001    51671.482  1570824.423   11.6
> >  Comm. energies          3   4    2500001      463.135    14079.404    0.1
> >  Rest                    3                   13290.037   404020.040    3.0
> > -----------------------------------------------------------------------------
> >  Total                   3                  445335.526 13538297.876  100.0
> > -----------------------------------------------------------------------------
> > -----------------------------------------------------------------------------
> >  PME redist. X/F         3   4   50000002    40747.165  1238722.760    9.1
> >  PME spread/gather       3   4   50000002   122026.128  3709621.109   27.4
> >  PME 3D-FFT              3   4   50000002    46613.023  1417046.140   10.5
> >  PME 3D-FFT Comm.        3   4   50000002    20934.134   636402.285    4.7
> >  PME solve               3   4   25000001     5465.690   166158.163    1.2
> > -----------------------------------------------------------------------------
> >
> > Core t (s) Wall t (s) (%)
> > Time: 5317976.200 445335.526 1194.2
> > 5d03h42:15
> > (ns/day) (hour/ns)
> > Performance: 9.701 2.474
> >
> > -------- END STATISTICS --------
> >
> > thank you very much for the help.
> > cheers,
> > Fra
> >
--
Francesco Carbone
PhD student
Institute of Structural and Molecular Biology
UCL, London
fra.carbone.12 at ucl.ac.uk