[gmx-users] gromacs 3.3.3 vs 4.0.3 performance
Dimitris Dellis
ntelll at gmail.com
Fri Jan 30 18:34:41 CET 2009
Justin A. Lemkul wrote:
>
>
> Dimitris Dellis wrote:
>> Hi.
>> I ran exactly the same simulations with v3.3.3 and v4.0.3, on the
>> same 64-bit Q6600/DDR2-1066 machine, gcc-4.3.2, fftw-3.2.
>> I found that the performance of 4.0.3 is roughly 30% lower than
>> 3.3.3 (30% more hours/ns) for the few systems I tried (512
>> molecules of 5-15 sites, nstlist=10).
>> This happens with the single-precision serial and parallel (np=2,4,
>> openmpi 1.3) versions, and only when electrostatics (PME) are present.
>> With simple LJ potentials the performance is exactly the same.
>> Is there any speed comparison of 3.3.3 vs 4.0.3 available?
>> D.D.
>>
>
> Can you show us your .mdp file? What did grompp report about the
> relative PME load? These topics have been discussed a few times;
> you'll find lots of pointers on optimizing performance in the list
> archive.
>
> -Justin
>
Hi Justin,
These are from the small system; no I/O, only 1k steps.
grompp.mdp
===========
integrator = md
dt = 0.0010
nsteps = 1000
nstxout = 0
nstvout = 0
nstlog = 1000
nstcomm = 10
nstenergy = 0
nstxtcout = 0
nstlist = 10
ns_type = grid
dispcorr = AllEnerPres
tcoupl = berendsen
tc-grps = System
ref_t = 293.15
gen_temp = 293.15
tau_t = 0.2
gen_vel = no
gen_seed = 123456
constraints = none
constraint_algorithm = shake
energygrps = System
rlist = 1.6
vdw-type = Cut-off
rvdw = 1.6
coulombtype = PME
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1.0e-5
optimize_fft = yes
rcoulomb = 1.6
Related 4.0.3 grompp output:
Estimate for the relative computational load of the PME mesh part: 0.19
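With rcoulomb = 1.6 nm the mesh part is estimated at only 0.19 of the load, so most time goes to the real-space kernels. The usual GROMACS tuning rule is to shift work between the two parts while holding Ewald accuracy roughly constant by scaling rcoulomb and fourierspacing by the same factor. A minimal sketch (the helper name and the 0.75 factor are illustrative, not from the thread):

```python
def scale_pme_split(rcoulomb, fourierspacing, factor):
    """Scale the real-space cutoff and the PME grid spacing together.

    Scaling both by the same factor keeps Ewald accuracy roughly
    constant while moving work between the direct (real-space) sum and
    the reciprocal (mesh) sum: a smaller cutoff means fewer pair
    interactions but a finer PME grid.
    """
    return rcoulomb * factor, fourierspacing * factor

# Illustrative: shift work from the real-space part onto the mesh
rc, fs = scale_pme_split(1.6, 0.12, 0.75)
print(f"rcoulomb = {rc:.2f} nm, fourierspacing = {fs:.3f} nm")
```

The .mdp would then need rlist and rvdw adjusted consistently with the new rcoulomb; whether this helps depends on the FFT-vs-nonbonded balance of the particular machine.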
4.0.3 mdrun serial timings (near-zero entries omitted)
 Computing:                M-Number     M-Flops  % Flops
 Coul(T) + LJ            576.513824   31708.260     71.5
 Outer nonbonded loop      8.489390      84.894      0.2
 Calc Weights              6.006000     216.216      0.5
 Spread Q Bspline        128.128000     256.256      0.6
 Gather F Bspline        128.128000    1537.536      3.5
 3D-FFT                 1088.769682    8710.157     19.6
 Solve PME                18.531513    1186.017      2.7
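Summing the %-flops of the PME-related rows above (Spread, Gather, 3D-FFT, Solve) gives the mesh share of the counted flops; note this is a flop-count fraction, not the wall-time fraction that grompp's 0.19 estimate refers to. A quick check:

```python
# %Flops of the PME-related rows in the 4.0.3 serial breakdown above
pme_rows = {
    "Spread Q Bspline": 0.6,
    "Gather F Bspline": 3.5,
    "3D-FFT": 19.6,
    "Solve PME": 2.7,
}
pme_share = sum(pme_rows.values())
print(f"PME-related flop share: {pme_share:.1f}%")  # about 26% of flops
```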
parallel 4.0.3, np=4
Average load imbalance: 5.2 %
Part of the total run time spent waiting due to load imbalance: 2.3 %
Performance (Mnbf/s, GFlops, ns/day, hour/ns): 96.086  7.380  14.414  1.665
3.3.3 mdrun serial timings
 Computing:                M-Number        M-Flops  % Flops
 Coul(T) + LJ            576.529632   31709.129760     72.0
 Outer nonbonded loop      8.487860      84.878600      0.2
 Spread Q Bspline        128.128000     256.256000      0.6
 Gather F Bspline        128.128000    1537.536000      3.5
 3D-FFT                 1088.769682    8710.157456     19.8
 Solve PME                17.986469    1151.133984      2.6
parallel 3.3.3, np=4
Performance (Mnbf/s, GFlops, ns/day, hour/ns): 144.132  12.556  21.600  1.111
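Taking the third field of mdrun's Performance line as ns/day (consistent with the hour/ns fields: 24/1.665 = 14.414, 24/1.111 = 21.6), the slowdown on this small np=4 benchmark works out to about a third, in line with the ~30% quoted at the top of the thread:

```python
# ns/day from the np=4 Performance lines quoted above
ns_per_day_403 = 14.414  # GROMACS 4.0.3
ns_per_day_333 = 21.600  # GROMACS 3.3.3

slowdown = 1.0 - ns_per_day_403 / ns_per_day_333
print(f"4.0.3 is {slowdown:.0%} slower in ns/day")  # -> 33% slower
```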
D.D.