[gmx-developers] some benchs on cray xt4 and xt5
hessb at mpip-mainz.mpg.de
Wed Dec 10 15:33:49 CET 2008
Comm. energies here is only 1.9%,
which is so little that -nosum would not have much effect.
But your relative PME load is far too high: you spend more time
in PME mesh than in Force.
You should increase the cut-offs and fourierspacing by the same factor,
probably between 1.1 and 1.2.
I don't know what grompp and mdrun estimated for the PME load
(visible in the output and log file), but ideally PME mesh
should take about a third to half of the Force time.
Now the PME time is probably quite high because you have 64 PME
nodes, which causes a lot of communication. This will automatically
go down with a coarser grid.
But you might need to tune -npme by hand.
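As a concrete sketch of the scaling suggested above (the factor 1.15 is an assumption picked from the middle of the 1.1-1.2 range, not a value from this thread), the .mdp settings quoted below would become:

```
rlist          = 1.15    ; 1.0 * 1.15
rcoulomb       = 1.15    ; 1.0 * 1.15
rvdw           = 1.15    ; 1.0 * 1.15
fourierspacing = 0.1725  ; 0.15 * 1.15
pme_order      = 4       ; unchanged
```

After regenerating the tpr with grompp, the number of PME nodes can be set by hand with e.g. mdrun -npme 32 (32 is illustrative, not a recommendation); then check the PME mesh vs. Force ratio in the new log.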
andrea spitaleri wrote:
> Dear Berk,
> the highlight in these tests is the large difference between xt4 and xt5 using the same options
> and input. My mdp is:
> rlist = 1.0
> rcoulomb = 1.0
> fourierspacing = 0.15
> pme_order = 4
> rvdw = 1.0
> Using 128 cpu on xt4 I get:
> Computing:         Nodes   Number      G-Cycles     Seconds     %
> Domain decomp. 64 450001 283541.219 123285.0 1.7
> Send X to PME 64 4500001 16471.763 7162.0 0.1
> Comm. coord. 64 4500001 143325.633 62318.6 0.9
> Neighbor search 64 450001 469227.820 204022.3 2.9
> Force 64 4500001 4813100.223 2092757.0 29.3
> Wait + Comm. F 64 4500001 770041.675 334817.5 4.7
> PME mesh 64 4500001 6165375.373 2680732.2 37.6
> Wait + Comm. X/F 64 4500001 2041930.805 887840.5 12.4
> Wait + Recv. PME F 64 4500001 468701.376 203793.4 2.9
> Write traj. 64 4553 1180.231 513.2 0.0
> Update 64 4500001 250814.758 109055.4 1.5
> Constraints 64 4500001 539024.234 234370.1 3.3
> Comm. energies 64 4500001 317755.221 138161.4 1.9
> Rest 64 134137.403 58323.5 0.8
> I do not have the statistics for xt5 since I killed the job before the end.
> Should the -nosum option improve the performance only on xt5, or on both machines?
> thanks in advance
> Berk Hess wrote:
>> Have you looked at the cycle counts at the end of the log files?
>> I expect that most time is consumed by the energy summation
>> when using that many CPUs.
>> Try running with the option -nosum
>> Also, if you are using PME, you need a relatively long cut-off and a
>> coarse PME grid
>> for optimal performance, otherwise PME takes relatively too much time.
>> I would use something like: cut-off = 1.2, grid_spacing=0.16
>> andrea spitaleri wrote:
>>> Dear all,
>>> I am using gromacs-4.0.2 on two systems: cray xt4 and xt5 (csc louhi). Here is, in short, a table
>>> with some test results:
>>> MD simulation 9ns on a system protein+water (ca. 200,000 atoms):
>>> 128 cpu 64 pme 15h 30min on hector (xt4)
>>> 128 cpu 64 pme 15h 20min on louhi (xt4)
>>> 128 cpu 64 pme 20h on louhi (xt5)
>>> 256 cpu 128 pme 12h on hector (xt4)
>>> 256 cpu 128 pme 21h on louhi (xt5)
>>> One possible explanation (from one of the administrators) is:
>>> "One possibility for this is that Gromacs is message intensive, and is
>>> therefore slower on xt5 because of the xt5 architecture. (Basically 2
>>> nodes (8 cores) share the same Hypertransport, whereas on xt4 each node
>>> (4 cores) has that of its own, see eg.
>>> http://www.csc.fi/english/pages/louhi_guide/hardware/computenodes/index_html )"
>>> what do you think about it?
>>> thanks in advance