[gmx-developers] some benchs on cray xt4 and xt5
hessb at mpip-mainz.mpg.de
Wed Dec 10 15:33:49 CET 2008
Comm. energies here is only 1.9%,
which is so little that -nosum would not have much effect.
But your relative PME load is far too high: you spend more time
in PME mesh than in Force.
You should increase the cut-offs and fourierspacing by the same factor,
probably between 1.1 and 1.2.
I don't know what grompp and mdrun estimated for the PME load
(visible in the output and log file), but ideally PME mesh
should take about a third to half of the Force time.
Now the PME time is probably quite high because you have 64 PME
nodes, which causes a lot of communication. This will automatically
go down with a coarser grid.
But you might need to tune -npme by hand.
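As a concrete sketch of the scaling suggested above (the factor 1.15 is an assumption picked from the middle of the 1.1-1.2 range, not a value from this thread), the .mdp settings quoted below would become:

```
rlist          = 1.15    ; 1.0 * 1.15
rcoulomb       = 1.15    ; 1.0 * 1.15
rvdw           = 1.15    ; 1.0 * 1.15
fourierspacing = 0.1725  ; 0.15 * 1.15
pme_order      = 4       ; unchanged
```

After regenerating the tpr with grompp, the number of PME nodes can be set by hand with e.g. mdrun -npme 32 (32 is illustrative, not a recommendation); then check the PME mesh vs. Force ratio in the new log.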
andrea spitaleri wrote:
> Dear Berk,
> the highlight in these tests is the large difference between xt4 and xt5 using the same options
> and input. My mdp is:
> rlist = 1.0
> rcoulomb = 1.0
> fourierspacing = 0.15
> pme_order = 4
> rvdw = 1.0
> Using 128 cpu on xt4 I get:
> Computing:         Nodes   Number      G-Cycles     Seconds     %
> Domain decomp. 64 450001 283541.219 123285.0 1.7
> Send X to PME 64 4500001 16471.763 7162.0 0.1
> Comm. coord. 64 4500001 143325.633 62318.6 0.9
> Neighbor search 64 450001 469227.820 204022.3 2.9
> Force 64 4500001 4813100.223 2092757.0 29.3
> Wait + Comm. F 64 4500001 770041.675 334817.5 4.7
> PME mesh 64 4500001 6165375.373 2680732.2 37.6
> Wait + Comm. X/F 64 4500001 2041930.805 887840.5 12.4
> Wait + Recv. PME F 64 4500001 468701.376 203793.4 2.9
> Write traj. 64 4553 1180.231 513.2 0.0
> Update 64 4500001 250814.758 109055.4 1.5
> Constraints 64 4500001 539024.234 234370.1 3.3
> Comm. energies 64 4500001 317755.221 138161.4 1.9
> Rest 64 134137.403 58323.5 0.8
> I do not have the statistics for xt5 since I killed the job before the end.
> Should the -nosum option improve the performance only on xt5, or on both machines?
> thanks in advance
> Berk Hess wrote:
>> Have you looked at the cycle counts at the end of the log files?
>> I expect that most time is consumed by the energy summation
>> when using that many CPUs.
>> Try running with the option -nosum
>> Also, if you are using PME, you need a relatively long cut-off and a
>> coarse PME grid
>> for optimal performance, otherwise PME takes relatively too much time.
>> I would use something like: cut-off = 1.2, grid_spacing=0.16
>> andrea spitaleri wrote:
>>> Dear all,
>>> I am using gromacs-4.0.2 on two systems: cray xt4 and xt5 (csc louhi). Here is, in short, a table
>>> with some test results:
>>> MD simulation 9ns on a system protein+water (ca. 200,000 atoms):
>>> 128 cpu 64 pme 15h 30min on hector (xt4)
>>> 128 cpu 64 pme 15h 20min on louhi (xt4)
>>> 128 cpu 64 pme 20h on louhi (xt5)
>>> 256 cpu 128 pme 12h on hector (xt4)
>>> 256 cpu 128 pme 21h on louhi (xt5)
>>> One possible explanation (from one of the administrators) is:
>>> "One possibility for this is that Gromacs is message intensive, and is
>>> therefore slower on xt5 because of the xt5 architecture. (Basically 2
>>> nodes (8 cores) share the same Hypertransport, whereas on xt4 each node
>>> (4 cores) has that of its own, see eg.
>>> http://www.csc.fi/english/pages/louhi_guide/hardware/computenodes/index_html )"
>>> what do you think about it?
>>> thanks in advance