[gmx-developers] some benchs on cray xt4 and xt5

Wed Dec 10 15:27:58 CET 2008

Dear Berk,
the most striking result in these tests is the large performance difference between xt4 and xt5
using the same options and input. My mdp settings are:

rlist               =  1.0
rcoulomb            =  1.0
fourierspacing      =  0.15
pme_order           =  4
rvdw                =  1.0
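The reply quoted below suggests a cut-off of 1.2 and a grid spacing of 0.16 in place of the values above. A rough way to see what that trade does is sketched here. It assumes uniform particle density (so the real-space pair count grows with the cube of the cut-off) and a PME grid whose point count shrinks with the cube of the spacing, and it ignores rounding of the grid to FFT-friendly sizes. The function name is only illustrative.

```python
def relative_work(old_cutoff, new_cutoff, old_spacing, new_spacing):
    """First-order estimate of how real-space and PME-grid work change.

    Assumptions (not taken from GROMACS itself): pair count ~ cutoff**3,
    number of grid points ~ (1 / spacing)**3.
    """
    pair_ratio = (new_cutoff / old_cutoff) ** 3
    grid_ratio = (old_spacing / new_spacing) ** 3
    return pair_ratio, grid_ratio


# Current settings (rcoulomb = 1.0, fourierspacing = 0.15) versus the
# suggested ones (cut-off = 1.2, grid spacing = 0.16).
pair_ratio, grid_ratio = relative_work(1.0, 1.2, 0.15, 0.16)
print(f"real-space pairs: x{pair_ratio:.2f}")
print(f"PME grid points:  x{grid_ratio:.2f}")
```

Under these assumptions the change moves work from the PME mesh to the real-space force loop; the actual balance still has to be checked in the cycle counts of a test run.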

Using 128 CPUs on xt4, I get the following cycle counts:

 Domain decomp.        64     450001   283541.219   123285.0     1.7
 Send X to PME         64    4500001    16471.763     7162.0     0.1
 Comm. coord.          64    4500001   143325.633    62318.6     0.9
 Neighbor search       64     450001   469227.820   204022.3     2.9
 Force                 64    4500001  4813100.223  2092757.0    29.3
 Wait + Comm. F        64    4500001   770041.675   334817.5     4.7
 PME mesh              64    4500001  6165375.373  2680732.2    37.6
 Wait + Comm. X/F      64    4500001  2041930.805   887840.5    12.4
 Wait + Recv. PME F    64    4500001   468701.376   203793.4     2.9
 Write traj.           64       4553     1180.231      513.2     0.0
 Update                64    4500001   250814.758   109055.4     1.5
 Constraints           64    4500001   539024.234   234370.1     3.3
 Comm. energies        64    4500001   317755.221   138161.4     1.9
 Rest                  64              134137.403    58323.5     0.8
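The rows above can be tallied with a short script. This is a minimal sketch, assuming the columns are name, node count, call count, cycles, seconds and percentage of the total (the header line is not included in the output above, so the column meanings are an assumption).

```python
# Rows copied verbatim from the table above.
TABLE = """\
 Domain decomp.        64     450001   283541.219   123285.0     1.7
 Send X to PME         64    4500001    16471.763     7162.0     0.1
 Comm. coord.          64    4500001   143325.633    62318.6     0.9
 Neighbor search       64     450001   469227.820   204022.3     2.9
 Force                 64    4500001  4813100.223  2092757.0    29.3
 Wait + Comm. F        64    4500001   770041.675   334817.5     4.7
 PME mesh              64    4500001  6165375.373  2680732.2    37.6
 Wait + Comm. X/F      64    4500001  2041930.805   887840.5    12.4
 Wait + Recv. PME F    64    4500001   468701.376   203793.4     2.9
 Write traj.           64       4553     1180.231      513.2     0.0
 Update                64    4500001   250814.758   109055.4     1.5
 Constraints           64    4500001   539024.234   234370.1     3.3
 Comm. energies        64    4500001   317755.221   138161.4     1.9
 Rest                  64              134137.403    58323.5     0.8
"""


def parse_row(line):
    """Split one row into its fields; column meanings are assumed."""
    tokens = line.split()
    # The last three fields are always cycles, seconds and percent.
    cycles, seconds, percent = (float(t) for t in tokens[-3:])
    head = tokens[:-3]
    # Trailing integers are the node count and, if present, the call count
    # (the "Rest" row has no call count).
    ints = []
    while head and head[-1].isdigit():
        ints.insert(0, int(head.pop()))
    return {
        "name": " ".join(head),
        "nodes": ints[0],
        "calls": ints[1] if len(ints) > 1 else None,
        "cycles": cycles,
        "seconds": seconds,
        "percent": percent,
    }


rows = {r["name"]: r for r in map(parse_row, TABLE.strip().splitlines())}
total_percent = sum(r["percent"] for r in rows.values())
pme_to_force = rows["PME mesh"]["percent"] / rows["Force"]["percent"]

# Print the three most expensive entries and two summary figures.
for r in sorted(rows.values(), key=lambda r: r["percent"], reverse=True)[:3]:
    print(f"{r['name']:<20} {r['percent']:5.1f} %")
print(f"sum of percentages: {total_percent:.1f}")
print(f"PME mesh / Force:   {pme_to_force:.2f}")
```

The ratio of "PME mesh" to "Force" is the figure most relevant to the advice about cut-off and grid spacing, and "Comm. energies" is the entry that -nosum would be expected to affect.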

I do not have the statistics for xt5, since I killed the job before it finished.

Should the option -nosum improve performance on xt5 only, or on both machines?

thanks in advance

andrea

Berk Hess wrote:
> Hi,
> 
> Have you looked at the cycle counts at the end of the log files?
> I expect that most time is consumed by the energy summation
> when using that many CPUs.
> 
> Try running with the option -nosum
> 
> Also, if you are using PME, you need a relatively long cut-off and a
> coarse PME grid for optimal performance; otherwise PME takes up too
> large a share of the run time.
> I would use something like: cut-off = 1.2, grid_spacing = 0.16
> 
> Berk
> 
> andrea spitaleri wrote:
>> Dear all,
>> I am using gromacs-4.0.2 on two systems: cray xt4 and xt5 (csc louhi). Here is a short table
>> with some tests:
>>
>> MD simulation 9ns on a system protein+water (ca. 200,000 atoms):
>>
>> 128 cpu 64 pme 15h 30min on hector (xt4)
>> 128 cpu 64 pme 15h 20min on louhi (xt4)
>> 128 cpu 64 pme 20h on louhi (xt5)
>>
>> 256 cpu 128 pme 12h on hector (xt4)
>> 256 cpu 128 pme 21h on louhi (xt5)
>>
>> One possible explanation (from one of the administrators) is:
>>
>> "One possibility for this is, that Gromacs is message intensive, and is
>> therefore slower on xt5 because of the xt5 architecture. (Basically 2
>> nodes (8 cores) share the same Hypertransport, whereas on xt4 each node
>> (4 cores) has that of its own, see eg.
>> http://www.csc.fi/english/pages/louhi_guide/hardware/computenodes/index_html )"
>>
>> what do you think about it?
>>
>> thanks in advance
>>
>> Regards
>>
>> andrea
>>
>>
>>
>>   
> 

-- 
-------------------------------
Andrea Spitaleri PhD
Dulbecco Telethon Institute
c/o DIBIT Scientific Institute
Biomolecular NMR, 1B4
Via Olgettina 58
20132 Milano (Italy)
http://biomolecularnmr.ihsr.dom/
Tel: 0039-0226434348/5622/3497/4922
Fax: 0039-0226434153
-------------------------------