Thomas Schlesier
Mon Nov 12 17:18:01 CET 2012

Dear all,
I did some scaling tests on a cluster and I'm a little bit puzzled
by the results.
So first the setup:

Saxonid 6100, Opteron 6272 16C 2.100GHz, Infiniband QDR
GROMACS version: 4.0.7 and 4.5.5
Compiler: 	GCC 4.7.0
MPI: Intel MPI
FFT-library: ACML 5.1.0 fma4

895 SPC/E water molecules
Simulation time: 750 ps (0.002 ps timestep)
Cut-off: 1.0 nm
with long-range corrections (DispCorr = EnerPres; PME with standard
settings, but in no case an extra CPU dedicated solely to PME)
V-rescale thermostat and Parrinello-Rahman barostat

I get the following timings (in seconds), where each value is scaled to
the time which would be needed on 1 CPU (so if a job on 2 CPUs took X s,
the listed time is 2 * X s).
These timings were taken from the *.log file, at the end of the
'real cycle and time accounting' section.

gmx-version    1cpu    2cpu    4cpu
4.0.7          4223    3384    3540
4.5.5          3780    3255    2878

I'm a little bit clueless about the results. I always thought that if I
have a non-interacting system and double the number of CPUs, I would get
a simulation which takes only half the time (so the times as defined
above would be equal). If the system does have interactions, I would
lose some performance due to communication. Load imbalance between the
nodes could cause a further loss of performance.
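
To make this expectation concrete, here is a small Python sketch (not part of GROMACS; the function and variable names are made up for illustration). It takes the CPU-normalized times from the table and computes the wall time per run and the parallel efficiency, assuming efficiency is defined as the 1-CPU time divided by the CPU-normalized time on N CPUs, so that 1.0 means ideal scaling.

```python
# CPU-normalized times in seconds (N * wall time on N CPUs), copied from the table.
normalized = {
    "4.0.7": {1: 4223, 2: 3384, 4: 3540},
    "4.5.5": {1: 3780, 2: 3255, 4: 2878},
}


def wall_times(times):
    """Undo the normalization: wall time of the run on N CPUs."""
    return {n: t / n for n, t in times.items()}


def efficiency(times):
    """Parallel efficiency, assumed here to be normalized(1) / normalized(N)."""
    base = times[1]
    return {n: base / t for n, t in times.items()}


if __name__ == "__main__":
    for version, times in normalized.items():
        wall = wall_times(times)
        eff = efficiency(times)
        for n in sorted(times):
            print(f"{version}  {n} cpu  wall {wall[n]:7.1f} s  efficiency {eff[n]:.2f}")
```

With the expectation described above, the efficiency should never exceed 1.0. For the numbers in the table it is above 1.0 for every multi-CPU run.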

Keeping this in mind, I can only explain the timings for version 4.0.7
going from 2cpu to 4cpu (2cpu is a little bit faster, since going to 4cpu
leads to more communication and therefore a loss of performance).

All the other timings I do not understand, especially the fact that 1cpu
takes longer in each case than the other cases.
Probably the system is too small and/or the simulation time is too
short for a scaling test. But I would assume that the amount of time to
set up the simulation would be equal for all three cases of one version.
The only other explanation which comes to my mind would be that something
went wrong during the installation of the programs...

Please, can somebody enlighten me?


