[gmx-users] [Gromacs 3.3.3] tests for parallel - is this reasonable?
Mark Abraham
Mark.Abraham at anu.edu.au
Mon Oct 5 13:42:37 CEST 2009
Thomas Schlesier wrote:
> Hi all,
Why use 3.3.3? For most purposes the most recent version is more
reliable and much faster.
> I have done some small tests for parallel calculations with different
> systems; all simulations were done on my laptop, which has a dual-core CPU.
>
> (1) 895 waters - 2685 atoms, for 50000 steps (100ps)
> cutoffs 1.0nm (no pme); 3nm cubic box
Nobody uses plain cutoffs any more. Test with the method you'll use in your
real calculations: PME.
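Something like the following .mdp fragment would do for a PME benchmark - the
values here are only illustrative defaults, not a recommendation tuned for
your system:

  ; illustrative defaults only - cutoffs and PME grid should be tuned per system
  coulombtype     = PME
  rcoulomb        = 1.0
  rvdw            = 1.0
  fourierspacing  = 0.12
  pme_order       = 4
  ewald_rtol      = 1e-5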
> single:
> NODE (s) Real (s) (%)
> Time: 221.760 222.000 99.9
> 3:41
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 13.913 3.648 38.961 0.616
> parallel (2 cores):
> NODE (s) Real (s) (%)
> Time: 160.000 160.000 100.0
> 2:40
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 19.283 5.056 54.000 0.444
> Total Scaling: 98% of max performance
>
> => 1.386 times faster
>
> (2) 3009 waters - 9027 atoms, for 50000 steps (100ps)
> cutoffs 1.0nm (no pme); 4.5nm cubic box
> single:
> NODE (s) Real (s) (%)
> Time: 747.830 751.000 99.6
> 12:27
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 13.819 3.617 11.553 2.077
> parallel (2 cores):
> NODE (s) Real (s) (%)
> Time: 525.000 525.000 100.0
> 8:45
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 19.684 5.154 16.457 1.458
> Total Scaling: 98% of max performance
>
> => 1.424 times faster
>
> (3) 2 waters
> rest same as (1)
> single:
> NODE (s) Real (s) (%)
> Time: 0.680 1.000 68.0
> (Mnbf/s) (MFlops) (ns/day) (hour/ns)
> Performance: 0.012 167.973 12705.884 0.002
> parallel:
> NODE (s) Real (s) (%)
> Time: 9.000 9.000 100.0
> (Mnbf/s) (MFlops) (ns/day) (hour/ns)
> Performance: 0.003 17.870 960.000 0.025
> Total Scaling: 88% of max performance
>
> => about 10 times slower
> (this one was more a test to see how the values look for a case where
> parallelisation is a waste)
>
> So now my questions:
> 1) Are the values reasonable (I mean not so much each individual value, but
> the speed difference between parallel and single)? I would have assumed that
> for a big system like (2), two cores would be a bit less than 2 times faster,
> not only around 1.4 times.
It depends on a whole pile of factors. Are your cores real or only
hyperthreads? Do they share caches? I/O systems? Can MPI use the cache
for communication or does it have to write through to main memory? How
big are the caches? People's laptops that were designed for websurfing
and editing Word documents often skimp on stuff that is necessary if
you're actually planning to keep your floating point units saturated
with work... You may like to run two copies of the same single-processor
job to get a handle on these issues.
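E.g. launch two copies at the same time (the file names below are only
examples) and compare the NODE times in their logs with a run done on its own:

  # start two independent single-core runs together
  mdrun -s topol.tpr -o run1.trr -c run1.gro -e run1.edr -g run1.log &
  mdrun -s topol.tpr -o run2.trr -c run2.gro -e run2.edr -g run2.log &
  wait
  # compare the NODE time in run1.log / run2.log with that of a lone run

If the two simultaneous runs are much slower than the lone one, the cores are
competing for cache or memory bandwidth, and MPI won't give you a 2x speedup
either.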
> 2) In the md0.log files (for parallel runs) I have seen the following line
> for all three simulations:
> "Load imbalance reduced performance to 200% of max"
> What does it mean? And why is it the same in all three cases?
Dunno - that line is probably just buggy output in 3.3.3.
> 3) What does "Total Scaling" mean? In case (3) the single run is about 10
> times faster, but for the parallel run it says I have 88% of max performance
> (if I set the single run to 100%, the parallel run would only give 10%).
"Total Scaling" only reports how evenly the work was balanced over the nodes
within that run; it is not a comparison with a single-core run. The spatial
decomposition will not always lead to an even load balance. That's life. The
domain decomposition in 4.x does a much better job, though that's probably not
going to matter much for 2 cores on small systems and short runs.
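The usual definitions of speedup and efficiency, applied to your own case (2)
timings, give the figure you actually care about:

  speedup    = T_single / T_parallel = 747.83 s / 525 s = 1.42
  efficiency = speedup / cores       = 1.42 / 2         = 71%

i.e. about 70% parallel efficiency on two cores, which is a more meaningful
number than the "Total Scaling" line.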
Mark