[gmx-users] Need some precisions about logfile...
Florian Haberl
Florian.Haberl at chemie.uni-erlangen.de
Wed Dec 7 17:09:31 CET 2005
hi,
On Wednesday 07 December 2005 12:27, Stéphane Teletchéa wrote:
> Hello,
>
> I'm doing some benchmarking to get to know GROMACS (I've used Amber in
> the past), and I'm focusing for now on GROMACS 3.3.
>
> My computer (relatively modest, but enough for discovering GROMACS):
>
> Pentium IV 2x2.6 GHz SMP, 1 GB of memory
> GROMACS 3.3 compiled in single/double precision, with/without MPI
> (mpich 1.2.5.2-5mdk), MandrakeLinux LE2005, gcc 3.4.3-7mdk.
>
> The output log file contains performance figures for the calculation,
> and I'd like some clarification (I've already been through the archives
> back to 2003, but this hasn't been covered as extensively as I'd
> like):
>
> At the end of the run (double precision shown here), I get:
>
> Single-processor calculation:
> -----------------------------------------------------------------------
>                NODE (s)   Real (s)      (%)
> Time:           146.070    150.000     97.4
>                            2:26
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:      4.693      1.003      5.915      4.058
> -----------------------------------------------------------------------
>
> 2 processors (SMP):
> -----------------------------------------------------------------------
>                NODE (s)   Real (s)      (%)
> Time:           166.000    166.000    100.0
>                            2:46
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:      4.122    883.316      5.205      4.611
>
> Detailed load balancing info in percentage of average
> -----------------------------------------------------------------------
Have you really started it on 2 CPUs, i.e. with mpirun -np 2?
That information is in the first lines of the md*.log file,
like:
Log file opened on Tue Dec 6 14:21:12 2005
Host: gorgo pid: 8475 nodeid: 3 nnodes: 4
The Gromacs distribution was built Tue Dec 6 09:18:39 CET 2005 by
bco117 at sfront03 (Linux 2.6.14.2 x86_64)
Also, when running on 2 CPUs there is a nice load-balancing summary at the
end of the file (a small parsing sketch follows the table below):
Detailed load balancing info in percentage of average

Type                         NODE:    0    1  Scaling
-----------------------------------------------------
LJ:                                 162   37      61%
Coulomb:                            200    0      50%
Coulomb [W3]:                         0  199      50%
Coulomb + LJ:                       200    0      50%
Coulomb + LJ [W3]:                    0  199      50%
Coulomb + LJ [W3-W3]:                95  104      95%
Outer nonbonded loop:                92  107      93%
1,4 nonbonded interactions:         200    0      50%
NS-Pairs:                            99  100      99%
Reset In Box:                        99  100      99%
Shift-X:                             99  100      99%
CG-CoM:                             101   98      98%
Sum Forces:                          99  100      99%
Bonds:                              200    0      50%
Angles:                             200    0      50%
Propers:                            200    0      50%
Impropers:                          200    0      50%
Virial:                              99  100      99%
Update:                              99  100      99%
Stop-CM:                             99  100      99%
Calc-Ekin:                           99  100      99%
Lincs:                              200    0      50%
Lincs-Mat:                          200    0      50%
Constraint-V:                        99  100      99%
Constraint-Vir:                      96  103      96%
Settle:                              95  104      95%

Total Force:                         98  101      98%
Total Shake:                         96  103      96%

Total Scaling: 98% of max performance
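
If you want to check this without eyeballing, here is a minimal Python
sketch (the function names are illustrative; the header and table formats
assumed are the 3.3-era ones shown above). Note that mdrun computes the
Scaling column from its raw counters, so recomputing it from the rounded
per-node percentages can be off by about one percent:

    import re

    def nnodes_from_log(path):
        # Look for the 'nodeid: ... nnodes: ...' header line quoted above.
        for line in open(path):
            m = re.search(r"nnodes:\s*(\d+)", line)
            if m:
                return int(m.group(1))
        return 1  # no such line found; presumably a serial run

    def scaling(per_node):
        # The per-node columns are percent-of-average, so the average is
        # 100; Scaling = average / busiest node.
        return int(100.0 * 100.0 / max(per_node))

    # usage: nnodes_from_log("md0.log") -> 2 for an 'mpirun -np 2' run
    print(scaling([162, 37]))   # -> 61, the LJ row above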
>
> Question 1:
> * I get significantly lower performance for 2 processors (I took the
> node0 log, since no other information seems to be there). Do I simply
> need to multiply the 2-processor value by 2? e.g.:
> 1 node = 5.915 ns/day, 2 nodes = 5.205 * 2 = 10.410 ns/day?
> (It seems the GROMACS 3.3 format is different from that of 3.2.)
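
A side note on that arithmetic: if ns/day is already the wall-clock
throughput of the whole run rather than a per-node figure, the two
numbers compare directly and should not be multiplied by the node count.
A quick check in Python with the figures above:

    perf_1cpu = 5.915   # ns/day, 1-processor run above
    perf_2cpu = 5.205   # ns/day, 2-processor run above

    speedup = perf_2cpu / perf_1cpu   # ~0.88, i.e. the 2-CPU run is slower
    efficiency = 100 * speedup / 2    # ~44% parallel efficiency
    print("speedup %.2f, efficiency %.0f%%" % (speedup, efficiency))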
>
> Question 2:
> * The reported time could lead the user to the conclusion (and your web
> site states this) that a single-precision run on 2 CPUs is faster than
> two times the 1-CPU run.
>
> a) What is the rationale for this?
>
> b) I think there is a contradiction here:
>
> Here are my single-precision results (villin), as run time (s) and
> performance (ns/day):
> villin-3.3_1cpu_s    109.000   8.463
> villin-3.3_2cpus_s   102.000   8.471
> (obviously the 2-CPU run comes out at 8.471/8.463*100 = 100.1 %)
>
> What I don't understand is the time reported for the run: 109 s for
> the single-processor run, 102 s for the 2-CPU run.
> Shouldn't the second value (taken from the log file of node 0) be half
> of the 1-CPU run (109/2 = 54.5 s)?
>
> Second, I can reproduce the 109 s value from:
> <time of the run stop> - <time of the run start> =
> <Tue Dec 6 17:18:08 2005> - <Tue Dec 6 17:16:19 2005> =
> 109 s
>
> I cannot do the same for the 2-CPU job (values come from the node0
> log file):
> <time of the run stop> - <time of the run start> =
> <Tue Dec 6 17:19:57 2005> - <Tue Dec 6 17:18:11 2005> =
> 106 s (instead of 102 ...)
>
> (I've taken real time, not node time.)
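
The subtractions themselves can be recomputed mechanically; a minimal
Python check, assuming the timestamp format printed in the logs quoted
above:

    from datetime import datetime

    FMT = "%a %b %d %H:%M:%S %Y"

    def elapsed(start, stop):
        # Wall-clock seconds between two log timestamps.
        return (datetime.strptime(stop, FMT)
                - datetime.strptime(start, FMT)).seconds

    print(elapsed("Tue Dec 6 17:16:19 2005", "Tue Dec 6 17:18:08 2005"))  # 109
    print(elapsed("Tue Dec 6 17:18:11 2005", "Tue Dec 6 17:19:57 2005"))  # 106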
>
> c) How can I claim 100.1 % for the 2-CPU run over the 1-CPU run when
> the log file itself states:
> -----------------------------------------------------------------------
> Total Scaling: 99% of max performance
> -----------------------------------------------------------------------
>
> I would understand this if the statement means: take the 1-CPU
> performance, multiply it by 99% and by the number of CPUs, and you get
> the total performance (8.463*0.99*2 = 16.76 ns/day instead of
> 8.471*2 = 16.94 ns/day).
>
> Am I right?
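
That reading of the numbers, spelled out as a small Python check:

    perf_1cpu = 8.463      # ns/day, single precision, 1 CPU
    total_scaling = 0.99   # "Total Scaling: 99% of max performance"
    ncpus = 2

    estimate = perf_1cpu * total_scaling * ncpus   # 16.76 ns/day
    naive = 8.471 * ncpus                          # 16.94 ns/day
    print("%.2f vs %.2f ns/day" % (estimate, naive))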
>
> Sorry to bother you about these numbers, but I want to understand
> precisely what is stated in the log file and how it is computed
> internally (is the time for a multiprocessor run the *cumulated*
> time?).
>
> My goal is to release a generic benchmarking script (I've already
> written it; the missing part is a nice graph showing performance over
> the benchmark in xmgrace's format).
>
> For the moment I've prepared the files for SMP, but the script will
> also be used for a cluster benchmark (I'll post the results), and I
> want to be sure I'm doing it correctly.
>
> Sincerely Yours,
>
> S. Téletchéa
greetings,
florian
--
-------------------------------------------------------------------------------
Florian Haberl Universitaet Erlangen/ Nuernberg
Computer-Chemie-Centrum
Naegelsbachstr. 25
D-91052 Erlangen
Mailto: florian.haberl AT chemie.uni-erlangen.de
-------------------------------------------------------------------------------