[gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??

Bjørn Steen Sæthre st01397 at student.uib.no
Wed Oct 1 14:41:19 CEST 2008


Hi again Berk,
I know that this particular run used no more than 1:40 hours (I was
following it), but I am not able to cough up the complete log, as it was
accidentally overwritten by a new run.


I do however have the same phenomenon in a shorter annealing trial. I
enclose the entire log in this mail, and show excerpts below.

My startup script for this run looked like this:
------------------------------
#!/bin/bash
#PBS -A fysisk
#PBS -N pmf_hydanneal_anneal2
#PBS -o pmf_hydanneal.o
#PBS -e pmf.hydanneal.err
#PBS -l walltime=1:00:00,mppwidth=50,mppnppn=4
cd /work/bjornss/pmf/structII/hydrate_annealing/anneal2
source $HOME/gmx_latest_250908/bin/GMXRC

aprun -n 50 parmdrun -s topol.tpr -maxh 1 -npme 18
exit $?
--------------------------

Now this should stop after 0.99 hours = 59:24 (mm:ss).
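
As a rough check of that figure, here is a minimal shell sketch that
converts a -maxh limit into the expected stop time. It assumes mdrun
stops at 0.99 times the -maxh value, which is what the "Run time
exceeded 0.990 hours" line in the log excerpt in this mail suggests. The
function name maxh_to_clock is made up for illustration and is not part
of Gromacs.

```shell
# Hypothetical helper: convert a -maxh value (in hours) into the
# expected stop time as h:mm:ss.
# Assumption: mdrun terminates once 0.99 * maxh hours have elapsed.
maxh_to_clock() {
    awk -v maxh="$1" 'BEGIN {
        secs = int(maxh * 0.99 * 3600 + 0.5)   # round to whole seconds
        printf "%d:%02d:%02d\n", int(secs / 3600), int((secs % 3600) / 60), secs % 60
    }'
}

maxh_to_clock 1   # -maxh 1, as in the aprun line of the script
```

For -maxh 1 this prints 0:59:24.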

But as you can see:


----------------------------------------------
head md.log
Log file opened on Mon Sep 29 20:11:42 2008
Host: nid00039  pid: 16507  nodeid: 0  nnodes:  50
The Gromacs distribution was built Mon Sep 29 13:25:26 CEST 2008 by
bjornss at nid00163 (Linux 2.6.16.54-0.2.5-ss x86_64)



                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                           :-)  VERSION 4.0_rc1  (-:

---------------------------------------------
tail md.log -n 300 (excerpt)

Step 518975: Run time exceeded 0.990 hours, will terminate the run

............................
,,,
        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:   1426.000   1426.000    100.0
                       23:46
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    100.149     29.098    242.356      0.099
Finished mdrun on node 0 Mon Sep 29 20:35:28 2008
--------------------------
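
The elapsed time can also be cross-checked from the two timestamps in
the log ("Log file opened on ..." and "Finished mdrun on node 0 ...").
Below is a minimal sketch, assuming both timestamps fall on the same
calendar day and have the clock time in the fourth field, as in the
excerpts in this mail. The function name log_elapsed is hypothetical.

```shell
# Hypothetical helper: elapsed seconds between two timestamps in the
# md.log date format, e.g. "Mon Sep 29 20:11:42 2008".
# Assumption: both timestamps are on the same day, so only the
# HH:MM:SS field (field 4) is compared.
log_elapsed() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            split($4, t, ":")                        # t[1]=HH, t[2]=MM, t[3]=SS
            secs[NR] = t[1] * 3600 + t[2] * 60 + t[3]
        }
        END { print secs[2] - secs[1] }'
}

log_elapsed "Mon Sep 29 20:11:42 2008" "Mon Sep 29 20:35:28 2008"
```

With the two timestamps from this log it prints 1426, i.e. 23:46, in
agreement with the "Time:" line reported by mdrun.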

That is, I got about 40% of the allotted walltime here as well.
Peculiarly, 1:35 / 4:00 (sexagesimally) ~ 40%. That is, the ratio
between the actually obtained time and the scheduled walltime is about
the same in both cases.
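
The arithmetic behind that comparison can be sketched as follows. The
inputs are the numbers quoted in this mail (1426 s out of a 1:00:00
walltime for this run, 1:35 out of 4:00 for the earlier one); the
function name pct_used is made up for illustration.

```shell
# Hypothetical helper: percentage of the scheduled walltime actually
# used, given the used and the scheduled time in seconds.
pct_used() {
    awk -v used="$1" -v budget="$2" 'BEGIN { printf "%.1f\n", 100 * used / budget }'
}

pct_used 1426 3600                      # this run: 23:46 of 1:00:00
pct_used $((95 * 60)) $((240 * 60))     # earlier run: 1:35 of 4:00
```

Both calls print 39.6, so the two runs stopped at practically the same
fraction of their scheduled walltime.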

Regards
Bjørn


On Wed, 2008-10-01 at 13:25 +0200, Berk Hess wrote:
> Hi,
> 
> The Cray XT4 has a torus network, but you don't get access to it as a
> torus.
> You will get assigned processors which can be anywhere in the machine
> and they are usually never in a nice cube, but there are always some
> missing.
> Therefore software, such as Gromacs, cannot make use of proper
> Cartesian (torus) communication as one can for instance on a Blue
> Gene.
> 
> I have no clue about the wallclock issue.
> Can you find out if the run took 1.35 or 4 hours?
> The start time is somewhere at the beginning of the log file.
> 
> Berk
> 
> 
> ______________________________________________________________________

-------------- next part --------------
A non-text attachment was scrubbed...
Name: md.log
Type: text/x-log
Size: 23428 bytes
Desc: not available
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-users/attachments/20081001/f57b0993/attachment.bin>

