[gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??

Berk Hess gmx3 at hotmail.com
Wed Oct 1 16:37:56 CEST 2008


Hi,

Weird.
The only thing I can think of is that the time_t data type is somehow
incompatible with other types.
I will mail you a modified source file so you can try whether it fixes the problem.
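Roughly what I suspect, as a minimal sketch (this is not the actual mdrun source; the function and variable names are made up for illustration):
------------------------------
#include <stdio.h>
#include <time.h>

/* Sketch of a -maxh style check: stop at 99% of the requested
 * maximum run time so there is still time to write output. */
static int run_time_exceeded(time_t start_time, double max_hours)
{
    /* difftime() returns the elapsed seconds as a double; mixing
     * time_t values directly with other integer or floating types
     * (or truncating the difference) is the kind of incompatibility
     * that could make the limit trigger far too early. */
    double elapsed_h = difftime(time(NULL), start_time) / 3600.0;

    return elapsed_h > 0.99 * max_hours;
}

int main(void)
{
    time_t start = time(NULL);
    printf("exceeded: %d\n", run_time_exceeded(start, 1.0));
    return 0;
}
--------------------------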

Berk

> Subject: RE: [gmx-users] Possible bug in parallelization, PME or	load-balancing on Gromacs 4.0_rc1 ??
> From: st01397 at student.uib.no
> To: gmx-users at gromacs.org
> Date: Wed, 1 Oct 2008 14:41:19 +0200
> CC: gmx3 at hotmail.com
> 
> Hi again Berk,
> I know that this particular run used no more than 1:40 hours (I was
> following it), but I cannot cough up the complete log, as it was
> accidentally overwritten by a new run.
> 
> 
> I do however have the same phenomenon in a shorter annealing trial. I
> enclose the entire log in this mail, and show excerpts below.
> 
> My startup script for this run looked like this:
> ------------------------------
> #!/bin/bash
> #PBS -A fysisk
> #PBS -N pmf_hydanneal_anneal2
> #PBS -o pmf_hydanneal.o
> #PBS -e pmf.hydanneal.err
> #PBS -l walltime=1:00:00,mppwidth=50,mppnppn=4
> cd /work/bjornss/pmf/structII/hydrate_annealing/anneal2
> source $HOME/gmx_latest_250908/bin/GMXRC
> 
> aprun -n 50 parmdrun -s topol.tpr -maxh 1 -npme 18
> exit $?
> --------------------------
> 
> Now this run should stop after 0.99 hours = 59:24.
> 
> But as you can see:
> 
> 
> ----------------------------------------------
> head md.log
> Log file opened on Mon Sep 29 20:11:42 2008
> Host: nid00039  pid: 16507  nodeid: 0  nnodes:  50
> The Gromacs distribution was built Mon Sep 29 13:25:26 CEST 2008 by
> bjornss at nid00163 (Linux 2.6.16.54-0.2.5-ss x86_64)
> 
> 
> 
>                          :-)  G  R  O  M  A  C  S  (-:
> 
>                    Groningen Machine for Chemical Simulation
> 
>                            :-)  VERSION 4.0_rc1  (-:
> 
> ---------------------------------------------
> tail md.log -n 300 (excerpt)
> 
> Step 518975: Run time exceeded 0.990 hours, will terminate the run
> 
> ............................
> ,,,
>         Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:   1426.000   1426.000    100.0
>                        23:46
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    100.149     29.098    242.356      0.099
> Finished mdrun on node 0 Mon Sep 29 20:35:28 2008
> --------------------------
> 
> That is, I got about 40% of the allotted walltime here as well.
> Peculiarly, 1:35 / 4:00 (in hours:minutes) is also roughly 40%. That is,
> the ratio between the scheduled walltime and the actually obtained run
> time is about the same in both cases.
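> 
> For concreteness, a quick check of the two ratios, using only the figures
> already quoted above (a throwaway snippet, nothing from the logs beyond
> those numbers):
> ------------------------------
> #include <stdio.h>
> 
> int main(void)
> {
>     /* short run: 1426 s reported in md.log vs. 0.99 h = 3564 s expected */
>     double short_run = 1426.0 / (0.99 * 3600.0);
>     /* longer run: roughly 1:35 obtained vs. 4:00 scheduled, in minutes */
>     double long_run = 95.0 / 240.0;
> 
>     printf("short run: %.2f of the allotted time\n", short_run); /* ~0.40 */
>     printf("long run:  %.2f of the allotted time\n", long_run);  /* ~0.40 */
>     return 0;
> }
> --------------------------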
> 
> Regards
> Bjørn
> 
> 
> On Wed, 2008-10-01 at 13:25 +0200, Berk Hess wrote:
> > Hi,
> > 
> > The Cray XT4 has a torus network, but you don't get access to it as a
> > torus.
> > You will get assigned processors which can be anywhere in the machine;
> > they are usually not arranged in a nice cube, and there are always some
> > missing.
> > Therefore software such as Gromacs cannot make use of proper Cartesian
> > (torus) communication, as one can, for instance, on a Blue Gene.
> > 
> > I have no clue about the wallclock issue.
> > Can you find out if the run took 1:35 or 4 hours?
> > The start time is somewhere at the beginning of the log file.
> > 
> > Berk
> > 
> > 
> 


