[gmx-users] Re: Gromacs 4.5.4 on multi-node cluster

Nikos Papadimitriou nikpap at ipta.demokritos.gr
Thu Dec 8 10:44:36 CET 2011



> 
> email message attachment
> 
> > -------- Forwarded Message --------
> > From: Nikos Papadimitriou <nikpap at ipta.demokritos.gr>
> > To: gmx-users at gromacs.org
> > Subject: [gmx-users] Gromacs 4.5.4 on multi-node cluster
> > Date: Wed, 7 Dec 2011 16:26:46 +0200
> > 
> > Dear All,
> > 
> > I had been running Gromacs 4.0.7 on a 12-node cluster (Intel i7-920,
> > 4 cores each) running Rocks 5.4.2. Recently, I upgraded the cluster
> > OS to Rocks 5.4.3 and installed Gromacs 4.5.4 from the Bio Roll
> > repository. When running in parallel on a single node, everything
> > works fine. However, when I try to run on more than one node, the
> > run stalls immediately with the following message:
> > 
> > [gromacs@tornado Test]$ /home/gromacs/.Installed/openmpi/bin/mpirun \
> >     -np 2 -machinefile machines \
> >     /home/gromacs/.Installed/gromacs/bin/mdrun_mpi \
> >     -s md_run.tpr -o md_traj.trr -c md_confs.gro -e md.edr -g md.log -v
> > NNODES=2, MYRANK=0, HOSTNAME=compute-1-1.local
> > NNODES=2, MYRANK=1, HOSTNAME=compute-1-2.local
> > NODEID=0 argc=12
> > NODEID=1 argc=12
> > 
> > The mdrun_mpi processes seem to start on both nodes, but the run
> > does not proceed and no files are produced. It looks as if the nodes
> > are waiting for some kind of communication between them. The problem
> > occurs even for the simplest case (i.e. an NVT simulation of 1000
> > argon atoms without Coulombic interactions). OpenMPI and the
> > networking between the nodes seem to work fine, since there are no
> > problems with other software that runs with MPI.
> > 
> > In an attempt to find a solution, I manually compiled and installed
> > Gromacs 4.5.5 (with --enable-mpi) after installing the latest
> > versions of OpenMPI and FFTW3; no errors occurred during the
> > installation. However, exactly the same problem appears when trying
> > to run on two different nodes.
> > 
> > Have you any idea what might cause this situation?
> > Thank you in advance!
> 
> email message attachment
> 
> > -------- Forwarded Message --------
> > From: Mark Abraham <Mark.Abraham at anu.edu.au>
> > Reply-to: "Discussion list for GROMACS users"
> > <gmx-users at gromacs.org>
> > To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> > Subject: [gmx-users] Gromacs 4.5.4 on multi-node cluster
> > Date: Wed, 7 Dec 2011 16:53:49 +0200
> > 
> > On 8/12/2011 1:26 AM, Nikos Papadimitriou wrote: 
> > 
> > > Dear All,
> > > 
> > > I had been running Gromacs 4.0.7 on a 12-node cluster (Intel
> > > i7-920, 4 cores each) running Rocks 5.4.2. Recently, I upgraded
> > > the cluster OS to Rocks 5.4.3 and installed Gromacs 4.5.4 from
> > > the Bio Roll repository. When running in parallel on a single
> > > node, everything works fine. However, when I try to run on more
> > > than one node, the run stalls immediately with the following
> > > message:
> > > 
> > > [gromacs@tornado Test]$ /home/gromacs/.Installed/openmpi/bin/mpirun \
> > >     -np 2 -machinefile machines \
> > >     /home/gromacs/.Installed/gromacs/bin/mdrun_mpi \
> > >     -s md_run.tpr -o md_traj.trr -c md_confs.gro -e md.edr -g md.log -v
> > > NNODES=2, MYRANK=0, HOSTNAME=compute-1-1.local
> > > NNODES=2, MYRANK=1, HOSTNAME=compute-1-2.local
> > > NODEID=0 argc=12
> > > NODEID=1 argc=12
> > > 
> > > The mdrun_mpi processes seem to start on both nodes, but the run
> > > does not proceed and no files are produced. It looks as if the
> > > nodes are waiting for some kind of communication between them. The
> > > problem occurs even for the simplest case (i.e. an NVT simulation
> > > of 1000 argon atoms without Coulombic interactions). OpenMPI and
> > > the networking between the nodes seem to work fine, since there
> > > are no problems with other software that runs with MPI.
> > 
> > 
> > Can you run a 2-processor MPI test program with that machine file?
> > 
> > Mark
> > 

"Unfortunately", other MPI programs run fine on 2 or more nodes. There
seems to be no problem with MPI.
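
For reference, a minimal two-rank test of the kind Mark suggests could
look like the sketch below. The OpenMPI prefix and the "machines" file
are taken from the command line quoted above; everything else is
illustrative, not an exact recipe:

#!/bin/sh
# Two-rank MPI ping-pong test, launched with the same mpirun and
# machine file as the failing mdrun_mpi job. The OpenMPI prefix is an
# assumption based on the command line quoted above.
MPIDIR=/home/gromacs/.Installed/openmpi

cat > pingpong.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* rank 0 sends a value to rank 1 and waits for it to come back */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %d back from rank 1\n", value);
    } else if (rank == 1) {
        /* rank 1 echoes whatever it receives */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
EOF

$MPIDIR/bin/mpicc pingpong.c -o pingpong
$MPIDIR/bin/mpirun -np 2 -machinefile machines ./pingpong

If this hangs across two nodes the way mdrun_mpi does, the problem is
in the MPI transport between the nodes rather than in Gromacs; if it
completes, the MPI layer can at least pass point-to-point messages
with that machine file.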

> > > 
> > > In an attempt to find a solution, I manually compiled and
> > > installed Gromacs 4.5.5 (with --enable-mpi) after installing the
> > > latest versions of OpenMPI and FFTW3; no errors occurred during
> > > the installation (see the build sketch at the end of this
> > > message). However, exactly the same problem appears when trying
> > > to run on two different nodes.
> > > 
> > > Have you any idea what might cause this situation?
> > > Thank you in advance! 
> > > 
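
Regarding the manual Gromacs 4.5.5 build mentioned above: a typical
autotools build of an MPI-enabled mdrun against a locally installed
OpenMPI and single-precision FFTW3 is sketched below. All prefixes and
the FFTW version are assumptions for illustration, not the actual
paths used:

#!/bin/sh
# Illustrative build of FFTW3 (single precision) and an MPI-enabled
# mdrun for Gromacs 4.5.5; prefixes and version numbers are assumed.
PREFIX=$HOME/.Installed
export PATH=$PREFIX/openmpi/bin:$PATH   # pick up the local OpenMPI wrappers

cd fftw-3.3
./configure --prefix=$PREFIX/fftw3 --enable-float
make && make install

cd ../gromacs-4.5.5
export CPPFLAGS=-I$PREFIX/fftw3/include
export LDFLAGS=-L$PREFIX/fftw3/lib
./configure --prefix=$PREFIX/gromacs --enable-mpi --program-suffix=_mpi
make mdrun && make install-mdrun        # only mdrun needs the MPI build

This matches the naming of the mdrun_mpi binary used in the failing
runs, but, as described above, it did not change the symptom.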

