[gmx-users] Re: Gromacs 4.5.4 on multi-node cluster

Nikos Papadimitriou nikpap at ipta.demokritos.gr
Thu Dec 8 15:59:32 CET 2011



> > -------- Forwarded Message --------
> > From: Nikos Papadimitriou <nikpap at ipta.demokritos.gr>
> > To: gmx-users at gromacs.org
> > Subject: [gmx-users] Re: Gromacs 4.5.4 on multi-node cluster
> > Date: Thu, 8 Dec 2011 11:44:36 +0200
> > 
> > > > -------- Forwarded Message --------
> > > > From: Nikos Papadimitriou <nikpap at ipta.demokritos.gr>
> > > > To: gmx-users at gromacs.org
> > > > Subject: [gmx-users] Gromacs 4.5.4 on multi-node cluster
> > > > Date: Wed, 7 Dec 2011 16:26:46 +0200
> > > > 
> > > > Dear All,
> > > > 
> > > > I had been running Gromacs 4.0.7 on a 12-node cluster (4-core
> > > > Intel i7-920) with OS Rocks 5.4.2. Recently, I upgraded the
> > > > cluster OS to Rocks 5.4.3 and installed Gromacs 4.5.4 from the
> > > > Bio Roll repository. When running in parallel on the same node,
> > > > everything works fine. However, when I try to run on more than
> > > > one node, the run stalls immediately with the following output:
> > > > 
> > > > [gromacs at tornado
> > > > Test]$ /home/gromacs/.Installed/openmpi/bin/mpirun -np 2
> > > > -machinefile
> > > > machines /home/gromacs/.Installed/gromacs/bin/mdrun_mpi -s
> > > > md_run.tpr -o md_traj.trr -c md_confs.gro -e md.edr -g md.log -v
> > > > NNODES=2, MYRANK=0, HOSTNAME=compute-1-1.local
> > > > NNODES=2, MYRANK=1, HOSTNAME=compute-1-2.local
> > > > NODEID=0 argc=12
> > > > NODEID=1 argc=12
> > > > 
> > > > The mdrun_mpi process seems to start on both nodes, but the run
> > > > does not proceed and no output files are produced. It looks as
> > > > if the nodes are waiting for some kind of communication between
> > > > them. The problem occurs even for the simplest case (e.g. an NVT
> > > > simulation of 1000 argon atoms without Coulombic interactions).
> > > > OpenMPI and the networking between the nodes seem to work fine,
> > > > since other software that runs with MPI shows no problems.
> > > > 
> > > > In an attempt to find a solution, I manually compiled and
> > > > installed Gromacs 4.5.5 (with --enable-mpi) after installing the
> > > > latest versions of OpenMPI and FFTW3; no errors occurred during
> > > > the installation. However, exactly the same problem appears when
> > > > trying to run on two different nodes.
> > > > 
> > > > Have you any idea what might cause this situation?
> > > > Thank you in advance!
> > > > -------- Forwarded Message --------
> > > > From: Mark Abraham <Mark.Abraham at anu.edu.au>
> > > > Reply-to: "Discussion list for GROMACS users"
> > > > <gmx-users at gromacs.org>
> > > > To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> > > > Subject: [gmx-users] Gromacs 4.5.4 on multi-node cluster
> > > > Date: Wed, 7 Dec 2011 16:53:49 +0200
> > > > 
> > > > On 8/12/2011 1:26 AM, Nikos Papadimitriou wrote: 
> > > > 
> > > > > Dear All,
> > > > > 
> > > > > I had been running Gromacs 4.0.7 on a 12-node cluster (4-core
> > > > > Intel i7-920) with OS Rocks 5.4.2. Recently, I upgraded the
> > > > > cluster OS to Rocks 5.4.3 and installed Gromacs 4.5.4 from the
> > > > > Bio Roll repository. When running in parallel on the same
> > > > > node, everything works fine. However, when I try to run on
> > > > > more than one node, the run stalls immediately with the
> > > > > following output:
> > > > > 
> > > > > [gromacs at tornado
> > > > > Test]$ /home/gromacs/.Installed/openmpi/bin/mpirun -np 2
> > > > > -machinefile
> > > > > machines /home/gromacs/.Installed/gromacs/bin/mdrun_mpi -s
> > > > > md_run.tpr -o md_traj.trr -c md_confs.gro -e md.edr -g md.log
> > > > > -v
> > > > > NNODES=2, MYRANK=0, HOSTNAME=compute-1-1.local
> > > > > NNODES=2, MYRANK=1, HOSTNAME=compute-1-2.local
> > > > > NODEID=0 argc=12
> > > > > NODEID=1 argc=12
> > > > > 
> > > > > The mdrun_mpi process seems to start on both nodes, but the
> > > > > run does not proceed and no output files are produced. It
> > > > > looks as if the nodes are waiting for some kind of
> > > > > communication between them. The problem occurs even for the
> > > > > simplest case (e.g. an NVT simulation of 1000 argon atoms
> > > > > without Coulombic interactions). OpenMPI and the networking
> > > > > between the nodes seem to work fine, since other software
> > > > > that runs with MPI shows no problems.
> > > > 
> > > > 
> > > > Can you run a 2-processor MPI test program with that machine file?
> > > > 
> > > > Mark
> > > > 
> > 
> > "Unfortunately", other MPI programs run fine on 2 or more nodes.
> > There seems to be no problem with MPI.
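For reference, a 2-process test of the kind Mark suggests can be run against the same machine file with one of the example programs shipped in the OpenMPI source tree (a sketch; the OpenMPI source path below is only a placeholder, and the machines file is assumed to simply list the two hosts shown in the output above):

# machines file, one compute node per line
# compute-1-1.local
# compute-1-2.local

# compile and launch the bundled ring test across both nodes
/home/gromacs/.Installed/openmpi/bin/mpicc -o ring_c /path/to/openmpi-source/examples/ring_c.c
/home/gromacs/.Installed/openmpi/bin/mpirun -np 2 -machinefile machines ./ring_c

If the ring program completes, basic point-to-point MPI communication between the two nodes over TCP is working.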
> > 
> > > > > 
> > > > > In an attempt to find a solution, I manually compiled and
> > > > > installed Gromacs 4.5.5 (with --enable-mpi) after installing
> > > > > the latest versions of OpenMPI and FFTW3; no errors occurred
> > > > > during the installation. However, exactly the same problem
> > > > > appears when trying to run on two different nodes.
> > > > > 
> > > > > Have you any idea what might cause this situation?
> > > > > Thank you in advance! 
> > 
> > -------- Forwarded Message --------
> > From: Dimitris Dellis <ntelll at gmail.com>
> > Reply-to: "Discussion list for GROMACS users"
> > <gmx-users at gromacs.org>
> > To: Nikos Papadimitriou <nikpap at ipta.demokritos.gr>, Discussion list
> > for GROMACS users <gmx-users at gromacs.org>
> > Subject: [gmx-users] Re: Gromacs 4.5.4 on multi-node cluster
> > Date: Thu, 8 Dec 2011 12:06:10 +0200
> > 
> > Hi.
> > This is OpenMPI-related.
> > 
> > You probably have the virbr0 interface (IP 192.168.122.1) active on
> > the nodes.
> > Stop and disable the libvirtd (and probably also libvirt-guests)
> > service if you don't need it.
> > 
> > Alternatively,
> > 1. add --mca btl_tcp_if_exclude lo,virbr0 to the mpirun flags,
> > or
> > 2. add the following line to
> > /home/gromacs/.Installed/openmpi/etc/openmpi-mca-params.conf:
> > btl_tcp_if_exclude = lo,virbr0
> > to exclude virbr0 from the list of interfaces that OpenMPI can use
> > for communication.
> > 
> > (If virbr1 etc. are also present, add them to the exclude list as
> > well.)
> > 
> > 

Thank you very much! It works!
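For the archive, the applied fix looks roughly like the following (a sketch based on Dimitris's suggestions; the service commands assume the SysV-style init of the CentOS base under Rocks 5.4):

# check whether the libvirt bridge is active on a compute node
/sbin/ifconfig virbr0

# option 1: exclude the interface on the mpirun command line
/home/gromacs/.Installed/openmpi/bin/mpirun -np 2 -machinefile machines \
  --mca btl_tcp_if_exclude lo,virbr0 \
  /home/gromacs/.Installed/gromacs/bin/mdrun_mpi -s md_run.tpr -o md_traj.trr \
  -c md_confs.gro -e md.edr -g md.log -v

# option 2: exclude it permanently for every OpenMPI job
echo "btl_tcp_if_exclude = lo,virbr0" >> /home/gromacs/.Installed/openmpi/etc/openmpi-mca-params.conf

# or stop and disable the libvirt service entirely if virtualization is not needed
service libvirtd stop
chkconfig libvirtd off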