[gmx-users] Re: Gromacs 4.5.4 on multi-node cluster
Dimitris Dellis
ntelll at gmail.com
Thu Dec 8 11:06:10 CET 2011
Hi.
This is openmpi related.
You probably have the virbr0 interface (libvirt's NAT bridge) active on
the nodes, with the same IP 192.168.122.1 on each of them. Because every
node carries that identical address, openmpi's TCP BTL may pick virbr0
for inter-node traffic and the connection setup then hangs.
Stop and disable the libvirtd (and probably libvirt-guests) service if
you don't need it.
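On Rocks 5.4 (CentOS 5 based) that would be roughly the following, run as
root on every compute node (the exact service names are an assumption,
check what is actually installed on your images):

  service libvirtd stop          # stop the running daemon
  chkconfig libvirtd off         # keep it from starting at boot
  service libvirt-guests stop    # only if this service exists
  chkconfig libvirt-guests off
  ifconfig virbr0 down           # take the bridge down now, or reboot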
Alternatively,
1. add --mca btl_tcp_if_exclude lo,virbr0 to the mpirun flags (see the
example after this list)
or
2. add the following line to
/home/gromacs/.Installed/openmpi/etc/openmpi-mca-params.conf:
btl_tcp_if_exclude = lo,virbr0
Either way, virbr0 is excluded from the list of interfaces that openmpi
may use for communication.
(If virbr1 etc. are present, add them to the exclude list as well.)
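With option 1, the command from the message below becomes (same paths and
files as in the original post, only the --mca flag added; adjust -np and
the machine file to your own run):

  /home/gromacs/.Installed/openmpi/bin/mpirun \
      --mca btl_tcp_if_exclude lo,virbr0 \
      -np 2 -machinefile machines \
      /home/gromacs/.Installed/gromacs/bin/mdrun_mpi \
      -s md_run.tpr -o md_traj.trr -c md_confs.gro -e md.edr -g md.log -v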
On 12/08/2011 11:44 AM, Nikos Papadimitriou wrote:
>
>>
>>
>>> -------- Forwarded Message --------
>>> *From*: Nikos Papadimitriou <nikpap at ipta.demokritos.gr>
>>> *To*: gmx-users at gromacs.org
>>> *Subject*: [gmx-users] Gromacs 4.5.4 on multi-node cluster
>>> *Date*: Wed, 7 Dec 2011 16:26:46 +0200
>>>
>>> Dear All,
>>>
>>> I had been running Gromacs 4.0.7 on a 12-node cluster (Intel i7-920,
>>> 4 cores per node) with OS Rocks 5.4.2. Recently, I upgraded the
>>> cluster OS to Rocks 5.4.3 and installed Gromacs 4.5.4 from the Bio
>>> Roll repository. When running in parallel on a single node,
>>> everything works fine. However, when I try to run on more than one
>>> node, the run stalls immediately with the following output:
>>>
>>> [gromacs at tornado Test]$ /home/gromacs/.Installed/openmpi/bin/mpirun
>>> -np 2 -machinefile machines
>>> /home/gromacs/.Installed/gromacs/bin/mdrun_mpi -s md_run.tpr -o
>>> md_traj.trr -c md_confs.gro -e md.edr -g md.log -v
>>> NNODES=2, MYRANK=0, HOSTNAME=compute-1-1.local
>>> NNODES=2, MYRANK=1, HOSTNAME=compute-1-2.local
>>> NODEID=0 argc=12
>>> NODEID=1 argc=12
>>>
>>> The mdrun_mpi process seems to start on both nodes, but the run does
>>> not proceed and no files are produced. It seems that the nodes are
>>> waiting for some kind of communication between them. The problem
>>> occurs even in the simplest case (i.e. an NVT simulation of 1000
>>> argon atoms without Coulombic interactions). Openmpi and the
>>> networking between the nodes seem to work fine, since there are no
>>> problems with other software that runs with MPI.
>>>
>>> In an attempt to find a solution, I manually compiled and installed
>>> Gromacs 4.5.5 (with --enable-mpi) after installing the latest
>>> versions of openmpi and fftw3, and no errors occurred during the
>>> installation. However, when trying to run on two different nodes,
>>> exactly the same problem appears.
>>>
>>> Have you any idea what might cause this situation?
>>> Thank you in advance!
>>
>>> -------- Forwarded Message --------
>>> *From*: Mark Abraham <Mark.Abraham at anu.edu.au>
>>> *Reply-to*: "Discussion list for GROMACS users" <gmx-users at gromacs.org>
>>> *To*: Discussion list for GROMACS users <gmx-users at gromacs.org>
>>> *Subject*: [gmx-users] Gromacs 4.5.4 on multi-node cluster
>>> *Date*: Wed, 7 Dec 2011 16:53:49 +0200
>>>
>>> On 8/12/2011 1:26 AM, Nikos Papadimitriou wrote:
>>>> [...]
>>>
>>> Can you run a 2-processor MPI test program with that machine file?
>>>
>>> Mark
>>>
> "Unfortunately", other MPI programs run fine on 2 or more nodes. There
> seems to be no problem with MPI.
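(A quick check of the kind Mark suggests, using the same mpirun and
machine file as in the original command; hostname here is just a
stand-in, any small test program would do:

  /home/gromacs/.Installed/openmpi/bin/mpirun -np 2 -machinefile machines hostname

Note that hostname does no MPI communication, so it only verifies that
mpirun can launch on both nodes over ssh; a real MPI program that sends
messages between ranks is what exercises the TCP BTL and would hang on
virbr0.)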