[gmx-users] Re[2]: gmx-users Digest, Vol 56, Issue 44

Vitaly Chaban vvchaban at gmail.com
Sun Dec 14 20:26:33 CET 2008


Unfortunately, the upgrade to 4.0.2 did not help.

Below is the same error for 8 nodes:

NNODES=8, MYRANK=0, HOSTNAME=merlin-3-30
NNODES=8, MYRANK=1, HOSTNAME=merlin-3-30
NNODES=8, MYRANK=3, HOSTNAME=merlin-3-34
NNODES=8, MYRANK=2, HOSTNAME=merlin-3-34
NNODES=8, MYRANK=4, HOSTNAME=merlin-4-5
NNODES=8, MYRANK=5, HOSTNAME=merlin-4-5
NNODES=8, MYRANK=7, HOSTNAME=merlin-3-23
NNODES=8, MYRANK=6, HOSTNAME=merlin-3-23
NODEID=1 argc=1
NODEID=0 argc=1
                         :-)  G  R  O  M  A  C  S  (-:

NODEID=2 argc=1
NODEID=3 argc=1
NODEID=6 argc=1
NODEID=7 argc=1
NODEID=5 argc=1
NODEID=4 argc=1
               Giant Rising Ordinary Mutants for A Clerical Setup

                            :-)  VERSION 4.0.2  (-:

Reading file topol.tpr, VERSION 3.3.1 (single precision)
Note: tpx file_version 40, software version 58

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 3D domain decomposition 2 x 2 x 2

WARNING: This run will generate roughly 5946 Mb of data

Dec 14 11:16:40 2008 21691 3 6.1 pServe: getMsgBuffer_() failed.
ABORT - process 0: failure: Other MPI error
Dec 14 11:16:45 2008 21691 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
Dec 14 11:16:45 2008 21691 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
Dec 14 11:16:50 2008 21691 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
Dec 14 11:16:50 2008 21691 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
Dec 14 11:16:50 2008 21691 3 6.1 pWaitAll(): NIOS is dead


Any further ideas, please?

Vitaly


VC> David, Mark,

VC> Thank you!

VC> I will compile the latest version instead. Am I right that the run
VC> should be started via:

VC> grompp -np 4
VC> mpirun {PATH}/mdrun ?

VC> Is there no need to pass any arguments to mdrun in this case?

VC> Thanks,
VC> Vitaly

gurgo>> Hi,

gurgo>> I have a problem running the parallel version of GROMACS 4.0.


gurgo>> When running GROMACS 4.0 with MPI, the following error
gurgo>> always appears:

gurgo>> NNODES=4, MYRANK=1, HOSTNAME=merlin-3-9
gurgo>> NNODES=4, MYRANK=0, HOSTNAME=merlin-3-9
gurgo>> NNODES=4, MYRANK=2, HOSTNAME=merlin-2-24
gurgo>> NNODES=4, MYRANK=3, HOSTNAME=merlin-2-24
gurgo>> NODEID=0 argc=1
gurgo>> NODEID=1 argc=1
gurgo>> NODEID=2 argc=1
gurgo>> NODEID=3 argc=1
gurgo>>                          :-)  G  R  O  M  A  C  S  (-:

gurgo>>                    Groningen Machine for Chemical Simulation

gurgo>>                            :-)  VERSION 4.0_rc2  (-:


gurgo>> Reading file topol.tpr, VERSION 3.3.3 (single precision)
gurgo>> Note: tpx file_version 40, software version 58

gurgo>> NOTE: The tpr file used for this simulation is in an old
gurgo>> format, for less memory usage and possibly more
gurgo>> performance create a new tpr file with an up to date version of grompp

gurgo>> Making 1D domain decomposition 1 x 1 x 4

gurgo>> Back Off! I just backed up ener.edr to ./#ener.edr.1#

gurgo>> WARNING: This run will generate roughly 5946 Mb of data

gurgo>> Dec 13 11:34:47 2008 32301 3 6.1 pServe: getMsgBuffer_() failed.
gurgo>> Fatal error (code 0x94213a0f) in MPI_Scatterv():
gurgo>> MPI_Scatterv(324): MPI_Scatterv(sbuf=0x8b8170, scnts=0x82b000,
gurgo>> displs=0x82b010, MPI_BYTE, rbuf=0x8f3ce0,
gurgo>> rcount=4680, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
gurgo>> MPIC_Send(50): failure
gurgo>> MPIC_Wait(306): failure
gurgo>> MPIDI_CH3_Progress(421): [ch3:sock] failed to connnect to remote process -1:3
gurgo>> MPIDU_Sock_wait(116): connection failure (set=0,sock=4)
gurgo>> ABORT - process 0
gurgo>> Dec 13 11:34:52 2008 32301 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
gurgo>> Dec 13 11:34:52 2008 32301 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
gurgo>> Dec 13 11:34:57 2008 32301 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
gurgo>> Dec 13 11:34:57 2008 32301 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
gurgo>> Dec 13 11:34:57 2008 32301 3 6.1 pWaitAll(): NIOS is dead

gurgo>> I used
gurgo>> grompp -np 4 to create topol.tpr
gurgo>> and then
gurgo>> mpirun.lsf /home/gromacs4.0-mpi/bin/mdrun

gurgo>> I have no problems running single-processor GROMACS.

gurgo>> Does anybody have any ideas on how to fix this?