[gmx-users] Fwd: Re[2]: gmx-users Digest, Vol 56, Issue 44

David van der Spoel spoel at xray.bmc.uu.se
Sun Dec 14 21:01:01 CET 2008


Vitaly Chaban wrote:
> To be complete, I compiled the mpi version via:
> 
> ./configure --enable-mpi --enable-float --prefix=/{PATH}/gmx402 \
>   --program-suffix="_mpi"
> make
> make install
> 
> Did I miss anything, perhaps?

Which MPI are you using? You might also try to recreate your tpr file.
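
For example, a minimal sketch of what I mean (the paths, input filenames
and process count below are placeholders for your own setup, not something
I have tested on your cluster; note that the 4.0 grompp no longer takes
-np, the decomposition is decided by mdrun at run time):

/{PATH}/gmx402/bin/grompp_mpi -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
mpirun -np 8 /{PATH}/gmx402/bin/mdrun_mpi -s topol.tpr

Running ldd on the installed mdrun_mpi will also show which MPI library it
was actually linked against, which you can compare with the MPI that
mpirun (or mpirun.lsf) on your cluster starts.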
> 
> Vitaly
> 
> 
> ===8<==============Original message text===============
> Unfortunately, the upgrade to 4.0.2 did not help.
> 
> Below is the same error for 8-nodes:
> 
> NNODES=8, MYRANK=0, HOSTNAME=merlin-3-30
> NNODES=8, MYRANK=1, HOSTNAME=merlin-3-30
> NNODES=8, MYRANK=3, HOSTNAME=merlin-3-34
> NNODES=8, MYRANK=2, HOSTNAME=merlin-3-34
> NNODES=8, MYRANK=4, HOSTNAME=merlin-4-5
> NNODES=8, MYRANK=5, HOSTNAME=merlin-4-5
> NNODES=8, MYRANK=7, HOSTNAME=merlin-3-23
> NNODES=8, MYRANK=6, HOSTNAME=merlin-3-23
> NODEID=1 argc=1
> NODEID=0 argc=1
>                          :-)  G  R  O  M  A  C  S  (-:
> 
> NODEID=2 argc=1
> NODEID=3 argc=1
> NODEID=6 argc=1
> NODEID=7 argc=1
> NODEID=5 argc=1
> NODEID=4 argc=1
>                Giant Rising Ordinary Mutants for A Clerical Setup
> 
>                             :-)  VERSION 4.0.2  (-:
> 
> Reading file topol.tpr, VERSION 3.3.1 (single precision)
> Note: tpx file_version 40, software version 58
> 
> NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp
> 
> Making 3D domain decomposition 2 x 2 x 2
> 
> WARNING: This run will generate roughly 5946 Mb of data
> 
> Dec 14 11:16:40 2008 21691 3 6.1 pServe: getMsgBuffer_() failed.
> ABORT - process 0: failure: Other MPI error
> Dec 14 11:16:45 2008 21691 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
> Dec 14 11:16:45 2008 21691 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
> Dec 14 11:16:50 2008 21691 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
> Dec 14 11:16:50 2008 21691 3 6.1 PAM: pWaitRtask(): ls_rwait/pjl_rwait() failed, Communication time out.
> Dec 14 11:16:50 2008 21691 3 6.1 pWaitAll(): NIOS is dead
> 
> 
> Any further ideas please?
> 
> Vitaly
> 
> 
> VC> David, Mark,
> 
> VC> Thank you!
> 
> VC> I will compile the latest version instead. Am I right to start the run
> VC> via:
> 
> VC> grompp -np 4
> VC> mpirun {PATH}/mdrun ?
> 
> VC> Is it correct that no arguments need to be passed to mdrun in this case?
> 
> VC> Thanks,
> VC> Vitaly
> 
> gurgo>> Hi,
> 
> gurgo>> I have a problem running the parallel version of GROMACS 4.0.
> 
> 
> gurgo>> While running GROMACS 4.0 with MPI, the following error appears
> gurgo>> every time:
> 
> gurgo>> NNODES=4, MYRANK=1, HOSTNAME=merlin-3-9
> gurgo>> NNODES=4, MYRANK=0, HOSTNAME=merlin-3-9
> gurgo>> NNODES=4, MYRANK=2, HOSTNAME=merlin-2-24
> gurgo>> NNODES=4, MYRANK=3, HOSTNAME=merlin-2-24
> gurgo>> NODEID=0 argc=1
> gurgo>> NODEID=1 argc=1
> gurgo>> NODEID=2 argc=1
> gurgo>> NODEID=3 argc=1
> gurgo>>                          :-)  G  R  O  M  A  C  S  (-:
> 
> gurgo>>                    Groningen Machine for Chemical Simulation
> 
> gurgo>>                            :-)  VERSION 4.0_rc2  (-:
> 
> 
> gurgo>> Reading file topol.tpr, VERSION 3.3.3 (single precision)
> gurgo>> Note: tpx file_version 40, software version 58
> 
> gurgo>> NOTE: The tpr file used for this simulation is in an old
> gurgo>> format, for less memory usage and possibly more
> gurgo>> performance create a new tpr file with an up to date version of grompp
> 
> gurgo>> Making 1D domain decomposition 1 x 1 x 4
> 
> gurgo>> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> 
> gurgo>> WARNING: This run will generate roughly 5946 Mb of data
> 
> gurgo>> Dec 13 11:34:47 2008 32301 3 6.1 pServe: getMsgBuffer_() failed.
> gurgo>> Fatal error (code 0x94213a0f) in MPI_Scatterv():
> gurgo>> MPI_Scatterv(324): MPI_Scatterv(sbuf=0x8b8170, scnts=0x82b000,
> gurgo>> displs=0x82b010, MPI_BYTE, rbuf=0x8f3ce0,
> gurgo>> rcount=4680, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> gurgo>> MPIC_Send(50): failure
> gurgo>> MPIC_Wait(306): failure
> gurgo>> MPIDI_CH3_Progress(421): [ch3:sock] failed to connnect to remote process -1:3
> gurgo>> MPIDU_Sock_wait(116): connection failure (set=0,sock=4)
> gurgo>> ABORT - process 0
> gurgo>> Dec 13 11:34:52 2008 32301 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
> gurgo>> Dec 13 11:34:52 2008 32301 3 6.1 PAM: pWaitRtask():
> gurgo>> ls_rwait/pjl_rwait() failed, Communication time out.
> gurgo>> Dec 13 11:34:57 2008 32301 4 6.1 PAM: pjl_rwait: Didn't get all TS to report status.
> gurgo>> Dec 13 11:34:57 2008 32301 3 6.1 PAM: pWaitRtask():
> gurgo>> ls_rwait/pjl_rwait() failed, Communication time out.
> gurgo>> Dec 13 11:34:57 2008 32301 3 6.1 pWaitAll(): NIOS is dead
> 
> gurgo>> I used
> gurgo>> grompp -np 4 to create topol.tpr
> gurgo>> and then
> gurgo>> mpirun.lsf /home/gromacs4.0-mpi/bin/mdrun
> 
> gurgo>> I have no problems running GROMACS on a single processor.
> 
> gurgo>> Does anybody have any ideas on how to fix this?
> 
> 
> ===8<===========End of original message text===========
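
PS. Both failures above die in the node-to-node communication layer, and
the PAM/pjl_rwait messages look like they come from LSF rather than from
GROMACS itself. It may be worth checking that a trivial MPI job runs
across those same hosts at all, independent of mdrun. A minimal sketch,
assuming you submit through LSF as the log suggests, with mpi_hello
standing in for any small MPI test program you have built (it is not a
real file on your system):

bsub -n 8 mpirun.lsf ./mpi_hello

If even that times out between hosts, the problem is in the MPI/LSF setup
and not in the tpr file or in GROMACS.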


-- 
David van der Spoel, Ph.D., Professor of Biology
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205. Fax: +4618511755.
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se



More information about the gromacs.org_gmx-users mailing list