[gmx-users] mdrun CVS version crashes instantly when run across nodes in parallel

Carsten Kutzner ckutzne at gwdg.de
Tue Jan 22 14:20:57 CET 2008


Hi Erik,

Have you tried the small MPI test program to which your link points?
> http://www.open-mpi.org/community/lists/users/2006/04/0978.php

This would help to figure out whether the problem is on the Gromacs or
on the MPI side. Unfortunately, nothing can be concluded from the fact
that Gromacs 3.3.x works on your cluster, since 3.3 does not make use
of MPI_Allreduce.

Just a few weeks ago our computing center detected a bug in the 64-bit
version of MPI_Reduce in the MVAPICH/MVAPICH2 libraries. So there might
be a similar problem here ...

Carsten


Erik Brandt wrote:
> Hello Gromacs users.
> 
> With the CVS version, mdrun crashes instantly when run in parallel
> across nodes (for any simulation system). The cluster consists of 8
> nodes with Intel 6600 quad-core processors. As long as a job is run on
> a single node (using 1, 2, or 4 CPUs) everything works fine, but when
> trying to run on several nodes mdrun crashes immediately with the
> following error message (no output or log files are written to disk):
> 
>> Getting Loaded...
>> Reading file topol.tpr, VERSION 3.3.99_development_20071104 (single
> precision)
>> Loaded with Money
>>
>> [warhol8:29695] *** An error occurred in MPI_Allreduce
>> [warhol8:29695] *** on communicator MPI_COMM_WORLD
>> [warhol8:29695] *** MPI_ERR_COMM: invalid communicator
>> [warhol8:29695] *** MPI_ERRORS_ARE_FATAL (goodbye)
> 
> For the 1024 DPPC benchmark system, the following two commands were
> used to start the simulation (default names for the input files):
> 
>> /opt/gromacs/cvs/bin/grompp
>> /opt/openmp/1.2.4/bin/mpirun --hostfile hostfile
> /opt/gromacs/cvs/bin/mdrun_mpi -v -dd 2 2 2
> 
> where hostfile contains two specific nodes with 4 slots each.
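> 
> A hostfile of this form for the OpenMPI mpirun shown above (the node
> names below are placeholders for the actual ones) simply lists each
> node with its slot count:
> 
>> nodeA slots=4
>> nodeB slots=4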
> 
> The OS is Ubuntu 7.10 x86_64 on all nodes. mdrun_mpi is compiled with
> OpenMPI 1.2.4, but I have also tried LAM/MPI 7.1.2 and it crashes in
> the same manner with an identical error message. Furthermore, I have
> tried a static compilation on another cluster (Intel Xeon EM64T
> processors) and copied the binaries to our cluster, with the same
> result. I have searched the web for this error and there are some
> suggestions that it may be related to the 64-bit architecture, see e.g.
> 
> http://www.open-mpi.org/community/lists/users/2006/04/0978.php
> 
> The MPI installation on the cluster works for the 3.3.2 version of
> Gromacs and also for some simple MPI test programs, such as each node
> writing out its name and rank.
> 
> Does anyone have any ideas on the origins of these crashes and/or
> suggestions on how to resolve them?
> 
> Regards
> Erik Brandt
> 
> Ph.D. Student
> Theoretical Physics, KTH, Stockholm, Sweden
> 
> -- 
> Erik Brandt <erikb at theophys.kth.se>
> KTH
> 
> 

-- 
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/research/dep/grubmueller/
http://www.gwdg.de/~ckutzne


