[gmx-users] Re: problems with intel I7 (2.67 GHz)
Christof Koehler
christof.koehler at bccms.uni-bremen.de
Fri Feb 12 16:04:43 CET 2010
Hello everybody.
I would like to chime in here too, although my problem might not be
directly related.
> The problem of Gromacs stalling on i7 when using multiple CPUs is a MPI
> problem. It is most likely caused by a shared memory bug in Open MPI
> that was fixed in the latest release (1.4.1).
>
> Switching to openmpi-1.4.1 solves the problem.
We are using openmpi-1.4.1 on Nehalem CPUs. With the current gromacs
4.0.7 I see reproducible segfaults when either
"numactl --cpunodebind=0 --membind=0 mpirun ..."
or
"mpirun --mca mpi_paffinity_alone"
is used, e.g.
/usr/local/x86_64.Linux/bin/mpirun -np 4 --mca mpi_paffinity_alone 1
/usr/local/stow/gromacs407/x86_64.Linux/bin/mdrun_407_mpi_d
[neuro36a:01728] *** Process received signal ***
[neuro36a:01728] Signal: Segmentation fault (11)
[neuro36a:01728] Signal code: Address not mapped (1)
[neuro36a:01728] Failing at address: 0x8
[neuro36a:01728] [ 0] [0x7ff120]
[neuro36a:01728] [ 1] [0x7f15a7]
[neuro36a:01728] [ 2] [0x7cadeb]
[neuro36a:01728] [ 3] [0x7cacd5]
[neuro36a:01728] [ 4] [0x6cc533]
[neuro36a:01728] [ 5] [0x6d704e]
[neuro36a:01728] [ 6] [0x4a1f9e]
[neuro36a:01728] [ 7] [0x49c6cc]
[neuro36a:01728] [ 8] [0x40e046]
[neuro36a:01728] [ 9] [0x800749]
[neuro36a:01728] [10] [0x4001b9]
[neuro36a:01728] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1728 on node neuro36a exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
mpirun -V
mpirun (Open MPI) 1.4.1
Everything works as expected if no core binding is used at all. The
serial version, built the same way but without the --enable-mpi switch,
shows no problems when used with numactl.
The numactl/mpirun combination, although a bit unusual, works fine with
other codes (e.g. cpmd, vasp, ...), as does the usual "mpirun --mca
mpi_paffinity_alone" switch.
Since we use CPU binding to partition an eight-core node into two
SGE slots of four cores each, this situation is not optimal; the intended
setup is sketched below.
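Roughly, the per-node partitioning looks like this (a sketch only; the
binding of the second slot to NUMA node 1 is the assumed counterpart of
the first, and the SGE integration details are omitted):

  # SGE slot 1: bound to NUMA node 0 (4 cores)
  numactl --cpunodebind=0 --membind=0 mpirun -np 4 <binary> ...
  # SGE slot 2: bound to NUMA node 1 (4 cores)
  numactl --cpunodebind=1 --membind=1 mpirun -np 4 <binary> ...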
I will try openmpi 1.4.2 as soon as it has been released, though.
Best Regards
Christof Köhler
--
Dr. rer. nat. Christof Köhler email: c.koehler at bccms.uni-bremen.de
Universitaet Bremen/ BCCMS phone: +49-(0)421-218-2486
Am Fallturm 1/ TAB/ Raum 3.12 fax: +49-(0)421-218-4764
28359 Bremen
PGP:
http://www.bccms.uni-bremen.de/fileadmin/BCCMS/pgp_keys/ChristofKoehler_UniBremen.asc