[gmx-users] Simulations on Leopard - slow and crashing, cannot compile with lam-mpi
Hadas Leonov
hleonov at cc.huji.ac.il
Wed Nov 21 11:24:35 CET 2007
Hi everybody,
I have installed GROMACS 3.3.2 on Mac OS X Leopard, but now it is running
about 3 times slower than it did before, and in addition it crashes on
simulations that run for more than 40 minutes.
For example, I ran a few benchmark runs; here are the results
for 4 processors on a Mac Pro:
d.villin:
  Leopard performance: 13714 ps/day
  old OS performance: 41143 ps/day
  gmx-benchmark: 48000 ps/day
d.poly-ch2:
  Leopard performance: 8640 ps/day
  old OS performance: 18000 ps/day
  gmx-benchmark: 20571 ps/day
At first I thought that open-mpi was responsible for the slowdown,
but even when running on 1 CPU with mdrun, the d.villin performance
was 3592 ps/day on Leopard, compared to 18106 ps/day on the old OS.
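For reference, the runs were launched roughly like this (a sketch from
memory, so the exact benchmark file names and the mdrun_mpi binary name
are approximate):
---
# 4-CPU benchmark run with open-mpi (GROMACS 3.3.x needs -np at grompp time too)
grompp -np 4 -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
mpirun -np 4 mdrun_mpi -np 4 -s topol.tpr -v

# single-CPU comparison, no MPI launcher
mdrun -s topol.tpr -v
---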
As for the crashes: I ran a 0.5 ns position-restraint run that usually
takes 2 hours on 2 CPUs. The predicted finish time was 6 hours, but it
crashed after 40 minutes with the following errors:
---
step 23070, will finish at Tue Nov 20 23:18:28 2007
[tmdec2:69924] *** Process received signal ***
[tmdec2:69924] Signal: Segmentation fault (11)
[tmdec2:69924] Signal code: Address not mapped (1)
[tmdec2:69924] Failing at address: 0x49c78d52
[tmdec2:69925] *** Process received signal ***
[tmdec2:69925] Signal: Segmentation fault (11)
[tmdec2:69925] Signal code: Address not mapped (1)
[tmdec2:69925] Failing at address: 0x49aeac55
[tmdec2:69926] *** Process received signal ***
[tmdec2:69926] Signal: Segmentation fault (11)
[tmdec2:69926] Signal code: Address not mapped (1)
[tmdec2:69926] Failing at address: 0x48c74d8c
[tmdec2:69927] *** Process received signal ***
[tmdec2:69927] Signal: Segmentation fault (11)
[tmdec2:69927] Signal code: Address not mapped (1)
[tmdec2:69927] Failing at address: 0x49e5e700
(the backtraces of the four processes came out interleaved; they all show
the same call chain, so here is only the trace of rank 69925, the others
differ only in the address of frame 1:)
[ 1] [0xbfffd678, 0x49aeac55] (-P-)
[ 2] (ompi_ddt_copy_content_same_ddt + 0x7d) [0xbfffd6e8, 0x006f562d]
[ 3] (ompi_ddt_sndrcv + 0x3bf) [0xbfffd748, 0x006fbebf]
[ 4] (mca_coll_basic_alltoallv_intra + 0x28b) [0xbfffd7c8, 0x00a3a65b]
[ 5] (MPI_Alltoallv + 0x20a) [0xbfffd858, 0x0070056a]
[ 6] (pmeredist + 0x4e2) [0xbfffd8d8, 0x0004836e]
[ 7] (do_pme + 0x494) [0xbfffda38, 0x0004d62b]
[ 8] (force + 0x7d9) [0xbfffdc88, 0x0002ee56]
[ 9] (do_force + 0x87a) [0xbfffdd78, 0x0005d652]
[10] (do_md + 0x164f) [0xbfffe988, 0x0001666e]
[11] (mdrunner + 0xb04) [0xbfffeb08, 0x00014abe]
[12] (main + 0x463) [0xbfffeb98, 0x00018c69]
[13] (start + 0x36) [0xbfffebbc, 0x0000216e]
[14] [0x00000000, 0x0000000e] (FP-)
[tmdec2:69924] *** End of error message ***
[tmdec2:69925] *** End of error message ***
[tmdec2:69926] *** End of error message ***
[tmdec2:69927] *** End of error message ***
[tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/rsh/pls_rsh_module.c at line 1164
[tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
mpirun noticed that job rank 1 with PID 69925 on node tmdec2.ls.huji.ac.il exited on signal 11 (Segmentation fault).
[tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/rsh/pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
1 additional process aborted (not shown)
---
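For what it's worth, the /SourceCache/openmpi paths in the ORTE messages
suggest this is the open-mpi build that Apple ships with Leopard; I checked
which MPI is actually being picked up with something along these lines:
---
which mpirun mpicc     # confirm the wrappers are Apple's, not a leftover install
ompi_info | head       # open-mpi version and build details
mpirun -np 4 hostname  # sanity check that mpirun can start plain processes at all
---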
It looks like GROMACS has trouble with open-mpi.
Before installing Leopard I was using lam-mpi; I can't use it now because
the compilation only worked after I installed open-mpi and used the ia32
disable flag in the configure script (--disable-ia32-3dnow, see my
previous post about that error).
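Roughly, the configure line that does go through with open-mpi looks like
this (reconstructed from memory, so treat the exact flags and paths as
approximate):
---
# build an MPI-enabled GROMACS 3.3.2 against open-mpi (sketch)
./configure --enable-mpi --program-suffix="_mpi" \
            --disable-ia32-3dnow \
            CPPFLAGS=-I/usr/local/include LDFLAGS=-L/usr/local/lib
make && make install
---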
When I tried to compile GROMACS with lam-mpi installed, this is the
'make' error I got:
---
mpicc -I/sw/include -framework Accelerate -o grompp topio.o toppush.o \
  topcat.o topshake.o convparm.o tomorse.o sorting.o splitter.o \
  vsite_parm.o readir.o add_par.o topexcl.o toputil.o topdirs.o grompp.o \
  compute_io.o -L/sw/lib ../mdlib/.libs/libmd_mpi.a -L/usr/X11/lib \
  ../gmxlib/.libs/libgmx_mpi.a /usr/local/lib/libfftw3f.a -lm \
  /sw/lib/libXm.dylib /usr/X11/lib/libXt.6.0.0.dylib \
  /usr/X11/lib/libSM.6.0.0.dylib /usr/X11/lib/libICE.6.3.0.dylib \
  /usr/X11/lib/libXp.6.2.0.dylib /usr/X11/lib/libXext.6.4.0.dylib \
  /usr/X11/lib/libX11.6.2.0.dylib /usr/X11/lib/libXau.6.0.0.dylib \
  /usr/X11/lib/libXdmcp.6.0.0.dylib
Undefined symbols:
"_lam_mpi_byte", referenced from:
_lam_mpi_byte$non_lazy_ptr in libgmx_mpi.a(network.o)
"_lam_mpi_float", referenced from:
_lam_mpi_float$non_lazy_ptr in libgmx_mpi.a(network.o)
"_lam_mpi_comm_world", referenced from:
_lam_mpi_comm_world$non_lazy_ptr in libgmx_mpi.a(network.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[3]: *** [grompp] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all] Error 2
make: *** [all-recursive] Error 1
---
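My guess (not verified) is that the two MPI installations are getting mixed
up here: network.o inside libgmx_mpi.a references LAM's _lam_mpi_* symbols,
so it was compiled against LAM's headers, while the mpicc doing the final
link apparently is not LAM's. Before retrying I would check which wrapper is
actually first in the PATH and rebuild from a completely clean tree,
something like:
---
which mpicc       # which MPI's compiler wrapper is being picked up
mpicc -showme     # both LAM and open-mpi wrappers should print the real compile/link line
make distclean    # throw away objects built against the other MPI
./configure --enable-mpi ...   # same flags as in the sketch above
make
---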
As you can see, I can't run any simulations for now. Any ideas? Is
this a GROMACS bug? If so, is there any chance it will be fixed soon?
Thanks in advance,
Hadas Leonov.
hleonov at cc.huji.ac.il
Department of Biological Chemistry
Alexander Silberman Institute of Life Sciences
The Hebrew University,
Jerusalem, Israel