[gmx-users] mdrun_mpi seg fault if N_atoms/cpu > 4096 ?
Atte Sillanpää
atte.sillanpaa at csc.fi
Wed Nov 9 17:46:46 CET 2005
Hi,
we have a system of 128 DPPC molecules and a layer of water. With version 3.2.1
all goes well as long as the number of atoms per CPU stays below 4096.
Otherwise we get a seg fault right at the start, before any real MD is done:
Parallelized PME sum used.
Using the FFTW library (Fastest Fourier Transform in the West)
PARALLEL FFT DATA:
local_nx: 16 local_x_start: 0
local_ny_after_transpose: 16 local_y_start_after_transpose 0
total_local_size: 67584
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest, initial mass: 159751
There are: 4341 Atom
Removing pbc first time
Done rmpbc
Started mdrun on node 0 Wed Nov 9 17:27:13 2005
Initial temperature: 320.01 K
Step Time Lambda
0 0.00000 0.00000
However, with 8 CPUs there's no problem. We see this on an Opteron
cluster running Rocks 3.2.0, on a Power4 and on a Sun Fire 25k (crash with
16384 atoms, but not with 16381). Also, there were no problems with Gromacs
version 3.0.3.
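(For concreteness, here is the atoms-per-CPU arithmetic as a minimal C sketch;
the natoms/ncpus pairs in it are only illustrative values around the boundary,
not an exact record of the runs above:)

#include <stdio.h>

/* Quick check against the atoms-per-CPU boundary we seem to hit.
 * The natoms/ncpus pairs below are illustrative only. */
int main(void)
{
    const int limit    = 4096;                 /* observed crash boundary   */
    int       natoms[] = { 16384, 16381, 4341 }; /* total atoms in a system */
    int       ncpus[]  = { 4,     4,     8    }; /* example MPI process counts */
    int       i;

    for (i = 0; i < 3; i++) {
        double per_cpu = (double) natoms[i] / ncpus[i];
        printf("%5d atoms on %d CPUs = %7.1f atoms/CPU -> %s the boundary\n",
               natoms[i], ncpus[i], per_cpu,
               per_cpu >= limit ? "at/above" : "below");
    }
    return 0;
}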
There should not be anything special in the *.mdp file, and its parameters
did not seem to influence the behaviour. A hasty analysis of the Sun Fire
25k core file gives the following:
dbx -f mdrun_mpi core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.3'
in your .dbxrc
Reading mdrun_mpi
dbx: internal warning: writable memory segment 0xbcc00000[21331968] of size 0 in core
core file header read successfully
Reading ld.so.1
Reading libmpi.so.1
...
Reading libdoor.so.1
Reading tcppm.so.2
t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
0x000aa43c: pbc_rvec_sub+0x001c: ld [%o0 + 8], %f8
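(If it helps, a fuller stack trace from the same core should be obtainable
with something like the dbx session below, where `where' should print the
full call stack leading into pbc_rvec_sub; this is only a sketch, output
omitted:)

dbx -f mdrun_mpi core
(dbx) where
(dbx) quit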
Any ideas on how to proceed? Surely people have run bigger systems per CPU
with gmx?
Cheers,
Atte