[gmx-users] mdrun_mpi seg fault if N_atoms/cpu > 4096 ?
Atte Sillanpää
atte.sillanpaa at csc.fi
Wed Nov 9 17:46:46 CET 2005
Hi,
we have a system of 128 DPPC molecules and a layer of water. With version 3.2.1
all goes well as long as the number of atoms per CPU stays below 4096.
Otherwise we get a seg fault right at the start, before any real MD is done:
Parallelized PME sum used.
Using the FFTW library (Fastest Fourier Transform in the West)
PARALLEL FFT DATA:
local_nx: 16 local_x_start: 0
local_ny_after_transpose: 16 local_y_start_after_transpose 0
total_local_size: 67584
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest, initial mass: 159751
There are: 4341 Atom
Removing pbc first time
Done rmpbc
Started mdrun on node 0 Wed Nov 9 17:27:13 2005
Initial temperature: 320.01 K
Step Time Lambda
0 0.00000 0.00000
However, with 8 CPUs there's no problem. We see this on an Opteron
cluster running Rocks 3.2.0, on a Power4 and on a Sun Fire 25k (crash with
16384 atoms, but not with 16381). Also, there were no problems with Gromacs
version 3.0.3.
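(For concreteness, here is the atoms-per-CPU arithmetic as a minimal C sketch;
the natoms/ncpus pairs in it are only illustrative values around the boundary,
not an exact record of the runs above:)

#include <stdio.h>

/* Quick check against the atoms-per-CPU boundary we seem to hit.
 * The natoms/ncpus pairs below are illustrative only. */
int main(void)
{
    const int limit    = 4096;                 /* observed crash boundary   */
    int       natoms[] = { 16384, 16381, 4341 }; /* total atoms in a system */
    int       ncpus[]  = { 4,     4,     8    }; /* example MPI process counts */
    int       i;

    for (i = 0; i < 3; i++) {
        double per_cpu = (double) natoms[i] / ncpus[i];
        printf("%5d atoms on %d CPUs = %7.1f atoms/CPU -> %s the boundary\n",
               natoms[i], ncpus[i], per_cpu,
               per_cpu >= limit ? "at/above" : "below");
    }
    return 0;
}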
There should not be anything special in the *.mdp file, and its parameters
did not seem to influence the behaviour. A hasty analysis of the Sun Fire
25k core file gives the following:
dbx -f mdrun_mpi core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.3'
in your .dbxrc
Reading mdrun_mpi
dbx: internal warning: writable memory segment 0xbcc00000[21331968] of size 0 in core
core file header read successfully
Reading ld.so.1
Reading libmpi.so.1
...
Reading libdoor.so.1
Reading tcppm.so.2
t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
0x000aa43c: pbc_rvec_sub+0x001c: ld [%o0 + 8], %f8
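(If it helps, a fuller stack trace from the same core should be obtainable
with something like the dbx session below, where `where' should print the
full call stack leading into pbc_rvec_sub; this is only a sketch, output
omitted:)

dbx -f mdrun_mpi core
(dbx) where
(dbx) quit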
Any ideas on how to proceed? Surely people have run bigger systems per CPU
with gmx?
Cheers,
Atte