Re: [gmx-users] mdrun mpi segmentation fault in high load situation

Wojtyczka, André a.wojtyczka at fz-juelich.de
Thu Dec 23 17:28:46 CET 2010


>On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
>> Dear Gromacs Enthusiasts.
>>
>> I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem cluster.
>>
>> Problem:
>> This runs fine:
>> mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>>
>> This produces a segmentation fault:
>> mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>
>Unless you know you need it, don't use -pd. DD will be faster and is
>probably better bug-tested too.
>
>Mark

Hi Mark

thanks for the push in that direction, but I am in the unfortunate situation where
I really need -pd: because of my long bonds, my large system can only be
decomposed into a small number of domains.
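
If I do give DD another try, I suppose the run would look something like the line below, with a small explicit PP grid and the remaining cores used as separate PME ranks (the 4x4x4 grid and -npme 64 are only placeholders; whether those cells are large enough for my long bonds is exactly what I would have to check):

mpiexec -np 128 /../mdrun_mpi -dd 4 4 4 -npme 64 -s full031K_mdrun_ions.tpr

For now, though, -pd is what I am stuck with.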


>
>> So the only difference is the number of cores I am using.
>>
>> mdrun_mpi was compiled using the Intel compiler 11.1.072 with my own fftw3 installation.
>>
>> No errors came up while running configure, make mdrun, or make install-mdrun.
>>
>> Is there some issue with threading or MPI?
>>
>> If someone has a clue please give me a hint.
>>
>>
>> integrator               = md
>> dt                      = 0.004
>> nsteps                  = 25000000
>> nstxout                  = 0
>> nstvout                  = 0
>> nstlog                  = 250000
>> nstenergy               = 250000
>> nstxtcout               = 12500
>> xtc_grps                 = protein
>> energygrps               = protein non-protein
>> nstlist                  = 2
>> ns_type                  = grid
>> rlist                    = 0.9
>> coulombtype              = PME
>> rcoulomb                 = 0.9
>> fourierspacing           = 0.12
>> pme_order                = 4
>> ewald_rtol               = 1e-5
>> rvdw                     = 0.9
>> pbc                      = xyz
>> periodic_molecules       = yes
>> tcoupl                   = nose-hoover
>> nsttcouple               = 1
>> tc-grps                  = protein non-protein
>> tau_t                    = 0.1 0.1
>> ref_t                    = 310 310
>> Pcoupl                   = no
>> gen_vel                  = yes
>> gen_temp                 = 310
>> gen_seed                 = 173529
>> constraints              = all-bonds
>>
>>
>>
>> Error:
>> Getting Loaded...
>> Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
>> Loaded with Money
>>
>>
>> NOTE: The load imbalance in PME FFT and solve is 48%.
>>        For optimal PME load balancing
>>        PME grid_x (144) and grid_y (144) should be divisible by #PME_nodes_x (128)
>>        and PME grid_y (144) and grid_z (144) should be divisible by #PME_nodes_y (1)
>>
>>
>> Step 0, time 0 (ps)
>> PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 96 exited on signal 6: Aborted
>> ...
>>
>> PS: for now I don't care about the imbalanced PME load, unless it is connected to my problem.
>>
>> Cheers
>> André
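
Regarding my own PS above: the imbalance itself looks like plain arithmetic. The note reports #PME_nodes_x = 128, and the 144-point grid is not divisible by 128 (144 / 128 = 1.125), whereas 72 divides it evenly (144 = 2 x 72), which would explain why the warning shows up in the 128-core run but not in the 72-core one.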




