Re: [gmx-users] mdrun mpi segmentation fault in high load situation
Wojtyczka, André
a.wojtyczka at fz-juelich.de
Thu Dec 23 17:28:46 CET 2010
>On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
>> Dear Gromacs Enthusiasts.
>>
>> I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem cluster.
>>
>> Problem:
>> This runs fine:
>> mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>>
>> This produces a segmentation fault:
>> mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>
>Unless you know you need it, don't use -pd. DD will be faster and is
>probably better bug-tested too.
>
>Mark
Hi Mark,
thanks for the push in that direction, but I am in the unfortunate situation where
I really do need -pd: my system contains long bonds, which is why it can only be
decomposed into a small number of domains.
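For completeness, the DD route Mark suggests would look roughly like the line below (mdrun 4.5 flags; the 1.4 nm value for -rdd is only a placeholder for whatever distance the long bonds actually need). In my system that distance is so large that DD still only yields a handful of domains, which is why I fall back to -pd:

# sketch only: -rdd raises the maximum distance allowed for bonded
# interactions under domain decomposition
mpiexec -np 128 /../mdrun_mpi -rdd 1.4 -s full031K_mdrun_ions.tpr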
>
>> So the only difference is the number of cores I am using.
>>
>> mdrun_mpi was compiled using the intel compiler 11.1.072 with my own fftw3 installation.
>>
>> No errors came up while configuring or during make mdrun / make install-mdrun.
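(Just to be explicit about the build: a generic GROMACS 4.5 MPI build with icc and a privately installed fftw3 looks roughly like the lines below; the paths and the _mpi suffix are placeholders, not necessarily what was used here.)

# generic sketch of an autoconf-based GROMACS 4.5 MPI build; point the
# fftw3 prefix at the actual installation
export CC=icc CXX=icpc
./configure --enable-mpi --with-fft=fftw3 --program-suffix=_mpi \
    CPPFLAGS=-I$HOME/fftw3/include LDFLAGS=-L$HOME/fftw3/lib
make mdrun && make install-mdrun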
>>
>> Is there some issue with threading or MPI?
>>
>> If someone has a clue please give me a hint.
>>
>>
>> integrator = md
>> dt = 0.004
>> nsteps = 25000000
>> nstxout = 0
>> nstvout = 0
>> nstlog = 250000
>> nstenergy = 250000
>> nstxtcout = 12500
>> xtc_grps = protein
>> energygrps = protein non-protein
>> nstlist = 2
>> ns_type = grid
>> rlist = 0.9
>> coulombtype = PME
>> rcoulomb = 0.9
>> fourierspacing = 0.12
>> pme_order = 4
>> ewald_rtol = 1e-5
>> rvdw = 0.9
>> pbc = xyz
>> periodic_molecules = yes
>> tcoupl = nose-hoover
>> nsttcouple = 1
>> tc-grps = protein non-protein
>> tau_t = 0.1 0.1
>> ref_t = 310 310
>> Pcoupl = no
>> gen_vel = yes
>> gen_temp = 310
>> gen_seed = 173529
>> constraints = all-bonds
>>
>>
>>
>> Error:
>> Getting Loaded...
>> Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
>> Loaded with Money
>>
>>
>> NOTE: The load imbalance in PME FFT and solve is 48%.
>> For optimal PME load balancing
>> PME grid_x (144) and grid_y (144) should be divisible by #PME_nodes_x (128)
>> and PME grid_y (144) and grid_z (144) should be divisible by #PME_nodes_y (1)
>>
>>
>> Step 0, time 0 (ps)
>> PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
>> PSIlogger: Child with rank 96 exited on signal 6: Aborted
>> ...
>>
>> PS: for now I don't care about the imbalanced PME load, as long as it is unrelated to my problem.
>>
>> Cheers
>> André
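Regarding the PME load-balance NOTE quoted above, a quick arithmetic check (nothing from the runs themselves, just divisibility of the 144-point grid) shows why 72 ranks can be balanced evenly while 128 cannot:

# which rank counts divide the 144-point PME grid evenly in x
for n in 72 96 128 144; do
    if [ $((144 % n)) -eq 0 ]; then
        echo "$n divides 144 evenly"
    else
        echo "$n leaves a remainder of $((144 % n))"
    fi
done

With 128 ranks the 144 grid lines in x cannot be split into equal shares, which is what the NOTE is pointing at; whether that is related to the crash is a separate question.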