Re: [gmx-users] mdrun mpi segmentation fault in high load situation
Mark Abraham
Mark.Abraham at anu.edu.au
Thu Dec 23 22:46:15 CET 2010
On 24/12/2010 8:34 AM, Mark Abraham wrote:
> On 24/12/2010 3:28 AM, Wojtyczka, André wrote:
>>> On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
>>>> Dear Gromacs Enthusiasts.
>>>>
>>>> I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem
>>>> cluster.
>>>>
>>>> Problem:
>>>> This runs fine:
>>>> mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>>>>
>>>> This produces a segmentation fault:
>>>> mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>>> Unless you know you need it, don't use -pd. DD will be faster and is
>>> probably better bug-tested too.
>>>
>>> Mark
>> Hi Mark
>>
>> thanks for the push in that direction, but I am in the unfortunate
>> situation where I really need -pd: I have long bonds, which is why my
>> large system can only be decomposed into a small number of domains.
>
> I'm not sure that PD has any advantage here. From memory it has to
> create a 128x1x1 grid, and you can direct that with DD also.
See mdrun -h -hidden for -dd.
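For example, a minimal sketch assuming the same binary and .tpr file as in the failing command above (the 128x1x1 grid is only illustrative, matching the PD layout mentioned earlier):

mpiexec -np 128 /../mdrun_mpi -dd 128 1 1 -s full031K_mdrun_ions.tpr

If the long bonds are what limits the decomposition, the -rdd option, which sets the maximum distance for bonded interactions with DD, may also be worth a look.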
Mark
> The contents of your .log file will be far more helpful than stdout in
> diagnosing what condition led to the problem.
>
> Mark
>
>>>> So the only difference is the number of cores I am using.
>>>>
>>>> mdrun_mpi was compiled with the Intel compiler 11.1.072 and my own
>>>> fftw3 installation.
>>>>
>>>> No errors came up during configure, make mdrun, or make install-mdrun.
>>>>
>>>> Is there some issue with threading or MPI?
>>>>
>>>> If someone has a clue please give me a hint.
>>>>
>>>>
>>>> integrator = md
>>>> dt = 0.004
>>>> nsteps = 25000000
>>>> nstxout = 0
>>>> nstvout = 0
>>>> nstlog = 250000
>>>> nstenergy = 250000
>>>> nstxtcout = 12500
>>>> xtc_grps = protein
>>>> energygrps = protein non-protein
>>>> nstlist = 2
>>>> ns_type = grid
>>>> rlist = 0.9
>>>> coulombtype = PME
>>>> rcoulomb = 0.9
>>>> fourierspacing = 0.12
>>>> pme_order = 4
>>>> ewald_rtol = 1e-5
>>>> rvdw = 0.9
>>>> pbc = xyz
>>>> periodic_molecules = yes
>>>> tcoupl = nose-hoover
>>>> nsttcouple = 1
>>>> tc-grps = protein non-protein
>>>> tau_t = 0.1 0.1
>>>> ref_t = 310 310
>>>> Pcoupl = no
>>>> gen_vel = yes
>>>> gen_temp = 310
>>>> gen_seed = 173529
>>>> constraints = all-bonds
>>>>
>>>>
>>>>
>>>> Error:
>>>> Getting Loaded...
>>>> Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
>>>> Loaded with Money
>>>>
>>>>
>>>> NOTE: The load imbalance in PME FFT and solve is 48%.
>>>> For optimal PME load balancing
>>>> PME grid_x (144) and grid_y (144) should be divisible by
>>>> #PME_nodes_x (128)
>>>> and PME grid_y (144) and grid_z (144) should be divisible
>>>> by #PME_nodes_y (1)
>>>>
>>>>
>>>> Step 0, time 0 (ps)
>>>> PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
>>>> PSIlogger: Child with rank 96 exited on signal 6: Aborted
>>>> ...
>>>>
>>>> PS: for now I don't care about the imbalanced PME load, as long as
>>>> it's independent of my problem.
>>>>
>>>> Cheers
>>>> André
>>
>
More information about the gromacs.org_gmx-users mailing list