[gmx-users] mdrun mpi segmentation fault in high load situation

Mark Abraham Mark.Abraham at anu.edu.au
Thu Dec 23 13:12:27 CET 2010


On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
> Dear Gromacs Enthusiasts.
>
> I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem cluster.
>
> Problem:
> This runs fine:
> mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
>
> This produces a segmentation fault:
> mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr

Unless you know you need it, don't use -pd. Domain decomposition (DD)
will be faster and is probably better bug-tested too.
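
For example (untested, and assuming the same path and .tpr file as in
your command above), just drop -pd and let mdrun use its default
domain decomposition:

mpiexec -np 128 /../mdrun_mpi -s full031K_mdrun_ions.tpr

If the PME load-imbalance note still bothers you after that, you can
also dedicate some ranks to PME with -npme; a count that divides your
144x144x144 fourier grid (e.g. -npme 48, since 144/48 = 3) is a
reasonable first guess, and g_tune_pme can search for a good value
for you.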

Mark

> So the only difference is the number of cores I am using.
>
> mdrun_mpi was compiled using the intel compiler 11.1.072 with my own fftw3 installation.
>
> No errors came up during configure, make mdrun, or make install-mdrun.
>
> Is there some issue with threading or MPI?
>
> If someone has a clue please give me a hint.
>
>
> integrator               = md
> dt                       = 0.004
> nsteps                   = 25000000
> nstxout                  = 0
> nstvout                  = 0
> nstlog                   = 250000
> nstenergy                = 250000
> nstxtcout                = 12500
> xtc_grps                 = protein
> energygrps               = protein non-protein
> nstlist                  = 2
> ns_type                  = grid
> rlist                    = 0.9
> coulombtype              = PME
> rcoulomb                 = 0.9
> fourierspacing           = 0.12
> pme_order                = 4
> ewald_rtol               = 1e-5
> rvdw                     = 0.9
> pbc                      = xyz
> periodic_molecules       = yes
> tcoupl                   = nose-hoover
> nsttcouple               = 1
> tc-grps                  = protein non-protein
> tau_t                    = 0.1 0.1
> ref_t                    = 310 310
> Pcoupl                   = no
> gen_vel                  = yes
> gen_temp                 = 310
> gen_seed                 = 173529
> constraints              = all-bonds
>
>
>
> Error:
> Getting Loaded...
> Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
> Loaded with Money
>
>
> NOTE: The load imbalance in PME FFT and solve is 48%.
>        For optimal PME load balancing
>        PME grid_x (144) and grid_y (144) should be divisible by #PME_nodes_x (128)
>        and PME grid_y (144) and grid_z (144) should be divisible by #PME_nodes_y (1)
>
>
> Step 0, time 0 (ps)
> PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
> PSIlogger: Child with rank 96 exited on signal 6: Aborted
> ...
>
> PS: For now I'm not worried about the PME load imbalance, unless it turns out to be related to my problem.
>
> Cheers
> André
>



