[gmx-users] GROMACS Parallel Runs

Sunny ge_sunny at hotmail.com
Mon Oct 2 10:37:36 CEST 2006


>From: David van der Spoel <spoel at xray.bmc.uu.se>
>Reply-To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>Subject: Re: [gmx-users] GROMACS Parallel Runs
>Date: Sun, 01 Oct 2006 19:58:48 +0200
>
>Sunny wrote:
>>Hi,
>>
>>I am running GROMACS 3.3.1 in parallel on an AIX supercomputing system. My 
>>simulation runs successfully on 16 and 32 CPUs (as well as on fewer than 16 
>>CPUs). When running on 64 CPUs, however, a segmentation fault occurs in 
>>multiple tasks from the very beginning of the simulation. I'd like to know 
>>what causes the failure and whether there is any way to fix it.
>>
>
>please supply more details, like system size, PME details etc.
>
>
>>Thanks,
>>
>>Sunny
>David.
>

Hi all,

Thanks for your replies. Below is the full configuration info for my 
simulation, as found in md0.log, followed by the error messages written to 
the .err file. I'm sorry for the long listing.

Many thanks,

Sunny

CONFIGURATION INFO:

CPU=  0, lastcg=  298, targetcg= 7732, myshift=   23
CPU=  1, lastcg=  633, targetcg= 8066, myshift=   23
CPU=  2, lastcg=  970, targetcg= 8404, myshift=   23
CPU=  3, lastcg= 1298, targetcg= 8732, myshift=   23
CPU=  4, lastcg= 1629, targetcg= 9062, myshift=   24
CPU=  5, lastcg= 1959, targetcg= 9392, myshift=   25
CPU=  6, lastcg= 2296, targetcg= 9730, myshift=   26
CPU=  7, lastcg= 2624, targetcg=10058, myshift=   27
CPU=  8, lastcg= 2955, targetcg=10388, myshift=   28
CPU=  9, lastcg= 3285, targetcg=10718, myshift=   29
CPU= 10, lastcg= 3622, targetcg=11056, myshift=   30
CPU= 11, lastcg= 3950, targetcg=11384, myshift=   31
CPU= 12, lastcg= 4281, targetcg=11714, myshift=   32
CPU= 13, lastcg= 4611, targetcg=12044, myshift=   33
CPU= 14, lastcg= 4948, targetcg=12382, myshift=   34
CPU= 15, lastcg= 5276, targetcg=12710, myshift=   35
CPU= 16, lastcg= 5607, targetcg=13040, myshift=   36
CPU= 17, lastcg= 5937, targetcg=13370, myshift=   37
CPU= 18, lastcg= 6274, targetcg=13708, myshift=   38
CPU= 19, lastcg= 6602, targetcg=14036, myshift=   39
CPU= 20, lastcg= 6933, targetcg=14366, myshift=   40
CPU= 21, lastcg= 7263, targetcg=14696, myshift=   41
CPU= 22, lastcg= 7600, targetcg=  168, myshift=   42
CPU= 23, lastcg= 7928, targetcg=  496, myshift=   42
CPU= 24, lastcg= 8259, targetcg=  826, myshift=   42
CPU= 25, lastcg= 8589, targetcg= 1156, myshift=   42
CPU= 26, lastcg= 8840, targetcg= 1408, myshift=   42
CPU= 27, lastcg= 9003, targetcg= 1570, myshift=   41
CPU= 28, lastcg= 9166, targetcg= 1734, myshift=   41
CPU= 29, lastcg= 9329, targetcg= 1896, myshift=   40
CPU= 30, lastcg= 9492, targetcg= 2060, myshift=   40
CPU= 31, lastcg= 9655, targetcg= 2222, myshift=   39
CPU= 32, lastcg= 9818, targetcg= 2386, myshift=   39
CPU= 33, lastcg= 9981, targetcg= 2548, myshift=   38
CPU= 34, lastcg=10144, targetcg= 2712, myshift=   38
CPU= 35, lastcg=10307, targetcg= 2874, myshift=   37
CPU= 36, lastcg=10470, targetcg= 3038, myshift=   37
CPU= 37, lastcg=10633, targetcg= 3200, myshift=   36
CPU= 38, lastcg=10796, targetcg= 3364, myshift=   36
CPU= 39, lastcg=10959, targetcg= 3526, myshift=   35
CPU= 40, lastcg=11122, targetcg= 3690, myshift=   35
CPU= 41, lastcg=11285, targetcg= 3852, myshift=   34
CPU= 42, lastcg=11448, targetcg= 4016, myshift=   34
CPU= 43, lastcg=11611, targetcg= 4178, myshift=   33
CPU= 44, lastcg=11774, targetcg= 4342, myshift=   33
CPU= 45, lastcg=11937, targetcg= 4504, myshift=   32
CPU= 46, lastcg=12100, targetcg= 4668, myshift=   32
CPU= 47, lastcg=12263, targetcg= 4830, myshift=   31
CPU= 48, lastcg=12426, targetcg= 4994, myshift=   31
CPU= 49, lastcg=12589, targetcg= 5156, myshift=   30
CPU= 50, lastcg=12752, targetcg= 5320, myshift=   30
CPU= 51, lastcg=12915, targetcg= 5482, myshift=   29
CPU= 52, lastcg=13078, targetcg= 5646, myshift=   29
CPU= 53, lastcg=13240, targetcg= 5808, myshift=   28
CPU= 54, lastcg=13403, targetcg= 5970, myshift=   28
CPU= 55, lastcg=13565, targetcg= 6132, myshift=   27
CPU= 56, lastcg=13728, targetcg= 6296, myshift=   27
CPU= 57, lastcg=13890, targetcg= 6458, myshift=   26
CPU= 58, lastcg=14053, targetcg= 6620, myshift=   26
CPU= 59, lastcg=14215, targetcg= 6782, myshift=   25
CPU= 60, lastcg=14378, targetcg= 6946, myshift=   25
CPU= 61, lastcg=14540, targetcg= 7108, myshift=   24
CPU= 62, lastcg=14703, targetcg= 7270, myshift=   24
CPU= 63, lastcg=14865, targetcg= 7432, myshift=   23
nsb->shift =  42, nsb->bshift=  0
Listing Scalars
nsb->nodeid:         0
nsb->nnodes:     64
nsb->cgtotal: 14866
nsb->natoms:  31242
nsb->shift:      42
nsb->bshift:      0
Nodeid   index  homenr  cgload  workload
     0       0     488     299       299
     1     488     491     634       634
     2     979     488     971       971
     3    1467     488    1299      1299
     4    1955     488    1630      1630
     5    2443     486    1960      1960
     6    2929     488    2297      2297
     7    3417     488    2625      2625
     8    3905     488    2956      2956
     9    4393     486    3286      3286
    10    4879     488    3623      3623
    11    5367     488    3951      3951
    12    5855     488    4282      4282
    13    6343     486    4612      4612
    14    6829     488    4949      4949
    15    7317     488    5277      5277
    16    7805     488    5608      5608
    17    8293     486    5938      5938
    18    8779     488    6275      6275
    19    9267     488    6603      6603
    20    9755     488    6934      6934
    21   10243     486    7264      7264
    22   10729     488    7601      7601
    23   11217     488    7929      7929
    24   11705     488    8260      8260
    25   12193     486    8590      8590
    26   12679     488    8841      8841
    27   13167     489    9004      9004
    28   13656     489    9167      9167
    29   14145     489    9330      9330
    30   14634     489    9493      9493
    31   15123     489    9656      9656
    32   15612     489    9819      9819
    33   16101     489    9982      9982
    34   16590     489   10145     10145
    35   17079     489   10308     10308
    36   17568     489   10471     10471
    37   18057     489   10634     10634
    38   18546     489   10797     10797
    39   19035     489   10960     10960
    40   19524     489   11123     11123
    41   20013     489   11286     11286
    42   20502     489   11449     11449
    43   20991     489   11612     11612
    44   21480     489   11775     11775
    45   21969     489   11938     11938
    46   22458     489   12101     12101
    47   22947     489   12264     12264
    48   23436     489   12427     12427
    49   23925     489   12590     12590
    50   24414     489   12753     12753
    51   24903     489   12916     12916
    52   25392     489   13079     13079
    53   25881     486   13241     13241
    54   26367     489   13404     13404
    55   26856     486   13566     13566
    56   27342     489   13729     13729
    57   27831     486   13891     13891
    58   28317     489   14054     14054
    59   28806     486   14216     14216
    60   29292     489   14379     14379
    61   29781     486   14541     14541
    62   30267     489   14704     14704
    63   30756     486   14866     14866

parameters of the run:
   integrator           = md
   nsteps               = 100000
   init_step            = 0
   ns_type              = Grid
   nstlist              = 10
   ndelta               = 2
   bDomDecomp           = FALSE
   decomp_dir           = 0
   nstcomm              = 1
   comm_mode            = Linear
   nstcheckpoint        = 1000
   nstlog               = 100
   nstxout              = 1000
   nstvout              = 25000
   nstfout              = 0
   nstenergy            = 100
   nstxtcout            = 500
   init_t               = 0
   delta_t              = 0.002
   xtcprec              = 1000
   nkx                  = 64
   nky                  = 128
   nkz                  = 64
   pme_order            = 4
   ewald_rtol           = 1e-05
   ewald_geometry       = 0
   epsilon_surface      = 0
   optimize_fft         = TRUE
   ePBC                 = xyz
   bUncStart            = FALSE
   bShakeSOR            = FALSE
   etc                  = Nose-Hoover
   epc                  = Parrinello-Rahman
   epctype              = Isotropic
   tau_p                = 5
   ref_p (3x3):
      ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   compress (3x3):
      compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
      compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
      compress[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
   andersen_seed        = 815131
   rlist                = 1
   coulombtype          = PME
   rcoulomb_switch      = 0
   rcoulomb             = 1
   vdwtype              = Cut-off
   rvdw_switch          = 0
   rvdw                 = 1
   epsilon_r            = 1
   epsilon_rf           = 1
   tabext               = 1
   gb_algorithm         = Still
   nstgbradii           = 1
   rgbradii             = 2
   gb_saltconc          = 0
   implicit_solvent     = No
   DispCorr             = No
   fudgeQQ              = 1
   free_energy          = no
   init_lambda          = 0
   sc_alpha             = 0
   sc_power             = 0
   sc_sigma             = 0.3
   delta_lambda         = 0
   disre_weighting      = Conservative
   disre_mixed          = FALSE
   dr_fc                = 1000
   dr_tau               = 0
   nstdisreout          = 100
   orires_fc            = 0
   orires_tau           = 0
   nstorireout          = 100
   dihre-fc             = 1000
   dihre-tau            = 0
   nstdihreout          = 100
   em_stepsize          = 0.001
   em_tol               = 1e-06
   niter                = 1000
   fc_stepsize          = 0
   nstcgsteep           = 10000
   nbfgscorr            = 10
   ConstAlg             = Lincs
   shake_tol            = 0.0001
   lincs_order          = 4
   lincs_warnangle      = 30
   lincs_iter           = 1
   bd_fric              = 0
   ld_seed              = 1993
   cos_accel            = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   userint1             = 0
   userint2             = 0
   userint3             = 0
   userint4             = 0
   userreal1            = 0
   userreal2            = 0
   userreal3            = 0
   userreal4            = 0
grpopts:
   nrdf:	       75399
   ref_t:	         300
   tau_t:	         0.5
anneal:		          No
ann_npoints:	           0
   acc:	           0           0           0
   nfreeze:           N           N           N
   energygrp_flags[  0]: 0 0 0
   energygrp_flags[  1]: 0 0 0
   energygrp_flags[  2]: 0 0 0
   efield-x:
      n = 0
   efield-xt:
      n = 0
   efield-y:
      n = 0
   efield-yt:
      n = 0
   efield-z:
      n = 0
   efield-zt:
      n = 0
   bQMMM                = FALSE
   QMconstraints        = 0
   QMMMscheme           = 0
   scalefactor          = 1
qm_opts:
   ngQM                 = 0
Max number of graph edges per atom is 4
Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
System total charge: 0.000
Generated table with 1000 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling SPC water optimization for 6108 molecules.

Will do PME sum in reciprocal space.
[End]
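
As a cross-check on the node table in the log above, here is a minimal Python sketch that tests whether each node's `index` equals the previous node's `index` plus its `homenr`, and whether the last node ends at `nsb->natoms` (31242). The helper names `parse_rows` and `partition_gaps` are made up for illustration and are not part of GROMACS. Only the last four rows of the table are embedded as sample input.

```python
def parse_rows(table_text):
    """Keep only lines made of exactly five integers:
    nodeid, index, homenr, cgload, workload."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows


def partition_gaps(rows):
    """Return (nodeid, expected_index, actual_index) for every row whose
    start index does not follow directly from the previous row."""
    gaps = []
    for prev, cur in zip(rows, rows[1:]):
        expected = prev[1] + prev[2]  # previous index + previous homenr
        if cur[1] != expected:
            gaps.append((cur[0], expected, cur[1]))
    return gaps


# Last four rows of the node table, copied from md0.log.
sample = """\
Nodeid   index  homenr  cgload  workload
    60   29292     489   14379     14379
    61   29781     486   14541     14541
    62   30267     489   14704     14704
    63   30756     486   14866     14866
"""

rows = parse_rows(sample)
last_atom = rows[-1][1] + rows[-1][2]  # one past the last atom on node 63
print("gaps:", partition_gaps(rows))
print("end of last node:", last_atom)
```

For these rows the atom ranges are contiguous and the last node ends at 31242, which matches `nsb->natoms`. This only checks the bookkeeping in the table and says nothing by itself about the cause of the crash.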
--------------------------------------------------------------------------
ERROR MESSAGE:

Reading file topol.tpr, VERSION 3.3.1 (single precision)

Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'sivdppc'
100000 steps,    200.0 ps.


Back Off! I just backed up traj.trr to ./#traj.trr.1#

Back Off! I just backed up traj.xtc to ./#traj.xtc.1#

Back Off! I just backed up step-1.pdb to ./#step-1.pdb.1#
ERROR: 0031-250  task 62: Segmentation fault
ERROR: 0031-250  task 54: Segmentation fault
ERROR: 0031-250  task 58: Segmentation fault
ERROR: 0031-250  task 50: Segmentation fault
ERROR: 0031-250  task 51: Segmentation fault

Back Off! I just backed up step0.pdb to ./#step0.pdb.1#
ERROR: 0031-250  task 19: Segmentation fault
ERROR: 0031-250  task 28: Segmentation fault
ERROR: 0031-250  task 49: Segmentation fault
ERROR: 0031-250  task 17: Segmentation fault
ERROR: 0031-250  task 20: Segmentation fault
ERROR: 0031-250  task 23: Segmentation fault
ERROR: 0031-250  task 26: Segmentation fault
ERROR: 0031-250  task 27: Segmentation fault
ERROR: 0031-250  task 31: Segmentation fault
Wrote pdb files with previous and current coordinates
ERROR: 0031-250  task 52: Segmentation fault
ERROR: 0031-250  task 18: Segmentation fault
ERROR: 0031-250  task 60: Segmentation fault
ERROR: 0031-250  task 24: Segmentation fault
ERROR: 0031-250  task 16: Segmentation fault
ERROR: 0031-250  task 30: Segmentation fault
ERROR: 0031-250  task 21: Segmentation fault
ERROR: 0031-250  task 14: Segmentation fault
ERROR: 0031-250  task 48: Segmentation fault
ERROR: 0031-250  task 38: Segmentation fault
ERROR: 0031-250  task 22: Segmentation fault
ERROR: 0031-250  task 46: Segmentation fault
ERROR: 0031-250  task 3: Segmentation fault
ERROR: 0031-250  task 45: Segmentation fault
ERROR: 0031-250  task 37: Segmentation fault
ERROR: 0031-250  task 40: Segmentation fault
ERROR: 0031-250  task 8: Segmentation fault
ERROR: 0031-250  task 15: Segmentation fault
ERROR: 0031-250  task 33: Segmentation fault
ERROR: 0031-250  task 39: Segmentation fault
ERROR: 0031-250  task 44: Segmentation fault
ERROR: 0031-250  task 56: Segmentation fault
ERROR: 0031-250  task 43: Segmentation fault
ERROR: 0031-250  task 4: Segmentation fault
ERROR: 0031-250  task 12: Segmentation fault
ERROR: 0031-250  task 29: Segmentation fault
ERROR: 0031-250  task 35: Segmentation fault
ERROR: 0031-250  task 25: Segmentation fault
ERROR: 0031-250  task 6: Segmentation fault
ERROR: 0031-250  task 42: Segmentation fault
ERROR: 0031-250  task 13: Segmentation fault
ERROR: 0031-250  task 1: Segmentation fault
ERROR: 0031-250  task 9: Segmentation fault
ERROR: 0031-250  task 10: Segmentation fault
ERROR: 0031-250  task 2: Segmentation fault
ERROR: 0031-250  task 47: Segmentation fault
ERROR: 0031-250  task 5: Segmentation fault
ERROR: 0031-250  task 7: Segmentation fault
ERROR: 0031-250  task 11: Segmentation fault
ERROR: 0031-250  task 32: Segmentation fault
ERROR: 0031-250  task 34: Segmentation fault
ERROR: 0031-250  task 41: Segmentation fault
ERROR: 0031-250  task 36: Segmentation fault
ERROR: 0031-250  task 55: Terminated
ERROR: 0031-250  task 59: Terminated
ERROR: 0031-250  task 53: Terminated
ERROR: 0031-250  task 57: Terminated
ERROR: 0031-250  task 61: Terminated
ERROR: 0031-250  task 63: Terminated
ERROR: 0031-250  task 0: Terminated
[End]
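
To make the list above easier to read, the task lines can be grouped by status. The following is a minimal Python sketch; the function name `tally_tasks` and the regular expression are illustrative assumptions based only on the line format shown above, and only a few of the lines are embedded as sample input.

```python
import re
from collections import defaultdict

# Matches lines such as "ERROR: 0031-250  task 62: Segmentation fault".
TASK_LINE = re.compile(r"^ERROR:\s+\S+\s+task\s+(\d+):\s+(.+?)\s*$")


def tally_tasks(log_text):
    """Group task ids by the status reported on each task line."""
    by_status = defaultdict(list)
    for line in log_text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            by_status[match.group(2)].append(int(match.group(1)))
    return {status: sorted(ids) for status, ids in by_status.items()}


# A short excerpt of the .err file; unrelated lines are skipped.
sample = """\
ERROR: 0031-250  task 62: Segmentation fault
ERROR: 0031-250  task 54: Segmentation fault
Back Off! I just backed up step0.pdb to ./#step0.pdb.1#
ERROR: 0031-250  task 55: Terminated
ERROR: 0031-250  task 0: Terminated
"""

result = tally_tasks(sample)
for status, ids in result.items():
    print(f"{status}: {len(ids)} task(s) {ids}")
```

Applied to the complete list above, this kind of grouping gives 57 tasks with "Segmentation fault" and 7 with "Terminated" (tasks 0, 53, 55, 57, 59, 61 and 63), which accounts for all 64 tasks.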




