[gmx-users] GROMACS Parallel Runs
Sunny
ge_sunny at hotmail.com
Mon Oct 2 10:37:36 CEST 2006
>From: David van der Spoel <spoel at xray.bmc.uu.se>
>Reply-To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>Subject: Re: [gmx-users] GROMACS Parallel Runs
>Date: Sun, 01 Oct 2006 19:58:48 +0200
>
>Sunny wrote:
>>Hi,
>>
>>I am running GROMACS 3.3.1 in parallel on an AIX supercomputing system. My
>>simulation runs successfully on 16 and 32 CPUs (as well as on fewer than 16
>>CPUs). When running on 64 CPUs, however, segmentation faults occur in
>>multiple tasks from the very beginning of the simulation. I'd like to know what
>>causes the failure and whether there is any way to fix it.
>>
>
>Please supply more details, such as system size, PME details, etc.
>
>
>>Thanks,
>>
>>Sunny
>David.
>
Hi all,
Thanks for your replies. Below are the full configuration details of my
simulation, taken from md0.log, and the error messages from the .err file.
Sorry for the long listing.
Many thanks,
Sunny
CONFIGURATION INFO:
CPU= 0, lastcg= 298, targetcg= 7732, myshift= 23
CPU= 1, lastcg= 633, targetcg= 8066, myshift= 23
CPU= 2, lastcg= 970, targetcg= 8404, myshift= 23
CPU= 3, lastcg= 1298, targetcg= 8732, myshift= 23
CPU= 4, lastcg= 1629, targetcg= 9062, myshift= 24
CPU= 5, lastcg= 1959, targetcg= 9392, myshift= 25
CPU= 6, lastcg= 2296, targetcg= 9730, myshift= 26
CPU= 7, lastcg= 2624, targetcg=10058, myshift= 27
CPU= 8, lastcg= 2955, targetcg=10388, myshift= 28
CPU= 9, lastcg= 3285, targetcg=10718, myshift= 29
CPU= 10, lastcg= 3622, targetcg=11056, myshift= 30
CPU= 11, lastcg= 3950, targetcg=11384, myshift= 31
CPU= 12, lastcg= 4281, targetcg=11714, myshift= 32
CPU= 13, lastcg= 4611, targetcg=12044, myshift= 33
CPU= 14, lastcg= 4948, targetcg=12382, myshift= 34
CPU= 15, lastcg= 5276, targetcg=12710, myshift= 35
CPU= 16, lastcg= 5607, targetcg=13040, myshift= 36
CPU= 17, lastcg= 5937, targetcg=13370, myshift= 37
CPU= 18, lastcg= 6274, targetcg=13708, myshift= 38
CPU= 19, lastcg= 6602, targetcg=14036, myshift= 39
CPU= 20, lastcg= 6933, targetcg=14366, myshift= 40
CPU= 21, lastcg= 7263, targetcg=14696, myshift= 41
CPU= 22, lastcg= 7600, targetcg= 168, myshift= 42
CPU= 23, lastcg= 7928, targetcg= 496, myshift= 42
CPU= 24, lastcg= 8259, targetcg= 826, myshift= 42
CPU= 25, lastcg= 8589, targetcg= 1156, myshift= 42
CPU= 26, lastcg= 8840, targetcg= 1408, myshift= 42
CPU= 27, lastcg= 9003, targetcg= 1570, myshift= 41
CPU= 28, lastcg= 9166, targetcg= 1734, myshift= 41
CPU= 29, lastcg= 9329, targetcg= 1896, myshift= 40
CPU= 30, lastcg= 9492, targetcg= 2060, myshift= 40
CPU= 31, lastcg= 9655, targetcg= 2222, myshift= 39
CPU= 32, lastcg= 9818, targetcg= 2386, myshift= 39
CPU= 33, lastcg= 9981, targetcg= 2548, myshift= 38
CPU= 34, lastcg=10144, targetcg= 2712, myshift= 38
CPU= 35, lastcg=10307, targetcg= 2874, myshift= 37
CPU= 36, lastcg=10470, targetcg= 3038, myshift= 37
CPU= 37, lastcg=10633, targetcg= 3200, myshift= 36
CPU= 38, lastcg=10796, targetcg= 3364, myshift= 36
CPU= 39, lastcg=10959, targetcg= 3526, myshift= 35
CPU= 40, lastcg=11122, targetcg= 3690, myshift= 35
CPU= 41, lastcg=11285, targetcg= 3852, myshift= 34
CPU= 42, lastcg=11448, targetcg= 4016, myshift= 34
CPU= 43, lastcg=11611, targetcg= 4178, myshift= 33
CPU= 44, lastcg=11774, targetcg= 4342, myshift= 33
CPU= 45, lastcg=11937, targetcg= 4504, myshift= 32
CPU= 46, lastcg=12100, targetcg= 4668, myshift= 32
CPU= 47, lastcg=12263, targetcg= 4830, myshift= 31
CPU= 48, lastcg=12426, targetcg= 4994, myshift= 31
CPU= 49, lastcg=12589, targetcg= 5156, myshift= 30
CPU= 50, lastcg=12752, targetcg= 5320, myshift= 30
CPU= 51, lastcg=12915, targetcg= 5482, myshift= 29
CPU= 52, lastcg=13078, targetcg= 5646, myshift= 29
CPU= 53, lastcg=13240, targetcg= 5808, myshift= 28
CPU= 54, lastcg=13403, targetcg= 5970, myshift= 28
CPU= 55, lastcg=13565, targetcg= 6132, myshift= 27
CPU= 56, lastcg=13728, targetcg= 6296, myshift= 27
CPU= 57, lastcg=13890, targetcg= 6458, myshift= 26
CPU= 58, lastcg=14053, targetcg= 6620, myshift= 26
CPU= 59, lastcg=14215, targetcg= 6782, myshift= 25
CPU= 60, lastcg=14378, targetcg= 6946, myshift= 25
CPU= 61, lastcg=14540, targetcg= 7108, myshift= 24
CPU= 62, lastcg=14703, targetcg= 7270, myshift= 24
CPU= 63, lastcg=14865, targetcg= 7432, myshift= 23
nsb->shift = 42, nsb->bshift= 0
Listing Scalars
nsb->nodeid: 0
nsb->nnodes: 64
nsb->cgtotal: 14866
nsb->natoms: 31242
nsb->shift: 42
nsb->bshift: 0
Nodeid index homenr cgload workload
0 0 488 299 299
1 488 491 634 634
2 979 488 971 971
3 1467 488 1299 1299
4 1955 488 1630 1630
5 2443 486 1960 1960
6 2929 488 2297 2297
7 3417 488 2625 2625
8 3905 488 2956 2956
9 4393 486 3286 3286
10 4879 488 3623 3623
11 5367 488 3951 3951
12 5855 488 4282 4282
13 6343 486 4612 4612
14 6829 488 4949 4949
15 7317 488 5277 5277
16 7805 488 5608 5608
17 8293 486 5938 5938
18 8779 488 6275 6275
19 9267 488 6603 6603
20 9755 488 6934 6934
21 10243 486 7264 7264
22 10729 488 7601 7601
23 11217 488 7929 7929
24 11705 488 8260 8260
25 12193 486 8590 8590
26 12679 488 8841 8841
27 13167 489 9004 9004
28 13656 489 9167 9167
29 14145 489 9330 9330
30 14634 489 9493 9493
31 15123 489 9656 9656
32 15612 489 9819 9819
33 16101 489 9982 9982
34 16590 489 10145 10145
35 17079 489 10308 10308
36 17568 489 10471 10471
37 18057 489 10634 10634
38 18546 489 10797 10797
39 19035 489 10960 10960
40 19524 489 11123 11123
41 20013 489 11286 11286
42 20502 489 11449 11449
43 20991 489 11612 11612
44 21480 489 11775 11775
45 21969 489 11938 11938
46 22458 489 12101 12101
47 22947 489 12264 12264
48 23436 489 12427 12427
49 23925 489 12590 12590
50 24414 489 12753 12753
51 24903 489 12916 12916
52 25392 489 13079 13079
53 25881 486 13241 13241
54 26367 489 13404 13404
55 26856 486 13566 13566
56 27342 489 13729 13729
57 27831 486 13891 13891
58 28317 489 14054 14054
59 28806 486 14216 14216
60 29292 489 14379 14379
61 29781 486 14541 14541
62 30267 489 14704 14704
63 30756 486 14866 14866
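As a quick sanity check on the atom decomposition in the table above, the following short Python sketch (not part of GROMACS; the homenr values are transcribed by hand from the log) rebuilds each node's starting atom index as a running sum of homenr. It confirms that the 64 per-node blocks are contiguous and together cover all nsb->natoms = 31242 atoms, with 486 to 491 atoms per node, so the partitioning itself looks consistent.

```python
from itertools import accumulate

# homenr (atoms per node) for nodes 0..63, transcribed by hand from the
# "Nodeid index homenr cgload workload" table in md0.log.
HOMENR = (
    [488, 491, 488, 488, 488, 486, 488, 488, 488, 486,
     488, 488, 488, 486, 488, 488, 488, 486, 488, 488,
     488, 486, 488, 488, 488, 486, 488]      # nodes 0-26
    + [489] * 26                            # nodes 27-52
    + [486, 489] * 5 + [486]                # nodes 53-63
)

NATOMS = 31242  # nsb->natoms as reported in the log


def start_indices(homenr):
    """Return each node's first atom index (exclusive running sum of homenr)."""
    return [0] + list(accumulate(homenr))[:-1]


starts = start_indices(HOMENR)
assert len(HOMENR) == 64                    # one entry per node
assert starts[-1] + HOMENR[-1] == NATOMS    # blocks cover every atom

print("nodes:", len(HOMENR))                                   # nodes: 64
print("atoms per node: min", min(HOMENR), "max", max(HOMENR))  # min 486 max 491
print("last node starts at", starts[-1])                       # 30756
```

The computed start indices can be compared line by line with the "index" column; for example, node 27 should start at 13167 and node 53 at 25881.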
parameters of the run:
integrator = md
nsteps = 100000
init_step = 0
ns_type = Grid
nstlist = 10
ndelta = 2
bDomDecomp = FALSE
decomp_dir = 0
nstcomm = 1
comm_mode = Linear
nstcheckpoint = 1000
nstlog = 100
nstxout = 1000
nstvout = 25000
nstfout = 0
nstenergy = 100
nstxtcout = 500
init_t = 0
delta_t = 0.002
xtcprec = 1000
nkx = 64
nky = 128
nkz = 64
pme_order = 4
ewald_rtol = 1e-05
ewald_geometry = 0
epsilon_surface = 0
optimize_fft = TRUE
ePBC = xyz
bUncStart = FALSE
bShakeSOR = FALSE
etc = Nose-Hoover
epc = Parrinello-Rahman
epctype = Isotropic
tau_p = 5
ref_p (3x3):
ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
compress (3x3):
compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compress[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
andersen_seed = 815131
rlist = 1
coulombtype = PME
rcoulomb_switch = 0
rcoulomb = 1
vdwtype = Cut-off
rvdw_switch = 0
rvdw = 1
epsilon_r = 1
epsilon_rf = 1
tabext = 1
gb_algorithm = Still
nstgbradii = 1
rgbradii = 2
gb_saltconc = 0
implicit_solvent = No
DispCorr = No
fudgeQQ = 1
free_energy = no
init_lambda = 0
sc_alpha = 0
sc_power = 0
sc_sigma = 0.3
delta_lambda = 0
disre_weighting = Conservative
disre_mixed = FALSE
dr_fc = 1000
dr_tau = 0
nstdisreout = 100
orires_fc = 0
orires_tau = 0
nstorireout = 100
dihre-fc = 1000
dihre-tau = 0
nstdihreout = 100
em_stepsize = 0.001
em_tol = 1e-06
niter = 1000
fc_stepsize = 0
nstcgsteep = 10000
nbfgscorr = 10
ConstAlg = Lincs
shake_tol = 0.0001
lincs_order = 4
lincs_warnangle = 30
lincs_iter = 1
bd_fric = 0
ld_seed = 1993
cos_accel = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 75399
ref_t: 300
tau_t: 0.5
anneal: No
ann_npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp_flags[ 0]: 0 0 0
energygrp_flags[ 1]: 0 0 0
energygrp_flags[ 2]: 0 0 0
efield-x:
n = 0
efield-xt:
n = 0
efield-y:
n = 0
efield-yt:
n = 0
efield-z:
n = 0
efield-zt:
n = 0
bQMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
qm_opts:
ngQM = 0
Max number of graph edges per atom is 4
Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's: NS: 1 Coulomb: 1 LJ: 1
System total charge: 0.000
Generated table with 1000 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Enabling SPC water optimization for 6108 molecules.
Will do PME sum in reciprocal space.
[End]
--------------------------------------------------------------------------
ERROR MESSAGE:
Reading file topol.tpr, VERSION 3.3.1 (single precision)
Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'sivdppc'
100000 steps, 200.0 ps.
Back Off! I just backed up traj.trr to ./#traj.trr.1#
Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
Back Off! I just backed up step-1.pdb to ./#step-1.pdb.1#
ERROR: 0031-250 task 62: Segmentation fault
ERROR: 0031-250 task 54: Segmentation fault
ERROR: 0031-250 task 58: Segmentation fault
ERROR: 0031-250 task 50: Segmentation fault
ERROR: 0031-250 task 51: Segmentation fault
Back Off! I just backed up step0.pdb to ./#step0.pdb.1#
ERROR: 0031-250 task 19: Segmentation fault
ERROR: 0031-250 task 28: Segmentation fault
ERROR: 0031-250 task 49: Segmentation fault
ERROR: 0031-250 task 17: Segmentation fault
ERROR: 0031-250 task 20: Segmentation fault
ERROR: 0031-250 task 23: Segmentation fault
ERROR: 0031-250 task 26: Segmentation fault
ERROR: 0031-250 task 27: Segmentation fault
ERROR: 0031-250 task 31: Segmentation fault
Wrote pdb files with previous and current coordinates
ERROR: 0031-250 task 52: Segmentation fault
ERROR: 0031-250 task 18: Segmentation fault
ERROR: 0031-250 task 60: Segmentation fault
ERROR: 0031-250 task 24: Segmentation fault
ERROR: 0031-250 task 16: Segmentation fault
ERROR: 0031-250 task 30: Segmentation fault
ERROR: 0031-250 task 21: Segmentation fault
ERROR: 0031-250 task 14: Segmentation fault
ERROR: 0031-250 task 48: Segmentation fault
ERROR: 0031-250 task 38: Segmentation fault
ERROR: 0031-250 task 22: Segmentation fault
ERROR: 0031-250 task 46: Segmentation fault
ERROR: 0031-250 task 3: Segmentation fault
ERROR: 0031-250 task 45: Segmentation fault
ERROR: 0031-250 task 37: Segmentation fault
ERROR: 0031-250 task 40: Segmentation fault
ERROR: 0031-250 task 8: Segmentation fault
ERROR: 0031-250 task 15: Segmentation fault
ERROR: 0031-250 task 33: Segmentation fault
ERROR: 0031-250 task 39: Segmentation fault
ERROR: 0031-250 task 44: Segmentation fault
ERROR: 0031-250 task 56: Segmentation fault
ERROR: 0031-250 task 43: Segmentation fault
ERROR: 0031-250 task 4: Segmentation fault
ERROR: 0031-250 task 12: Segmentation fault
ERROR: 0031-250 task 29: Segmentation fault
ERROR: 0031-250 task 35: Segmentation fault
ERROR: 0031-250 task 25: Segmentation fault
ERROR: 0031-250 task 6: Segmentation fault
ERROR: 0031-250 task 42: Segmentation fault
ERROR: 0031-250 task 13: Segmentation fault
ERROR: 0031-250 task 1: Segmentation fault
ERROR: 0031-250 task 9: Segmentation fault
ERROR: 0031-250 task 10: Segmentation fault
ERROR: 0031-250 task 2: Segmentation fault
ERROR: 0031-250 task 47: Segmentation fault
ERROR: 0031-250 task 5: Segmentation fault
ERROR: 0031-250 task 7: Segmentation fault
ERROR: 0031-250 task 11: Segmentation fault
ERROR: 0031-250 task 32: Segmentation fault
ERROR: 0031-250 task 34: Segmentation fault
ERROR: 0031-250 task 41: Segmentation fault
ERROR: 0031-250 task 36: Segmentation fault
ERROR: 0031-250 task 55: Terminated
ERROR: 0031-250 task 59: Terminated
ERROR: 0031-250 task 53: Terminated
ERROR: 0031-250 task 57: Terminated
ERROR: 0031-250 task 61: Terminated
ERROR: 0031-250 task 63: Terminated
ERROR: 0031-250 task 0: Terminated
[End]