[gmx-users] Segmentation fault on HPC
Tomek Wlodarski
tomek.wlodarski at gmail.com
Sun Jan 4 13:44:42 CET 2015
Hi Mark,
Thanks for the reply.
Actually, the log file ends without any error:
Started mdrun on rank 0 Sun Jan 4 12:30:40 2015
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.00844e+05    2.04719e+05    5.69917e+03    6.63396e+04   -4.67549e+05
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.46339e+06   -3.74511e+04   -9.86464e+06   -1.81675e+06    4.91794e+01
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
   -1.03454e+07    8.79250e+05   -9.46610e+06   -9.46610e+06    2.00415e+02
 Pressure (bar)   Constr. rmsd
   -5.12367e+03    5.80732e-06

I found something strange: when I changed the nstxout value, I was able to
run the simulation for more than 90k steps without error (I killed the
simulation because I was running out of disk space ;))
So my simulation runs OK when I have:
dt = 0.002 ; 2 femtosecond time step for integration
nsteps = 50000 ; 100 ps
; OUTPUT CONTROL OPTIONS
nstxout = 1 ; save coordinates every step (0.002 ps)
nstvout = 5000 ; save velocities every 10 ps
nstenergy = 5000 ; save energies every 10 ps
nstlog = 5000 ; update log file every 10 ps
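
As a cross-check of the output settings above, the time between two writes
is the nst* value multiplied by dt. A minimal sketch of that arithmetic in
Python (the helper names are hypothetical and not part of GROMACS; the frame
count assumes one frame at step 0 and then one every nst steps):

```python
# Hypothetical helpers, not part of GROMACS: convert output intervals given
# in steps (nstxout, nstvout, ...) into simulated time and frame counts.

def output_interval_ps(nst, dt_ps):
    """Simulated time in ps between two consecutive writes."""
    return nst * dt_ps

def frames_written(nsteps, nst):
    """Frames written in a run, assuming a frame at step 0 and then one
    every nst steps; nst <= 0 is treated as 'output disabled'."""
    if nst <= 0:
        return 0
    return nsteps // nst + 1

dt = 0.002      # ps, from the settings above
nsteps = 50000  # from the settings above

settings = {"nstxout": 1, "nstvout": 5000, "nstenergy": 5000, "nstlog": 5000}
for name, nst in settings.items():
    interval = output_interval_ps(nst, dt)
    count = frames_written(nsteps, nst)
    print(f"{name:10s} every {interval:g} ps, about {count} frames")
```

With dt = 0.002, an interval of 5000 steps corresponds to 10 ps, while
nstxout = 1 writes coordinates at every single step, which would fit with
the disk filling up so quickly.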
I tried different values of nstxout and the simulation always crashes...
Changing dt to 0.001 also does not help.
Interestingly, when I reduce nsteps to 500 my simulation crashes as
well, even though I have nstxout = 1...
Is this an indication that my simulation is not stable, or is something
else happening?
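
For what it is worth, the exit code 139 in the aprun output quoted below
appears to follow the common shell convention of 128 plus the signal number,
so it would correspond to signal 11 (SIGSEGV), in line with the
"Segmentation fault" messages. A small sketch of that decoding, assuming a
POSIX system; decode_exit_code is a hypothetical helper, not an aprun or
GROMACS tool:

```python
import signal

def decode_exit_code(code):
    """Return the signal name if the exit code follows the 128 + signal
    convention, otherwise None. Hypothetical helper for illustration."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The remainder is not a valid signal number on this platform.
        return None

print(decode_exit_code(139))
```

This only confirms that the ranks died from a segmentation fault; it does
not say why.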
Thanks a lot.
Best,
tomek
On Sat, Jan 3, 2015 at 2:45 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:
> Hi,
>
> What do the ends of the .log files say?
> http://www.gromacs.org/Documentation/Terminology/Blowing_Up is a heavy
> favourite.
>
> Mark
>
> On Sat, Jan 3, 2015 at 1:56 PM, Tomek Wlodarski <tomek.wlodarski at gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to set up a simulation on an HPC cluster.
> > Energy minimization in vacuum and in water works great (using 4 nodes
> > with 96 cores altogether).
> > But when I try to run standard MD with:
> >
> > aprun -n 96 mdrun_mpi -v -deffnm md >& out
> >
> > I always end up with this error (I also tried different GROMACS
> > versions: 4.6.5, 5.0, 5.0.4):
> >
> >
> > starting mdrun 'Protein in water'
> > 500000 steps, 1000.0 ps.
> > step 0
> >
> > NOTE: Turning on dynamic load balancing
> >
> > _pmiu_daemon(SIGCHLD): [NID 00383] [c1-0c2s15n3] [Sat Jan 3 12:26:44 2015] PE RANK 72 exit signal Segmentation fault
> > _pmiu_daemon(SIGCHLD): [NID 00380] [c1-0c2s15n0] [Sat Jan 3 12:26:44 2015] PE RANK 1 exit signal Segmentation fault
> > _pmiu_daemon(SIGCHLD): [NID 00382] [c1-0c2s15n2] [Sat Jan 3 12:26:44 2015] PE RANK 48 exit signal Segmentation fault
> > [NID 00380] 2015-01-03 12:26:44 Apid 12376091: initiated application termination
> > Application 12376091 exit codes: 139
> > Application 12376091 exit signals: Killed
> > Application 12376091 resources: utime ~16s, stime ~8s, Rss ~63064, inblocks ~363834, outblocks ~1173496
> >
> >
> > Do you have any suggestions about what is wrong (why can I run EM but
> > not MD)? I checked a different system and GROMACS ran OK.
> > Thanks.
> > Best,
> >
> > tomek
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >