[gmx-users] Replica Exchange MD on more than 64 processors
bharat v. adkar
bharat at sscu.iisc.ernet.in
Mon Dec 28 10:00:37 CET 2009
On Mon, 28 Dec 2009, David van der Spoel wrote:
> bharat v. adkar wrote:
>> On Mon, 28 Dec 2009, Mark Abraham wrote:
>>
>> > bharat v. adkar wrote:
>> > > On Sun, 27 Dec 2009, Mark Abraham wrote:
>> > >
>> > > > bharat v. adkar wrote:
>> > > > > On Sun, 27 Dec 2009, Mark Abraham wrote:
>> > > > > > bharat v. adkar wrote:
>> > > > > > > Dear all,
>> > > > > > >
>> > > > > > > I am trying to perform replica exchange MD (REMD) on a 'protein
>> > > > > > > in water' system. I am following the instructions given on the
>> > > > > > > wiki (How-Tos -> REMD). I have to perform the REMD simulation
>> > > > > > > with 35 different temperatures. As per the advice on the wiki,
>> > > > > > > I equilibrated the system at the respective temperatures (a
>> > > > > > > total of 35 equilibration simulations). After this I generated
>> > > > > > > chk_0.tpr, chk_1.tpr, ..., chk_34.tpr files from the
>> > > > > > > equilibrated structures.
>> > > > > > >
>> > > > > > > Now when I submit the final REMD job with the following command
>> > > > > > > line, it gives an error:
>> > > > > > >
>> > > > > > > command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v
>> > > > > > >
>> > > > > > > error msg:
>> > > > > > > -------------------------------------------------------
>> > > > > > > Program mdrun_mpi, VERSION 4.0.7
>> > > > > > > Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>> > > > > > >
>> > > > > > > Fatal error:
>> > > > > > > Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
>> > > > > > > nlist->jjnr=0x9a400030
>> > > > > > > (called from file ../../../SRC/src/mdlib/ns.c, line 503)
>> > > > > > > -------------------------------------------------------
>> > > > > > >
>> > > > > > > Thanx for Using GROMACS - Have a Nice Day
>> > > > > > > : Cannot allocate memory
>> > > > > > > Error on node 19, will try to stop all the nodes
>> > > > > > > Halting parallel program mdrun_mpi on CPU 19 out of 70
>> > > > > > > ***********************************************************************
>> > > > > > >
>> > > > > > > Each node on the cluster has 8GB of physical memory and 16GB of
>> > > > > > > swap memory. Moreover, when logged onto the individual nodes, it
>> > > > > > > shows more than 1GB of free memory, so there should be no
>> > > > > > > problem with cluster memory. Also, the equilibration jobs for
>> > > > > > > the same system ran on the same cluster without any problem.
>> > > > > > >
>> > > > > > > What I have observed by submitting different test jobs with
>> > > > > > > varying numbers of processors (and numbers of replicas, wherever
>> > > > > > > necessary) is that any job with a total of <= 64 processors runs
>> > > > > > > faithfully without any problem. As soon as the total number of
>> > > > > > > processors is more than 64, it gives the above error. I have
>> > > > > > > tested this with 65 processors/65 replicas as well.
>> > > > > >
>> > > > > > This sounds like you might be running on fewer physical CPUs than
>> > > > > > you have available. If so, running multiple MPI processes per
>> > > > > > physical CPU can lead to memory shortage conditions.
>> > > > >
>> > > > > I don't understand what you mean. Do you mean there might be more
>> > > > > than 8 processes running per node (each node has 8 processors)? But
>> > > > > that also does not seem to be the case, as the SGE (sun grid engine)
>> > > > > output shows only eight processes per node.
>> > > >
>> > > > 65 processes can't have 8 processes per node.
>> > >
>> > > Why can't it? As I said, there are 8 processors per node. What I have
>> > > not mentioned is how many nodes it is using. The jobs got distributed
>> > > over 9 nodes, 8 of which correspond to 64 processors, plus 1 processor
>> > > from the 9th node.
>> >
>> > OK, that's a full description. Your symptoms are indicative of someone
>> > making an error somewhere. Since GROMACS works over more than 64
>> > processors elsewhere, the presumption is that you are doing something
>> > wrong or the machine is not set up in the way you think it is or should
>> > be. To get the most effective help, you need to be sure you're providing
>> > full information - else we can't tell which error you're making or
>> > (potentially) eliminate you as a source of error.
>> >
>> Sorry for not being clear in statements.
>>
>> > > As far as I can tell, job distribution seems okay to me. It is 1 job
>> > > per processor.
>> >
>> > Does non-REMD GROMACS run on more than 64 processors? Does your cluster
>> > support using more than 8 nodes in a run? Can you run an MPI "Hello
>> > world" application that prints the processor and node ID across more
>> > than 64 processors?
>>
>> Yes, the cluster supports runs with more than 8 nodes. I generated a
>> system with 10 nm water box and submitted on 80 processors. It was running
>> fine. It printed all 80 NODEIDs. Also showed me when the job will get
>> over.
>>
>> bharat
>>
>>
>> >
>> > Mark
>> >
>> >
>> > > bharat
>> > >
>> > > > Mark
>> > > >
>> > > > > > I don't know what you mean by "swap memory".
>> > > > >
>> > > > > Sorry, I meant cache memory.
>> > > > >
>> > > > > bharat
>> > > > >
>> > > > > > Mark
>> > > > > >
>> > > > > > > System: Protein + water + Na ions (total 46878 atoms)
>> > > > > > > Gromacs version: tested with both v4.0.5 and v4.0.7
>> > > > > > > compiled with: --enable-float --with-fft=fftw3 --enable-mpi
>> > > > > > > compiler: gcc_3.4.6 -O3
>> > > > > > > machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux
>> > > > > > >
>> > > > > > > I tried searching the mailing list without any luck. I am not
>> > > > > > > sure if I am doing anything wrong in giving the commands.
>> > > > > > > Please correct me if it is wrong.
>> > > > > > >
>> > > > > > > Kindly let me know the solution.
>> > > > > > >
>> > > > > > > bharat
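(For reference: a minimal sketch of the tpr-generation step described in the
quoted post, i.e. one run input per replica temperature. The .mdp and
equilibrated .gro file names used here are assumed, not taken from the
original post:

  # build chk_0.tpr ... chk_34.tpr, one per replica temperature (file names assumed)
  for i in $(seq 0 34); do
      grompp -f chk_${i}.mdp -c equil_${i}.gro -p topol.top -o chk_${i}.tpr
  done

With -multi 35 and -s chk_.tpr, mdrun then reads chk_0.tpr for replica 0,
chk_1.tpr for replica 1, and so on, which matches the "Reading file
chk_N.tpr" lines in the pasted output further down.)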
> Your system is going out of memory: probably too big a system, or all
> replicas are running on the same node.
From the MPI output it doesn't seem that all the replicas, or even more than
one replica, are running on a single processor. Regarding the system, it ran
successfully during equilibration.

I am pasting below the stderr output of one of the jobs with the number of
processors = 66; please also check the attached file "ToAttach.txt".
Again, as a reminder, the cluster here has 8 processors per compute-node.

bharat
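(As a quick cross-check of that placement claim, the NNODES/MYRANK lines in
the pasted output can be counted per host. A sketch, assuming the paste is
saved as the attached ToAttach.txt and standard grep/sed/sort/uniq are
available:

  # count how many MPI ranks landed on each compute node; no host should appear more than 8 times
  grep 'HOSTNAME=' ToAttach.txt | sed 's/.*HOSTNAME=//' | sort | uniq -c
)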
Command line: mpirun -np 66 mdrun -multi 33 -replex 1000 -s chk_.tpr -cpi chkpt -cpt 30 -cpo chkpt
Output:
NNODES=66, MYRANK=0, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=1, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=4, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=3, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=9, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=2, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=5, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=6, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=13, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=11, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=12, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=14, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=28, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=10, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=20, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=21, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=23, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=25, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=26, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=30, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=29, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=24, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=7, HOSTNAME=compute-0-2.local
NNODES=66, MYRANK=8, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=18, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=58, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=19, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=22, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=47, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=62, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=61, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=51, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=42, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=41, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=57, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=17, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=38, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=37, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=39, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=40, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=45, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=46, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=43, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=44, HOSTNAME=compute-0-11.local
NNODES=66, MYRANK=49, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=50, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=48, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=53, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=54, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=52, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=27, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=60, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=59, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=16, HOSTNAME=compute-0-6.local
NNODES=66, MYRANK=34, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=33, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=36, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=35, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=56, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=15, HOSTNAME=compute-0-3.local
NNODES=66, MYRANK=31, HOSTNAME=compute-0-12.local
NNODES=66, MYRANK=63, HOSTNAME=compute-0-27.local
NNODES=66, MYRANK=64, HOSTNAME=compute-0-25.local
NNODES=66, MYRANK=55, HOSTNAME=compute-0-24.local
NNODES=66, MYRANK=32, HOSTNAME=compute-0-15.local
NNODES=66, MYRANK=65, HOSTNAME=compute-0-25.local
NODEID=0 argc=19
NODEID=1 argc=19
NODEID=2 argc=19
NODEID=3 argc=19
NODEID=5 argc=19
NODEID=4 argc=19
NODEID=6 argc=19
NODEID=13 argc=19
NODEID=9 argc=19
NODEID=12 argc=19
NODEID=11 argc=19
NODEID=7 argc=19
NODEID=16 argc=19
NODEID=10 argc=19
NODEID=15 argc=19
NODEID=8 argc=19
NODEID=14 argc=19
NODEID=20 argc=19
NODEID=19 argc=19
NODEID=28 argc=19
NODEID=25 argc=19
NODEID=26 argc=19
NODEID=18 argc=19
NODEID=17 argc=19
NODEID=22 argc=19
NODEID=21 argc=19
NODEID=24 argc=19
NODEID=23 argc=19
NODEID=30 argc=19
NODEID=29 argc=19
NODEID=34 argc=19
NODEID=33 argc=19
NODEID=27 argc=19
NODEID=57 argc=19
NODEID=58 argc=19
NODEID=51 argc=19
NODEID=52 argc=19
NODEID=41 argc=19
NODEID=42 argc=19
NODEID=39 argc=19
NODEID=40 argc=19
NODEID=37 argc=19
NODEID=38 argc=19
NODEID=36 argc=19
NODEID=35 argc=19
NODEID=61 argc=19
NODEID=62 argc=19
NODEID=49 argc=19
NODEID=48 argc=19
NODEID=47 argc=19
NODEID=56 argc=19
NODEID=45 argc=19
NODEID=46 argc=19
NODEID=44 argc=19
NODEID=43 argc=19
NODEID=54 argc=19
NODEID=53 argc=19
NODEID=55 argc=19
NODEID=59 argc=19
NODEID=60 argc=19
NODEID=31 argc=19
NODEID=64 argc=19
NODEID=63 argc=19
NODEID=50 argc=19
NODEID=32 argc=19
NODEID=65 argc=19
:-) G R O M A C S (-:
Groningen Machine for Chemical Simulation
:-) VERSION 4.0.7 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /groupmisc/bharat/soft/GMX407_bh/INSTL/bin/mdrun_mpi (-:
 Option      Filename      Type          Description
------------------------------------------------------------
     -s      chk_.tpr      Input         Run input file: tpr tpb tpa
     -o      traj.trr      Output        Full precision trajectory: trr trj cpt
     -x      traj.xtc      Output, Opt!  Compressed trajectory (portable xdr format)
   -cpi      chkpt.cpt     Input, Opt!   Checkpoint file
   -cpo      chkpt.cpt     Output, Opt!  Checkpoint file
     -c      confout.gro   Output        Structure file: gro g96 pdb
     -e      ener.edr      Output        Energy file: edr ene
     -g      md.log        Output        Log file
  -dgdl      dgdl.xvg      Output, Opt.  xvgr/xmgr file
 -field      field.xvg     Output, Opt.  xvgr/xmgr file
 -table      table.xvg     Input, Opt.   xvgr/xmgr file
-tablep      tablep.xvg    Input, Opt.   xvgr/xmgr file
-tableb      table.xvg     Input, Opt.   xvgr/xmgr file
 -rerun      rerun.xtc     Input, Opt.   Trajectory: xtc trr trj gro g96 pdb cpt
   -tpi      tpi.xvg       Output, Opt.  xvgr/xmgr file
  -tpid      tpidist.xvg   Output, Opt.  xvgr/xmgr file
    -ei      sam.edi       Input, Opt.   ED sampling input
    -eo      sam.edo       Output, Opt.  ED sampling output
     -j      wham.gct      Input, Opt.   General coupling stuff
    -jo      bam.gct       Output, Opt.  General coupling stuff
 -ffout      gct.xvg       Output, Opt.  xvgr/xmgr file
-devout      deviatie.xvg  Output, Opt.  xvgr/xmgr file
 -runav      runaver.xvg   Output, Opt.  xvgr/xmgr file
    -px      pullx.xvg     Output, Opt.  xvgr/xmgr file
    -pf      pullf.xvg     Output, Opt.  xvgr/xmgr file
   -mtx      nm.mtx        Output, Opt.  Hessian matrix
    -dn      dipole.ndx    Output, Opt.  Index file

      Option   Type    Value        Description
------------------------------------------------------
      -[no]h   bool    no           Print help info and quit
       -nice   int     0            Set the nicelevel
     -deffnm   string               Set the default filename for all file options
   -[no]xvgr   bool    yes          Add specific codes (legends etc.) in the output
                                    xvg files for the xmgrace program
     -[no]pd   bool    no           Use particle decompostion
         -dd   vector  0 0 0        Domain decomposition grid, 0 is optimize
       -npme   int     -1           Number of separate nodes to be used for PME,
                                    -1 is guess
    -ddorder   enum    interleave   DD node order: interleave, pp_pme or cartesian
-[no]ddcheck   bool    yes          Check for all bonded interactions with DD
        -rdd   real    0            The maximum distance for bonded interactions
                                    with DD (nm), 0 is determine from initial
                                    coordinates
       -rcon   real    0            Maximum distance for P-LINCS (nm), 0 is estimate
        -dlb   enum    auto         Dynamic load balancing (with DD): auto, no or yes
        -dds   real    0.8          Minimum allowed dlb scaling of the DD cell size
    -[no]sum   bool    yes          Sum the energies at every step
      -[no]v   bool    yes          Be loud and noisy
-[no]compact   bool    yes          Write a compact log file
 -[no]seppot   bool    no           Write separate V and dVdl terms for each
                                    interaction type and node to the log file(s)
     -pforce   real    -1           Print all forces larger than this (kJ/mol nm)
 -[no]reprod   bool    no           Try to avoid optimizations that affect binary
                                    reproducibility
        -cpt   real    30           Checkpoint interval (minutes)
 -[no]append   bool    no           Append to previous output files when continuing
                                    from checkpoint
-[no]addpart   bool    yes          Add the simulation part number to all output
                                    files when continuing from checkpoint
       -maxh   real    -1           Terminate after 0.99 times this time (hours)
      -multi   int     33           Do multiple simulations in parallel
     -replex   int     1000         Attempt replica exchange every # steps
     -reseed   int     -1           Seed for replica exchange, -1 is generate a seed
   -[no]glas   bool    no           Do glass simulation with special long range
                                    corrections
 -[no]ionize   bool    no           Do a simulation including the effect of an X-Ray
                                    bombardment on your system
Getting Loaded...
Getting Loaded...
Reading file chk_0.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Reading file chk_32.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Reading file chk_1.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Getting Loaded...
Reading file chk_3.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Reading file chk_13.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Reading file chk_15.tpr, VERSION 4.0.5 (single precision)
Reading file chk_12.tpr, VERSION 4.0.5 (single precision)
Reading file chk_25.tpr, VERSION 4.0.5 (single precision)
Reading file chk_30.tpr, VERSION 4.0.5 (single precision)
Reading file chk_26.tpr, VERSION 4.0.5 (single precision)
Reading file chk_24.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Reading file chk_4.tpr, VERSION 4.0.5 (single precision)
Reading file chk_16.tpr, VERSION 4.0.5 (single precision)
Reading file chk_8.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Getting Loaded...
Reading file chk_5.tpr, VERSION 4.0.5 (single precision)
Reading file chk_20.tpr, VERSION 4.0.5 (single precision)
Reading file chk_11.tpr, VERSION 4.0.5 (single precision)
Reading file chk_7.tpr, VERSION 4.0.5 (single precision)
Reading file chk_28.tpr, VERSION 4.0.5 (single precision)
Reading file chk_6.tpr, VERSION 4.0.5 (single precision)
Reading file chk_31.tpr, VERSION 4.0.5 (single precision)
Reading file chk_27.tpr, VERSION 4.0.5 (single precision)
Reading file chk_14.tpr, VERSION 4.0.5 (single precision)
Reading file chk_29.tpr, VERSION 4.0.5 (single precision)
Reading file chk_18.tpr, VERSION 4.0.5 (single precision)
Reading file chk_10.tpr, VERSION 4.0.5 (single precision)
Reading file chk_9.tpr, VERSION 4.0.5 (single precision)
Reading file chk_19.tpr, VERSION 4.0.5 (single precision)
Reading file chk_22.tpr, VERSION 4.0.5 (single precision)
Reading file chk_17.tpr, VERSION 4.0.5 (single precision)
Reading file chk_21.tpr, VERSION 4.0.5 (single precision)
Reading file chk_23.tpr, VERSION 4.0.5 (single precision)
Getting Loaded...
Reading file chk_2.tpr, VERSION 4.0.5 (single precision)
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Loaded with Money
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Loaded with Money
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
starting mdrun 'Protein'
500000 steps, 1000.0 ps.
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.0.7
Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
Fatal error:
Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
nlist->jjnr=0x98500030
(called from file ../../../SRC/src/mdlib/ns.c, line 503)
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
: Cannot allocate memory
Error on node 64, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 64 out of 66
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 64
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.0.7
Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
Fatal error:
Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
nlist->jjnr=0x9a300030
(called from file ../../../SRC/src/mdlib/ns.c, line 503)
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
: Cannot allocate memory
Error on node 55, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 55 out of 66
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.0.7
Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
Fatal error:
Not enough memory. Failed to realloc 400768 bytes for nlist->jjnr,
nlist->jjnr=0x9a300030
(called from file ../../../SRC/src/mdlib/ns.c, line 503)
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
: Cannot allocate memory
Error on node 9, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 9 out of 66
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 9
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 55
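(One additional thing that may be worth checking, since the failure is a
plain "Cannot allocate memory" that only shows up once more than 64
processors are requested: whether the batch environment imposes a
per-process memory limit on some of the allocated nodes. A rough sketch,
assuming bash is the shell on the compute nodes:

  # print the host name and the address-space/data-segment limits seen in each rank's environment
  mpirun -np 66 bash -c 'echo "$(hostname) vmem=$(ulimit -v) data=$(ulimit -d)"'

A finite number instead of "unlimited" on some hosts could explain why the
realloc fails there.)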