[gmx-users] Replica Exchange MD on more than 64 processors

Mark Abraham Mark.Abraham at anu.edu.au
Mon Dec 28 11:58:36 CET 2009


bharat v. adkar wrote:
> On Mon, 28 Dec 2009, David van der Spoel wrote:
>
>> bharat v. adkar wrote:
>>> On Mon, 28 Dec 2009, Mark Abraham wrote:
>>>
>>>> bharat v. adkar wrote:
>>>>> On Sun, 27 Dec 2009, Mark Abraham wrote:
>>>>>> bharat v. adkar wrote:
>>>>>>> On Sun, 27 Dec 2009, Mark Abraham wrote:
>>>>>>>> bharat v. adkar wrote:
>>>>>>>>> Dear all,
>>>>>>>>> I am trying to perform replica exchange MD (REMD) on a 'protein in
>>>>>>>>> water' system. I am following the instructions given on the wiki
>>>>>>>>> (How-Tos -> REMD). I have to perform the REMD simulation with 35
>>>>>>>>> different temperatures. As per the advice on the wiki, I
>>>>>>>>> equilibrated the system at the respective temperatures (a total of
>>>>>>>>> 35 equilibration simulations). After this I generated chk_0.tpr,
>>>>>>>>> chk_1.tpr, ..., chk_34.tpr files from the equilibrated structures.
>>>>>>>>>
>>>>>>>>> Now when I submit the final job for REMD with the following
>>>>>>>>> command line, it gives some error:
>>>>>>>>>
>>>>>>>>> command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v
>>>>>>>>>
>>>>>>>>> error msg:
>>>>>>>>> -------------------------------------------------------
>>>>>>>>> Program mdrun_mpi, VERSION 4.0.7
>>>>>>>>> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>>>>>>>>>
>>>>>>>>> Fatal error:
>>>>>>>>> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
>>>>>>>>> nlist->jjnr=0x9a400030
>>>>>>>>> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
>>>>>>>>> -------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Thanx for Using GROMACS - Have a Nice Day
>>>>>>>>> : Cannot allocate memory
>>>>>>>>> Error on node 19, will try to stop all the nodes
>>>>>>>>> Halting parallel program mdrun_mpi on CPU 19 out of 70
>>>>>>>>> ***********************************************************************
>>>>>>>>>
>>>>>>>>> The individual nodes on the cluster have 8 GB of physical memory
>>>>>>>>> and 16 GB of swap memory. Moreover, when logged onto the individual
>>>>>>>>> nodes, they show more than 1 GB of free memory, so there should be
>>>>>>>>> no problem with cluster memory. Also, the equilibration jobs for
>>>>>>>>> the same system were run on the same cluster without any problem.
>>>>>>>>>
>>>>>>>>> What I have observed by submitting different test jobs with varying
>>>>>>>>> numbers of processors (and numbers of replicas, wherever necessary)
>>>>>>>>> is that any job with a total number of processors <= 64 runs
>>>>>>>>> faithfully without any problem. As soon as the total number of
>>>>>>>>> processors is more than 64, it gives the above error. I have tested
>>>>>>>>> this with 65 processors/65 replicas also.
>>>>>>>>
>>>>>>>> This sounds like you might be running on fewer physical CPUs than
>>>>>>>> you have available. If so, running multiple MPI processes per
>>>>>>>> physical CPU can lead to memory shortage conditions.
>>>>>>>
>>>>>>> I don't understand what you mean. Do you mean there might be more
>>>>>>> than 8 processes running per node (each node has 8 processors)? But
>>>>>>> that also does not seem to be the case, as SGE (Sun Grid Engine)
>>>>>>> output shows only eight processes per node.
>>>>>>
>>>>>> 65 processes can't have 8 processes per node.
>>>>>
>>>>> Why can't it? As I said, there are 8 processors per node. What I have
>>>>> not mentioned is how many nodes it is using. The jobs got distributed
>>>>> over 9 nodes, 8 of which correspond to 64 processors, plus 1 processor
>>>>> from the 9th node.
>>>>
>>>> OK, that's a full description. Your symptoms are indicative of someone
>>>> making an error somewhere. Since GROMACS works over more than 64
>>>> processors elsewhere, the presumption is that you are doing something
>>>> wrong or the machine is not set up in the way you think it is or should
>>>> be. To get the most effective help, you need to be sure you're providing
>>>> full information - else we can't tell which error you're making or
>>>> (potentially) eliminate you as a source of error.
>>>
>>> Sorry for not being clear in my statements.
>>>
>>>>> As far as I can tell, the job distribution seems okay to me. It is 1
>>>>> job per processor.
>>>>
>>>> Does non-REMD GROMACS run on more than 64 processors? Does your cluster
>>>> support using more than 8 nodes in a run? Can you run an MPI "Hello
>>>> world" application that prints the processor and node ID across more
>>>> than 64 processors?
>>>
>>> Yes, the cluster supports runs with more than 8 nodes. I generated a
>>> system with a 10 nm water box and submitted it on 80 processors. It was
>>> running fine. It printed all 80 NODEIDs and also showed me when the job
>>> would get over.
>>>
>>> bharat
>>>
>>>> Mark
>>>>
>>>>> bharat
>>>>>
>>>>>> Mark
>>>>>>
>>>>>>>> I don't know what you mean by "swap memory".
>>>>>>>
>>>>>>> Sorry, I meant cache memory.
>>>>>>>
>>>>>>> bharat
>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>> System: Protein + water + Na ions (total 46878 atoms)
>>>>>>>>> Gromacs version: tested with both v4.0.5 and v4.0.7
>>>>>>>>> compiled with: --enable-float --with-fft=fftw3 --enable-mpi
>>>>>>>>> compiler: gcc_3.4.6 -O3
>>>>>>>>> machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>>
>>>>>>>>> I tried searching the mailing list without any luck. I am not sure
>>>>>>>>> if I am doing anything wrong in giving the commands. Please correct
>>>>>>>>> me if it is wrong.
>>>>>>>>>
>>>>>>>>> Kindly let me know the solution.
>>>>>>>>>
>>>>>>>>> bharat
>>>
>> Your system is going out of memory - probably too big a system, or all
>> replicas are running on the same node.
> 
> From the MPI output it doesn't seem that all, or even more than one, of
> the replicas are running on a single processor.

Indeed.
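
For what it's worth, you can confirm the rank-to-node mapping by counting
ranks per host in that stderr paste (assuming it is saved as ToAttach.txt,
the file named in your mail); a one-liner like this is enough:

  grep '^NNODES=66' ToAttach.txt | sed 's/.*HOSTNAME=//' | sort | uniq -c

With 66 ranks that should report eight per node on eight of the nodes and
two on the ninth, which is what your paste shows.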

> Regarding the system: it has run successfully during equilibration.

How much memory is required for a single replica? If that, multiplied by 
the eight replicas sharing a node, comes close to the node's 8 GB of 
physical memory, you're in trouble.
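
A rough way to check, assuming mdrun_mpi is on your PATH and using
chk_0.tpr as a representative replica: run it alone on one node (directly
or under mpirun -np 1, depending on your MPI) and watch its resident size,
e.g.

  mdrun_mpi -s chk_0.tpr -deffnm memtest &
  pid=$!
  sleep 120                       # let neighbour searching and PME get going
  grep VmRSS /proc/$pid/status    # resident size of one replica, in kB
  free -m                         # memory actually free on that node
  kill $pid

Eight replicas of that size, plus the MPI buffers, have to fit inside a
node's 8 GB.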

Does REMD over more than 64 processors work on a very small simulation system?
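
Something like the following would do, with the file names obviously made
up - a small water box prepared at the same 33 temperatures and launched
the same way:

  # hypothetical test inputs; small_*.mdp and small_eq_*.gro are placeholders
  for i in $(seq 0 32); do
      grompp -f small_${i}.mdp -c small_eq_${i}.gro -p small.top -o small_${i}.tpr
  done
  mpirun -np 66 mdrun_mpi -multi 33 -replex 1000 -s small_.tpr

If that also dies as soon as you cross 64 cores, the problem is with the
machine or the MPI layer rather than with your 46878-atom system.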

Mark

> I am pasting below the stderr output of one of the jobs with the number 
> of processors = 66. Please check the attached file "ToAttach.txt".
> 
> Again, as a reminder, the cluster here has 8 processors per compute node.
> 
> bharat
> 
> 
> CommandLine:  mpirun -np 66 mdrun -multi 33 -replex 1000 -s chk_.tpr -cpi chkpt -cpt 30 -cpo chkpt
> 
> Output:
> 
> NNODES=66, MYRANK=0, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=1, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=4, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=3, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=9, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=2, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=5, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=6, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=13, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=11, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=12, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=14, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=28, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=10, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=20, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=21, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=23, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=25, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=26, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=30, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=29, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=24, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=7, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=8, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=18, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=58, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=19, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=22, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=47, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=62, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=61, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=51, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=42, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=41, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=57, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=17, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=38, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=37, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=39, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=40, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=45, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=46, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=43, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=44, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=49, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=50, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=48, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=53, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=54, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=52, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=27, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=60, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=59, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=16, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=34, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=33, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=36, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=35, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=56, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=15, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=31, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=63, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=64, HOSTNAME=compute-0-25.local
> NNODES=66, MYRANK=55, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=32, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=65, HOSTNAME=compute-0-25.local
> NODEID=0 argc=19
> NODEID=1 argc=19
> NODEID=2 argc=19
> NODEID=3 argc=19
> NODEID=5 argc=19
> NODEID=4 argc=19
> NODEID=6 argc=19
> NODEID=13 argc=19
> NODEID=9 argc=19
> NODEID=12 argc=19
> NODEID=11 argc=19
> NODEID=7 argc=19
> NODEID=16 argc=19
> NODEID=10 argc=19
> NODEID=15 argc=19
> NODEID=8 argc=19
> NODEID=14 argc=19
> NODEID=20 argc=19
> NODEID=19 argc=19
> NODEID=28 argc=19
> NODEID=25 argc=19
> NODEID=26 argc=19
> NODEID=18 argc=19
> NODEID=17 argc=19
> NODEID=22 argc=19
> NODEID=21 argc=19
> NODEID=24 argc=19
> NODEID=23 argc=19
> NODEID=30 argc=19
> NODEID=29 argc=19
> NODEID=34 argc=19
> NODEID=33 argc=19
> NODEID=27 argc=19
> NODEID=57 argc=19
> NODEID=58 argc=19
> NODEID=51 argc=19
> NODEID=52 argc=19
> NODEID=41 argc=19
> NODEID=42 argc=19
> NODEID=39 argc=19
> NODEID=40 argc=19
> NODEID=37 argc=19
> NODEID=38 argc=19
> NODEID=36 argc=19
> NODEID=35 argc=19
> NODEID=61 argc=19
> NODEID=62 argc=19
> NODEID=49 argc=19
> NODEID=48 argc=19
> NODEID=47 argc=19
> NODEID=56 argc=19
> NODEID=45 argc=19
> NODEID=46 argc=19
> NODEID=44 argc=19
> NODEID=43 argc=19
> NODEID=54 argc=19
> NODEID=53 argc=19
> NODEID=55 argc=19
> NODEID=59 argc=19
> NODEID=60 argc=19
> NODEID=31 argc=19
> NODEID=64 argc=19
> NODEID=63 argc=19
> NODEID=50 argc=19
> NODEID=32 argc=19
> NODEID=65 argc=19
>                          :-)  G  R  O  M  A  C  S  (-:
> 
>                    Groningen Machine for Chemical Simulation
> 
>                             :-)  VERSION 4.0.7  (-:
> 
> 
>       Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
>        Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>              Copyright (c) 2001-2008, The GROMACS development team,
>             check out http://www.gromacs.org for more information.
> 
>          This program is free software; you can redistribute it and/or
>           modify it under the terms of the GNU General Public License
>          as published by the Free Software Foundation; either version 2
>              of the License, or (at your option) any later version.
> 
>         :-)  /groupmisc/bharat/soft/GMX407_bh/INSTL/bin/mdrun_mpi  (-:
> 
> Option     Filename  Type         Description
> ------------------------------------------------------------
>   -s       chk_.tpr  Input        Run input file: tpr tpb tpa
>   -o       traj.trr  Output       Full precision trajectory: trr trj cpt
>   -x       traj.xtc  Output, Opt! Compressed trajectory (portable xdr format)
> -cpi      chkpt.cpt  Input, Opt!  Checkpoint file
> -cpo      chkpt.cpt  Output, Opt! Checkpoint file
>   -c    confout.gro  Output       Structure file: gro g96 pdb
>   -e       ener.edr  Output       Energy file: edr ene
>   -g         md.log  Output       Log file
> -dgdl      dgdl.xvg  Output, Opt. xvgr/xmgr file
> -field    field.xvg  Output, Opt. xvgr/xmgr file
> -table    table.xvg  Input, Opt.  xvgr/xmgr file
> -tablep  tablep.xvg  Input, Opt.  xvgr/xmgr file
> -tableb   table.xvg  Input, Opt.  xvgr/xmgr file
> -rerun    rerun.xtc  Input, Opt.  Trajectory: xtc trr trj gro g96 pdb cpt
> -tpi        tpi.xvg  Output, Opt. xvgr/xmgr file
> -tpid   tpidist.xvg  Output, Opt. xvgr/xmgr file
>  -ei        sam.edi  Input, Opt.  ED sampling input
>  -eo        sam.edo  Output, Opt. ED sampling output
>   -j       wham.gct  Input, Opt.  General coupling stuff
>  -jo        bam.gct  Output, Opt. General coupling stuff
> -ffout      gct.xvg  Output, Opt. xvgr/xmgr file
> -devout   deviatie.xvg  Output, Opt. xvgr/xmgr file
> -runav  runaver.xvg  Output, Opt. xvgr/xmgr file
>  -px      pullx.xvg  Output, Opt. xvgr/xmgr file
>  -pf      pullf.xvg  Output, Opt. xvgr/xmgr file
> -mtx         nm.mtx  Output, Opt. Hessian matrix
>  -dn     dipole.ndx  Output, Opt. Index file
> 
> Option       Type   Value   Description
> ------------------------------------------------------
> -[no]h       bool   no      Print help info and quit
> -nice        int    0       Set the nicelevel
> -deffnm      string         Set the default filename for all file options
> -[no]xvgr    bool   yes     Add specific codes (legends etc.) in the output
>                             xvg files for the xmgrace program
> -[no]pd      bool   no      Use particle decomposition
> -dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
> -npme        int    -1      Number of separate nodes to be used for PME, -1
>                             is guess
> -ddorder     enum   interleave  DD node order: interleave, pp_pme or cartesian
> -[no]ddcheck bool   yes     Check for all bonded interactions with DD
> -rdd         real   0       The maximum distance for bonded interactions with
>                             DD (nm), 0 is determine from initial coordinates
> -rcon        real   0       Maximum distance for P-LINCS (nm), 0 is estimate
> -dlb         enum   auto    Dynamic load balancing (with DD): auto, no or yes
> -dds         real   0.8     Minimum allowed dlb scaling of the DD cell size
> -[no]sum     bool   yes     Sum the energies at every step
> -[no]v       bool   yes     Be loud and noisy
> -[no]compact bool   yes     Write a compact log file
> -[no]seppot  bool   no      Write separate V and dVdl terms for each
>                             interaction type and node to the log file(s)
> -pforce      real   -1      Print all forces larger than this (kJ/mol nm)
> -[no]reprod  bool   no      Try to avoid optimizations that affect binary
>                             reproducibility
> -cpt         real   30      Checkpoint interval (minutes)
> -[no]append  bool   no      Append to previous output files when continuing
>                             from checkpoint
> -[no]addpart bool   yes     Add the simulation part number to all output
>                             files when continuing from checkpoint
> -maxh        real   -1      Terminate after 0.99 times this time (hours)
> -multi       int    33      Do multiple simulations in parallel
> -replex      int    1000    Attempt replica exchange every # steps
> -reseed      int    -1      Seed for replica exchange, -1 is generate a seed
> -[no]glas    bool   no      Do glass simulation with special long range
>                             corrections
> -[no]ionize  bool   no      Do a simulation including the effect of an X-Ray
>                             bombardment on your system
> 
> Getting Loaded...
> Getting Loaded...
> Reading file chk_0.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Reading file chk_32.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_1.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_3.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_13.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Reading file chk_15.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_12.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_25.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_30.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_26.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_24.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_4.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_16.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_8.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_5.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_20.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_11.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_7.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_28.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_6.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_31.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_27.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_14.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_29.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_18.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_10.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_9.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_19.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_22.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_17.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_21.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_23.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Reading file chk_2.tpr, VERSION 4.0.5 (single precision)
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
> 
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps,   1000.0 ps.
> 
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
> 
> Fatal error:
> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr, 
> nlist->jjnr=0x98500030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
> 
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 64, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 64 out of 66
> 
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
> 
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 64
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
> 
> Fatal error:
> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr, 
> nlist->jjnr=0x9a300030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
> 
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 55, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 55 out of 66
> 
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
> 
> Fatal error:
> Not enough memory. Failed to realloc 400768 bytes for nlist->jjnr, 
> nlist->jjnr=0x9a300030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
> 
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 9, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 9 out of 66
> 
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
> 
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 9
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
> 
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 55
> 
> 
> 


