[gmx-users] Replica Exchange MD on more than 64 processors
Mark Abraham
Mark.Abraham at anu.edu.au
Mon Dec 28 11:58:36 CET 2009
bharat v. adkar wrote:
> On Mon, 28 Dec 2009, David van der Spoel wrote:
>
>> bharat v. adkar wrote:
>>> On Mon, 28 Dec 2009, Mark Abraham wrote:
>>>
>>>> bharat v. adkar wrote:
>>>>> On Sun, 27 Dec 2009, Mark Abraham wrote:
>>>>>
>>>>>> bharat v. adkar wrote:
>>>>>>> On Sun, 27 Dec 2009, Mark Abraham wrote:
>>>>>>>
>>>>>>>> bharat v. adkar wrote:
>>>>>>>>> Dear all,
>>>>>>>>> I am trying to perform replica exchange MD (REMD) on a 'protein
>>>>>>>>> in water' system. I am following the instructions given on the
>>>>>>>>> wiki (How-Tos -> REMD). I have to perform the REMD simulation
>>>>>>>>> with 35 different temperatures. As per the advice on the wiki, I
>>>>>>>>> equilibrated the system at the respective temperatures (a total
>>>>>>>>> of 35 equilibration simulations). After this I generated
>>>>>>>>> chk_0.tpr, chk_1.tpr, ..., chk_34.tpr files from the equilibrated
>>>>>>>>> structures.
>>>>>>>>>
>>>>>>>>> Now when I submit the final job for REMD with the following
>>>>>>>>> command line, it gives an error:
>>>>>>>>>
>>>>>>>>> command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s
>>>>>>>>> chk_.tpr -v
>>>>>>>>>
>>>>>>>>> error msg:
>>>>>>>>> -------------------------------------------------------
>>>>>>>>> Program mdrun_mpi, VERSION 4.0.7
>>>>>>>>> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>>>>>>>>>
>>>>>>>>> Fatal error:
>>>>>>>>> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
>>>>>>>>> nlist->jjnr=0x9a400030
>>>>>>>>> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
>>>>>>>>> -------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Thanx for Using GROMACS - Have a Nice Day
>>>>>>>>> : Cannot allocate memory
>>>>>>>>> Error on node 19, will try to stop all the nodes
>>>>>>>>> Halting parallel program mdrun_mpi on CPU 19 out of 70
>>>>>>>>> ***********************************************************************
>>>>>>>>>
>>>>>>>>> The individual nodes on the cluster have 8 GB of physical memory
>>>>>>>>> and 16 GB of swap memory. Moreover, when logged onto the
>>>>>>>>> individual nodes, they show more than 1 GB of free memory, so
>>>>>>>>> there should be no problem with cluster memory. Also, the
>>>>>>>>> equilibration jobs for the same system were run on the same
>>>>>>>>> cluster without any problem.
>>>>>>>>>
>>>>>>>>> What I have observed by submitting different test jobs with a
>>>>>>>>> varying number of processors (and number of replicas, where
>>>>>>>>> necessary) is that any job with a total number of processors
>>>>>>>>> <= 64 runs faithfully without any problem. As soon as the total
>>>>>>>>> number of processors is more than 64, it gives the above error.
>>>>>>>>> I have tested this with 65 processors/65 replicas also.
>>>>>>>>
>>>>>>>> This sounds like you might be running on fewer physical CPUs than
>>>>>>>> you have available. If so, running multiple MPI processes per
>>>>>>>> physical CPU can lead to memory shortage conditions.
>>>>>>>
>>>>>>> I don't understand what you mean. Do you mean there might be more
>>>>>>> than 8 processes running per node (each node has 8 processors)?
>>>>>>> But that also does not seem to be the case, as the SGE (Sun Grid
>>>>>>> Engine) output shows only eight processes per node.
>>>>>>
>>>>>> 65 processes can't have 8 processes per node.
>>>>>
>>>>> Why can't it? As I said, there are 8 processors per node. What I
>>>>> have not mentioned is how many nodes it is using. The jobs got
>>>>> distributed over 9 nodes, 8 of which correspond to 64 processors,
>>>>> plus 1 processor from the 9th node.
>>>>
>>>> OK, that's a full description. Your symptoms are indicative of
>>>> someone making an error somewhere. Since GROMACS works over more than
>>>> 64 processors elsewhere, the presumption is that you are doing
>>>> something wrong or the machine is not set up in the way you think it
>>>> is or should be. To get the most effective help, you need to be sure
>>>> you're providing full information - else we can't tell which error
>>>> you're making or (potentially) eliminate you as a source of error.
>>>
>>> Sorry for not being clear in my statements.
>>>
>>>>> As far as I can tell, the job distribution seems okay to me. It is
>>>>> 1 job per processor.
>>>>
>>>> Does non-REMD GROMACS run on more than 64 processors? Does your
>>>> cluster support using more than 8 nodes in a run? Can you run an MPI
>>>> "Hello world" application that prints the processor and node ID
>>>> across more than 64 processors?
>>>
>>> Yes, the cluster supports runs with more than 8 nodes. I generated a
>>> system with a 10 nm water box and submitted it on 80 processors. It
>>> was running fine. It printed all 80 NODEIDs, and also showed me when
>>> the job will get over.
>>>
>>> bharat
>>>
>>>> Mark
>>>>
>>>>> bharat
>>>>>
>>>>>> Mark
>>>>>>
>>>>>>>> I don't know what you mean by "swap memory".
>>>>>>>
>>>>>>> Sorry, I meant cache memory.
>>>>>>>
>>>>>>> bharat
>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>> System: Protein + water + Na ions (total 46878 atoms)
>>>>>>>>> Gromacs version: tested with both v4.0.5 and v4.0.7
>>>>>>>>> compiled with: --enable-float --with-fft=fftw3 --enable-mpi
>>>>>>>>> compiler: gcc_3.4.6 -O3
>>>>>>>>> machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>>
>>>>>>>>> I tried searching the mailing list without any luck. I am not
>>>>>>>>> sure if I am doing anything wrong in giving the commands. Please
>>>>>>>>> correct me if it is wrong.
>>>>>>>>>
>>>>>>>>> Kindly let me know the solution.
>>>>>>>>>
>>>>>>>>> bharat
>> Your system is going out of memory. Probably too big a system, or all
>> replicas are running on the same node.
>
> From the MPI output it doesn't seem that all the replicas, or even more
> than one replica, are running on a single processor.
Indeed.
> Regarding the system, it has run successfully during equilibration.
How much memory is required for a single replica? If that, multiplied by
8, is an unsuitable number, you're in trouble.
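For example (illustrative numbers only - the actual per-replica footprint
is not given anywhere in this thread): if one mdrun_mpi process for this
46878-atom system needed around 1 GB resident, then

  8 processes per node x ~1 GB per process ~= 8 GB,

which is already the whole of a node's physical memory before the OS, MPI
buffers and neighbour-list growth take their share, so even a realloc of a
few hundred kB can fail.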
Does 64+processor REMD work on a very small simulation system?
Mark
> I am pasting below the stderr file of one of the jobs, with the number
> of processors = 66. Please check the attached file "ToAttach.txt".
>
> Again, as a reminder, the cluster here has 8 processors per compute node.
>
> bharat
>
>
> Command line: mpirun -np 66 mdrun -multi 33 -replex 1000 -s chk_.tpr
> -cpi chkpt -cpt 30 -cpo chkpt
>
> Output:
>
> NNODES=66, MYRANK=0, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=1, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=4, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=3, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=9, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=2, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=5, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=6, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=13, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=11, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=12, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=14, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=28, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=10, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=20, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=21, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=23, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=25, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=26, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=30, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=29, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=24, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=7, HOSTNAME=compute-0-2.local
> NNODES=66, MYRANK=8, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=18, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=58, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=19, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=22, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=47, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=62, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=61, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=51, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=42, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=41, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=57, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=17, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=38, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=37, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=39, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=40, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=45, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=46, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=43, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=44, HOSTNAME=compute-0-11.local
> NNODES=66, MYRANK=49, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=50, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=48, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=53, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=54, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=52, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=27, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=60, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=59, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=16, HOSTNAME=compute-0-6.local
> NNODES=66, MYRANK=34, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=33, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=36, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=35, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=56, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=15, HOSTNAME=compute-0-3.local
> NNODES=66, MYRANK=31, HOSTNAME=compute-0-12.local
> NNODES=66, MYRANK=63, HOSTNAME=compute-0-27.local
> NNODES=66, MYRANK=64, HOSTNAME=compute-0-25.local
> NNODES=66, MYRANK=55, HOSTNAME=compute-0-24.local
> NNODES=66, MYRANK=32, HOSTNAME=compute-0-15.local
> NNODES=66, MYRANK=65, HOSTNAME=compute-0-25.local
> NODEID=0 argc=19
> NODEID=1 argc=19
> NODEID=2 argc=19
> NODEID=3 argc=19
> NODEID=5 argc=19
> NODEID=4 argc=19
> NODEID=6 argc=19
> NODEID=13 argc=19
> NODEID=9 argc=19
> NODEID=12 argc=19
> NODEID=11 argc=19
> NODEID=7 argc=19
> NODEID=16 argc=19
> NODEID=10 argc=19
> NODEID=15 argc=19
> NODEID=8 argc=19
> NODEID=14 argc=19
> NODEID=20 argc=19
> NODEID=19 argc=19
> NODEID=28 argc=19
> NODEID=25 argc=19
> NODEID=26 argc=19
> NODEID=18 argc=19
> NODEID=17 argc=19
> NODEID=22 argc=19
> NODEID=21 argc=19
> NODEID=24 argc=19
> NODEID=23 argc=19
> NODEID=30 argc=19
> NODEID=29 argc=19
> NODEID=34 argc=19
> NODEID=33 argc=19
> NODEID=27 argc=19
> NODEID=57 argc=19
> NODEID=58 argc=19
> NODEID=51 argc=19
> NODEID=52 argc=19
> NODEID=41 argc=19
> NODEID=42 argc=19
> NODEID=39 argc=19
> NODEID=40 argc=19
> NODEID=37 argc=19
> NODEID=38 argc=19
> NODEID=36 argc=19
> NODEID=35 argc=19
> NODEID=61 argc=19
> NODEID=62 argc=19
> NODEID=49 argc=19
> NODEID=48 argc=19
> NODEID=47 argc=19
> NODEID=56 argc=19
> NODEID=45 argc=19
> NODEID=46 argc=19
> NODEID=44 argc=19
> NODEID=43 argc=19
> NODEID=54 argc=19
> NODEID=53 argc=19
> NODEID=55 argc=19
> NODEID=59 argc=19
> NODEID=60 argc=19
> NODEID=31 argc=19
> NODEID=64 argc=19
> NODEID=63 argc=19
> NODEID=50 argc=19
> NODEID=32 argc=19
> NODEID=65 argc=19
> :-) G R O M A C S (-:
>
> Groningen Machine for Chemical Simulation
>
> :-) VERSION 4.0.7 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2008, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /groupmisc/bharat/soft/GMX407_bh/INSTL/bin/mdrun_mpi (-:
>
> Option Filename Type Description
> ------------------------------------------------------------
> -s chk_.tpr Input Run input file: tpr tpb tpa
> -o traj.trr Output Full precision trajectory: trr trj cpt
> -x traj.xtc Output, Opt! Compressed trajectory (portable xdr format)
> -cpi chkpt.cpt Input, Opt! Checkpoint file
> -cpo chkpt.cpt Output, Opt! Checkpoint file
> -c confout.gro Output Structure file: gro g96 pdb
> -e ener.edr Output Energy file: edr ene
> -g md.log Output Log file
> -dgdl dgdl.xvg Output, Opt. xvgr/xmgr file
> -field field.xvg Output, Opt. xvgr/xmgr file
> -table table.xvg Input, Opt. xvgr/xmgr file
> -tablep tablep.xvg Input, Opt. xvgr/xmgr file
> -tableb table.xvg Input, Opt. xvgr/xmgr file
> -rerun rerun.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
> -tpi tpi.xvg Output, Opt. xvgr/xmgr file
> -tpid tpidist.xvg Output, Opt. xvgr/xmgr file
> -ei sam.edi Input, Opt. ED sampling input
> -eo sam.edo Output, Opt. ED sampling output
> -j wham.gct Input, Opt. General coupling stuff
> -jo bam.gct Output, Opt. General coupling stuff
> -ffout gct.xvg Output, Opt. xvgr/xmgr file
> -devout deviatie.xvg Output, Opt. xvgr/xmgr file
> -runav runaver.xvg Output, Opt. xvgr/xmgr file
> -px pullx.xvg Output, Opt. xvgr/xmgr file
> -pf pullf.xvg Output, Opt. xvgr/xmgr file
> -mtx nm.mtx Output, Opt. Hessian matrix
> -dn dipole.ndx Output, Opt. Index file
>
> Option Type Value Description
> ------------------------------------------------------
> -[no]h bool no Print help info and quit
> -nice int 0 Set the nicelevel
> -deffnm string Set the default filename for all file options
> -[no]xvgr bool yes Add specific codes (legends etc.) in the output xvg files for the xmgrace program
> -[no]pd bool no Use particle decompostion
> -dd vector 0 0 0 Domain decomposition grid, 0 is optimize
> -npme int -1 Number of separate nodes to be used for PME, -1 is guess
> -ddorder enum interleave DD node order: interleave, pp_pme or cartesian
> -[no]ddcheck bool yes Check for all bonded interactions with DD
> -rdd real 0 The maximum distance for bonded interactions with DD (nm), 0 is determine from initial coordinates
> -rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
> -dlb enum auto Dynamic load balancing (with DD): auto, no or yes
> -dds real 0.8 Minimum allowed dlb scaling of the DD cell size
> -[no]sum bool yes Sum the energies at every step
> -[no]v bool yes Be loud and noisy
> -[no]compact bool yes Write a compact log file
> -[no]seppot bool no Write separate V and dVdl terms for each interaction type and node to the log file(s)
> -pforce real -1 Print all forces larger than this (kJ/mol nm)
> -[no]reprod bool no Try to avoid optimizations that affect binary reproducibility
> -cpt real 30 Checkpoint interval (minutes)
> -[no]append bool no Append to previous output files when continuing from checkpoint
> -[no]addpart bool yes Add the simulation part number to all output files when continuing from checkpoint
> -maxh real -1 Terminate after 0.99 times this time (hours)
> -multi int 33 Do multiple simulations in parallel
> -replex int 1000 Attempt replica exchange every # steps
> -reseed int -1 Seed for replica exchange, -1 is generate a seed
> -[no]glas bool no Do glass simulation with special long range corrections
> -[no]ionize bool no Do a simulation including the effect of an X-Ray bombardment on your system
>
> Getting Loaded...
> Getting Loaded...
> Reading file chk_0.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Reading file chk_32.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_1.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_3.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_13.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Reading file chk_15.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_12.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_25.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_30.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_26.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_24.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_4.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_16.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_8.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Reading file chk_5.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_20.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_11.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_7.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_28.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_6.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_31.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_27.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_14.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_29.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_18.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_10.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_9.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_19.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_22.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_17.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_21.tpr, VERSION 4.0.5 (single precision)
> Reading file chk_23.tpr, VERSION 4.0.5 (single precision)
> Getting Loaded...
> Reading file chk_2.tpr, VERSION 4.0.5 (single precision)
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Loaded with Money
>
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> Making 1D domain decomposition 2 x 1 x 1
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
> starting mdrun 'Protein'
> 500000 steps, 1000.0 ps.
>
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>
> Fatal error:
> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
> nlist->jjnr=0x98500030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
>
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 64, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 64 out of 66
>
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
>
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 64
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>
> Fatal error:
> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
> nlist->jjnr=0x9a300030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
>
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 55, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 55 out of 66
>
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
>
> Fatal error:
> Not enough memory. Failed to realloc 400768 bytes for nlist->jjnr,
> nlist->jjnr=0x9a300030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
>
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 9, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 9 out of 66
>
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
>
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 9
> gcq#0: Thanx for Using GROMACS - Have a Nice Day
>
> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 55
>
>
>