[gmx-users] Replica Exchange MD on more than 64 processors

Mark Abraham Mark.Abraham at anu.edu.au
Sun Dec 27 00:03:20 CET 2009


bharat v. adkar wrote:
> 
> Dear all,
>   I am trying to perform replica exchange MD (REMD) on a 'protein in
> water' system. I am following the instructions given on the wiki (How-Tos ->
> REMD). I have to perform the REMD simulation with 35 different
> temperatures. As per the advice on the wiki, I equilibrated the system at
> the respective temperatures (a total of 35 equilibration simulations). After
> this I generated chk_0.tpr, chk_1.tpr, ..., chk_34.tpr files from the
> equilibrated structures.
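
A minimal sketch of that tpr-generation step, assuming hypothetical file
names (equil_${i}.gro for the equilibrated coordinates, remd_${i}.mdp for
the parameter file carrying that replica's reference temperature, and a
shared topol.top); substitute whatever names your equilibration runs
actually produced:

    #!/bin/bash
    # Hypothetical file names; one grompp call per replica/temperature.
    for i in $(seq 0 34); do
        grompp -f remd_${i}.mdp \
               -c equil_${i}.gro \
               -p topol.top \
               -o chk_${i}.tpr
    done

Each chk_${i}.tpr then carries its own reference temperature, which is
what mdrun -multi expects for REMD.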
> 
> Now when I submit the final job for REMD with the following command
> line, it gives this error:
> 
> command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v
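
With -multi, mdrun inserts the replica index into the -s base name before
the extension, which is why -s chk_.tpr here picks up chk_0.tpr through
chk_34.tpr. A quick sanity check before submitting (a sketch, assuming the
job is launched from the directory containing the .tpr files):

    # should print 35 if every replica input file is present
    ls chk_*.tpr | wc -l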
> 
> error msg:
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179
> 
> Fatal error:
> Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr, 
> nlist->jjnr=0x9a400030
> (called from file ../../../SRC/src/mdlib/ns.c, line 503)
> -------------------------------------------------------
> 
> Thanx for Using GROMACS - Have a Nice Day
> : Cannot allocate memory
> Error on node 19, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 19 out of 70
> ***********************************************************************
> 
> 
> The individual nodes on the cluster have 8GB of physical memory and 16GB
> of swap memory. Moreover, when logged onto the individual nodes, they
> show more than 1GB of free memory, so there should be no problem with
> cluster memory. Also, the equilibration jobs for the same system ran on
> the same cluster without any problem.
> 
> What I have observed by submitting different test jobs with varying
> numbers of processors (and numbers of replicas, where necessary) is that
> any job with a total number of processors <= 64 runs faithfully without
> any problem. As soon as the total number of processors exceeds 64, it
> gives the above error. I have tested this with 65 processors/65 replicas
> as well.

This sounds like you might be running more MPI processes than you have
physical CPUs available. If so, running multiple MPI processes per
physical CPU can lead to memory shortage conditions.
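
One quick way to check whether ranks are being oversubscribed is to run a
trivial command under the same mpiexec allocation and count ranks per host
(a sketch, assuming the same hostfile/queue settings as the failing job):

    # Each output line shows how many of the 70 ranks landed on a node;
    # more ranks per node than physical cores means oversubscription.
    mpiexec -np 70 hostname | sort | uniq -c

If some nodes end up with many more ranks than cores, each rank gets a
correspondingly smaller share of that node's 8GB, which would be
consistent with the realloc failure above.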

I don't know what you mean by "swap memory".

Mark

> System: Protein + water + Na ions (total 46878 atoms)
> Gromacs version: tested with both v4.0.5 and v4.0.7
> compiled with: --enable-float --with-fft=fftw3 --enable-mpi
> compiler: gcc_3.4.6 -O3
> machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> I tried searching the mailing list without any luck. I am not sure if I
> am doing anything wrong with the commands. Please correct me if it is
> wrong.
> 
> Kindly let me know the solution.
> 
> 
> bharat
> 
> 


