[gmx-users] Replica Exchange MD on more than 64 processors

bharat v. adkar bharat at sscu.iisc.ernet.in
Sat Dec 26 22:50:04 CET 2009


Dear all,
   I am trying to perform replica exchange MD (REMD) on a 'protein in 
water' system, following the instructions given on the wiki (How-Tos -> 
REMD). I have to perform the REMD simulation at 35 different 
temperatures. As advised on the wiki, I equilibrated the system at 
each of these temperatures (35 equilibration simulations in total). I 
then generated the files chk_0.tpr, chk_1.tpr, ..., chk_34.tpr from the 
equilibrated structures.
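
The file numbering described above can be checked before the job is submitted. The following is a minimal Python sketch, assuming that mdrun's -multi option inserts the replica index just before the file extension (so that -s chk_.tpr is read as chk_0.tpr ... chk_34.tpr, matching the files listed above). The helper names replica_tpr_names and missing_tpr_files are hypothetical and are not part of GROMACS.

```python
from pathlib import Path


def replica_tpr_names(template="chk_.tpr", n_replicas=35):
    """Return the per-replica file names, e.g. chk_0.tpr ... chk_34.tpr.

    Assumption: the replica index goes between the stem ("chk_")
    and the extension (".tpr") of the name passed to -s.
    """
    p = Path(template)
    return [f"{p.stem}{i}{p.suffix}" for i in range(n_replicas)]


def missing_tpr_files(directory=".", template="chk_.tpr", n_replicas=35):
    """Return the expected run input files that are absent from `directory`."""
    d = Path(directory)
    return [
        name
        for name in replica_tpr_names(template, n_replicas)
        if not (d / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_tpr_files()
    print("missing files:", missing if missing else "none")
```

Run from the directory that holds the .tpr files, this only reports which of the 35 inputs are absent; it does not inspect their contents.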

Now, when I submit the final REMD job with the following command line, it 
fails with an error:

command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v

error msg:
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.0.7
Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179

Fatal error:
Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr, 
nlist->jjnr=0x9a400030
(called from file ../../../SRC/src/mdlib/ns.c, line 503)
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day
: Cannot allocate memory
Error on node 19, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 19 out of 70
***********************************************************************
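
For reference, the arithmetic behind the command line above (70 MPI processes shared among 35 replicas) can be sketched in Python. This is a sketch under stated assumptions, not a description of mdrun's internals: it assumes the process count must be an exact multiple of the replica count, and that processes are assigned to replicas in contiguous blocks. The function names are hypothetical.

```python
def ranks_per_replica(n_ranks, n_replicas):
    """Number of MPI processes each replica gets, e.g. 70 / 35 = 2.

    Assumption: the total must divide evenly among the replicas.
    """
    if n_replicas <= 0:
        raise ValueError("the number of replicas must be positive")
    if n_ranks % n_replicas != 0:
        raise ValueError(
            f"{n_ranks} processes cannot be split evenly over "
            f"{n_replicas} replicas"
        )
    return n_ranks // n_replicas


def replica_of_rank(rank, n_ranks, n_replicas):
    """Replica index of a given MPI rank.

    Assumption: ranks are grouped into contiguous blocks, one per replica.
    """
    return rank // ranks_per_replica(n_ranks, n_replicas)


if __name__ == "__main__":
    # Values taken from the command line above: -np 70, -multi 35.
    print("processes per replica:", ranks_per_replica(70, 35))
    # Node 19 is the one named in the error message.
    print("replica of node 19:", replica_of_rank(19, 70, 35))
```

Under the contiguous-block assumption, the node named in the error message (node 19) would belong to the replica with index 9, i.e. chk_9.tpr; if the actual assignment differs, so does this mapping.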


Each node of the cluster has 8 GB of physical memory and 16 GB of swap. 
Moreover, when I log onto the individual nodes, they show more than 
1 GB of free memory, so cluster memory should not be the problem. Also, 
the equilibration jobs for the same system ran on the same cluster 
without any problem.
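
To put the numbers in proportion, the size of the failed allocation can be compared with the free memory reported on the nodes. This is a minimal Python sketch, assuming that "1GB" means 1 GiB (1024**3 bytes); the byte count is copied from the error message, and the constant and function names are hypothetical.

```python
FAILED_REALLOC_BYTES = 790760      # from "Failed to realloc 790760 bytes"
FREE_MEMORY_BYTES = 1 * 1024**3    # assumption: "1GB" free taken as 1 GiB


def to_kib(n_bytes):
    """Convert a byte count to KiB."""
    return n_bytes / 1024


def fraction_of(n_bytes, total_bytes):
    """Fraction of `total_bytes` that `n_bytes` represents."""
    return n_bytes / total_bytes


if __name__ == "__main__":
    print(f"failed request: {to_kib(FAILED_REALLOC_BYTES):.1f} KiB")
    share = fraction_of(FAILED_REALLOC_BYTES, FREE_MEMORY_BYTES)
    print(f"share of free memory: {share:.4%}")
```

The request that failed is well under 1 MiB, a very small fraction of the free memory seen on the nodes, which is why the error is hard to attribute to the nodes simply running out of physical memory.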

From test jobs with varying numbers of processors (and of replicas, 
where necessary), I have observed that any job with a total of <= 64 
processors runs without any problem. As soon as the total number of 
processors exceeds 64, it gives the above error. I have also tested 
this with 65 processors/65 replicas.

System: Protein + water + Na ions (total 46878 atoms)
Gromacs version: tested with both v4.0.5 and v4.0.7
compiled with: --enable-float --with-fft=fftw3 --enable-mpi
compiler: gcc_3.4.6 -O3
machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux


I searched the mailing list without any luck. I am not sure whether I 
am doing anything wrong in the commands I am giving. Please correct me 
if so.

Kindly let me know the solution.


bharat

