[gmx-users] Memory allocation error in HPC

Joydeep Munshi jom317 at lehigh.edu
Fri Sep 29 06:13:05 CEST 2017


Dear Gromacs users,

I am trying to run a job on an HPC cluster using 80 CPUs across 4 nodes, each
with 128 GB of memory available. While running the mdrun command below for
energy minimisation, I get a memory allocation error and the job halts. Is
there a problem with the way I am combining OpenMP threads and MPI processes
in the script? What are possible solutions?

Note that when I run the same job on my workstation, I do not get any error
and the energy minimization converges well.

Part of the script is given below:

export OMP_NUM_THREADS=4
srun --ntasks-per-node=5 gmx_mpi mdrun $FLAGS -v -deffnm min_step000 -ntomp 4 >> mdrun.log 2>&1
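
For reference, the layout this command is meant to produce can be checked with a little arithmetic. This is only a sketch: the node count and CPU total come from the description above, the variable names are illustrative, and it assumes srun starts exactly 5 tasks on each of the 4 nodes.

```shell
#!/bin/sh
# Illustrative values taken from the description above; names are hypothetical.
nodes=4                # nodes in the allocation
cpus_total=80          # CPUs requested in total
tasks_per_node=5       # value passed to srun --ntasks-per-node
threads_per_task=4     # value of OMP_NUM_THREADS and -ntomp

# Ranks and threads implied by the settings above.
total_ranks=$((nodes * tasks_per_node))
total_threads=$((total_ranks * threads_per_task))

echo "MPI ranks:      $total_ranks"
echo "OpenMP threads: $total_threads (CPUs available: $cpus_total)"

if [ "$total_threads" -gt "$cpus_total" ]; then
    echo "oversubscribed: more threads than CPUs"
else
    echo "layout fits the allocation"
fi
```

Under these assumptions the intended layout is 20 MPI ranks with 4 OpenMP threads each, which matches the 80 CPUs requested.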


The error in mdrun.log is as follows:

Program gmx mdrun, VERSION 5.1.2
Source code file: /home/alp514/source/corona2-build/gromacs-5.1.2/src/gromacs/utility/smalloc.c, line: 227

Fatal error:
Not enough memory. Failed to realloc 302664 bytes for *f, *f=19367240
(called from file /home/alp514/source/corona2-build/gromacs-5.1.2/src/gromacs/domdec/domdec.cpp, line 1708)
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
: Cannot allocate memory
Halting parallel program gmx mdrun on rank 27 out of 80
[cli_27]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 27
srun: error: sol-b410: task 27: Exited with exit code 1
[sol-b411:mpi_rank_47][handle_cqe] Send desc error in msg to 27, wc_opcode=1
srun: error: sol-b411: task 47: Exited with exit code 252
[sol-b411:mpi_rank_47][handle_cqe] Msg from 27: wc.status=12, wc.wr_id=0x331ff40, wc.opcode=1, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[sol-b411:mpi_rank_47][handle_cqe] ../mvapich2-2.1/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:587: [] Got completion with error 12, vendor code=0x81, dest rank=27
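
The number of ranks that mdrun actually started can be read from the "Halting parallel program" line and compared with the intended layout. A minimal sketch, assuming that line always has the form shown above; the sample string stands in for mdrun.log, and the expected value of 20 ranks (4 nodes x 5 tasks per node) is my assumption about what the srun line should produce.

```shell
#!/bin/sh
# Sample line copied from the log above; in practice, grep it out of mdrun.log.
log_line='Halting parallel program gmx mdrun on rank 27 out of 80'

# One sed pattern, two captures: \1 is the failing rank, \2 the total rank count.
pattern='.*on rank \([0-9][0-9]*\) out of \([0-9][0-9]*\).*'
failed_rank=$(printf '%s\n' "$log_line" | sed -n "s/$pattern/\1/p")
ranks_seen=$(printf '%s\n' "$log_line" | sed -n "s/$pattern/\2/p")

expected_ranks=20   # assumption: 4 nodes x 5 tasks per node

echo "failing rank: $failed_rank, ranks started: $ranks_seen"
if [ "$ranks_seen" -ne "$expected_ranks" ]; then
    echo "mismatch: expected $expected_ranks ranks, log reports $ranks_seen"
fi
```

If the value in the log is the one in effect, 80 ranks with 4 OpenMP threads each would mean 320 threads on 80 CPUs, so the rank count may be worth confirming before looking at memory limits.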


Thanks in advance. Any suggestions on how to resolve this would be appreciated.


Thanks and regards,
Joydeep Munshi,
Graduate Research Assistant,
Mechanical Engineering and Mechanics,
Lehigh University

