[gmx-developers] About dynamic load balancing
Yunlong Liu
yliu120 at jh.edu
Thu Aug 21 19:22:02 CEST 2014
Hi Gromacs Developers,
I found something really interesting about dynamic load balancing.
I am running my simulation on the Stampede supercomputer, which has nodes
with 16 physical cores (really 16 Intel Xeon cores on one node) and an
NVIDIA Tesla K20m GPU attached.
When I am using only the CPUs, I turn on dynamic load balancing with
-dlb yes, and it seems to work really well: the load imbalance is only
1~2%, which improves the performance by about 5~7%. But when I run the
same system as a GPU-CPU hybrid job (a GPU node, 16 CPU cores and 1 GPU),
dynamic load balancing kicks in because the imbalance goes up to ~50%
almost immediately after the run starts, and then the system reports a
fail-to-allocate-memory error (approximate launch commands for both runs
are sketched after the error output below):
NOTE: Turning on dynamic load balancing
-------------------------------------------------------
Program mdrun_mpi, VERSION 5.0
Source code file:
/home1/03002/yliu120/build/gromacs-5.0/src/gromacs/utility/smalloc.c,
line: 226
Fatal error:
Not enough memory. Failed to realloc 1020720 bytes for dest->a,
dest->a=d5800030
(called from file
/home1/03002/yliu120/build/gromacs-5.0/src/gromacs/mdlib/domdec_top.c,
line 1061)
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
: Cannot allocate memory
Error on rank 0, will try to stop all ranks
Halting parallel program mdrun_mpi on CPU 0 out of 4
gcq#274: "I Feel a Great Disturbance in the Force" (The Emperor Strikes
Back)
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[c442-702.stampede.tacc.utexas.edu:mpispawn_0][readline] Unexpected
End-Of-File on file descriptor 6. MPI process died?
[c442-702.stampede.tacc.utexas.edu:mpispawn_0][mtpmi_processops] Error
while reading PMI socket. MPI process died?
[c442-702.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI
process (rank: 0, pid: 112839) exited with status 255
TACC: MPI job exited with code: 1
TACC: Shutdown complete. Exiting.
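For reference, the two runs were launched roughly as follows. This is a
simplified sketch of my job scripts rather than an exact copy, with
placeholder -deffnm names; ibrun is Stampede's MPI launcher, and the
hybrid job used 4 MPI ranks on the node, as in the log above:

  # CPU-only run: forcing DLB on works fine here
  ibrun mdrun_mpi -deffnm md_cpu -dlb yes

  # GPU-CPU hybrid run on one GPU node: 4 ranks x 4 OpenMP threads,
  # all ranks mapped to the single K20m; DLB is left on auto, turns
  # itself on, and triggers the crash
  ibrun mdrun_mpi -ntomp 4 -deffnm md_gpu -nb gpu -gpu_id 0000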
So I manually turned off dynamic load balancing with -dlb no. The
simulation then goes through, but with a very high load imbalance, like:
DD step 139999 load imb.: force 51.3%
           Step           Time         Lambda
         140000      280.00000        0.00000
   Energies (kJ/mol)
            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
    4.88709e+04    1.21990e+04    2.99128e+03   -1.46719e+03    1.98569e+04
     Coulomb-14        LJ (SR)   Disper. corr.   Coulomb (SR)   Coul. recip.
    2.54663e+05    4.05141e+05   -3.16020e+04   -3.75610e+06    2.24819e+04
      Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
   -3.02297e+06    6.15217e+05   -2.40775e+06    3.09312e+02   -2.17704e+02
 Pressure (bar)   Constr. rmsd
   -3.39003e+01    3.10750e-05
DD step 149999 load imb.: force 60.8%
           Step           Time         Lambda
         150000      300.00000        0.00000
   Energies (kJ/mol)
            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
    4.96380e+04    1.21010e+04    2.99986e+03   -1.51918e+03    1.97542e+04
     Coulomb-14        LJ (SR)   Disper. corr.   Coulomb (SR)   Coul. recip.
    2.54305e+05    4.06024e+05   -3.15801e+04   -3.75534e+06    2.24001e+04
      Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
   -3.02121e+06    6.17009e+05   -2.40420e+06    3.10213e+02   -2.17403e+02
 Pressure (bar)   Constr. rmsd
   -1.40623e+00    3.16495e-05
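In case it helps with reproducing, the workaround is simply the same
hybrid command line with DLB pinned off (again a simplified sketch with
placeholder names):

  # same GPU node and .tpr, but DLB disabled
  ibrun mdrun_mpi -ntomp 4 -deffnm md_gpu -nb gpu -gpu_id 0000 -dlb no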
I think this high load imbalance costs more than 20% of the performance,
but at least it lets the simulation run. So the problem I would like to
report is: when running a GPU-CPU hybrid simulation with very few GPUs,
dynamic load balancing can cause domain decomposition problems (the
fail-to-allocate-memory error above). Is there currently any solution to
this problem, or anything that could be improved?
Yunlong
--
========================================
Yunlong Liu, PhD Candidate
Computational Biology and Biophysics
Department of Biophysics and Biophysical Chemistry
School of Medicine, The Johns Hopkins University
Email: yliu120 at jhmi.edu
Address: 725 N Wolfe St, WBSB RM 601, 21205
========================================