[gmx-users] [gmx-developers] About dynamic load balancing
Yunlong Liu
yliu120 at jh.edu
Thu Aug 21 20:26:03 CEST 2014
Hi Roland,
I just compiled the latest gromacs-5.0 version, released on June 29th. I
will recompile it as you suggested, using those flags. It also seems that
the high load imbalance doesn't actually hurt performance, which
is weird.
Thank you.
Yunlong
On 8/21/14, 2:13 PM, Roland Schulz wrote:
> Hi,
>
>
> On Thu, Aug 21, 2014 at 1:56 PM, Yunlong Liu <yliu120 at jh.edu> wrote:
>
> Hi Roland,
>
> The problem I am posting is not caused by trivial errors (like not
> enough memory); I think it is a real bug inside the
> gromacs-GPU support code.
>
> It is unlikely to be a trivial error, because otherwise someone else would
> have noticed. You could try the release-5-0 branch from git, but I'm
> not aware of any bug fixes related to memory allocation.
> The memory allocation that triggers the error isn't itself the problem; the
> printed size is reasonable. You could recompile with PRINT_ALLOC_KB
> (add -DPRINT_ALLOC_KB to CMAKE_C_FLAGS) and rerun the simulation. It
> might tell you where the unusually large memory allocations happen.
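> A minimal sketch of that recompile, assuming an out-of-source CMake build
> (the build directory path here is hypothetical):
>
> cd /home1/03002/yliu120/build/gromacs-5.0-build  # hypothetical build dir
> cmake .. -DCMAKE_C_FLAGS="-DPRINT_ALLOC_KB"      # pass the define through CMAKE_C_FLAGS
> make -j 8 && make install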
>
> PS: Please don't reply to an individual Gromacs developer. Keep all
> conversation on the gmx-users list.
>
> Roland
>
> That is why I posted this problem to the developer
> mailing list.
>
> My system contains ~240,000 atoms. It is a rather large protein. The
> memory information of the node is:
>
> top - 12:46:59 up 15 days, 22:18, 1 user, load average: 1.13, 6.27, 11.28
> Tasks: 510 total, 2 running, 508 sleeping, 0 stopped, 0 zombie
> Cpu(s): 6.3%us, 0.0%sy, 0.0%ni, 93.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 32815324k total, 4983916k used, 27831408k free, 7984k buffers
> Swap: 4194296k total, 0k used, 4194296k free, 700588k cached
>
> I am running the simulation on 2 nodes, with 4 MPI ranks in total and
> 8 OpenMP threads per rank. The CPU and GPU information is listed here
> (a sketch of the corresponding launch line follows the listings):
>
> c442-702.stampede(1)$ nvidia-smi
> Thu Aug 21 12:46:17 2014
> +------------------------------------------------------+
> | NVIDIA-SMI 331.67     Driver Version: 331.67         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K20m          Off  | 0000:03:00.0     Off |                    0 |
> | N/A   22C    P0    46W / 225W |    172MiB /  4799MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0    113588  /work/03002/yliu120/gromacs-5/bin/mdrun_mpi         77MiB   |
> |    0    113589  /work/03002/yliu120/gromacs-5/bin/mdrun_mpi         77MiB   |
> +-----------------------------------------------------------------------------+
>
> c442-702.stampede(4)$ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Thread(s) per core: 1
> Core(s) per socket: 8
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 45
> Stepping: 7
> CPU MHz: 2701.000
> BogoMIPS: 5399.22
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 20480K
> NUMA node0 CPU(s): 0-7
> NUMA node1 CPU(s): 8-15
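>
> A sketch of the corresponding launch line (ibrun is TACC's MPI launcher on
> Stampede; the -deffnm file name is illustrative, not my actual file):
>
> # 2 nodes x 2 MPI ranks per node = 4 ranks, 8 OpenMP threads per rank
> ibrun -np 4 mdrun_mpi -deffnm protein_md -ntomp 8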
>
> I hope this information will help. Thank you.
>
> Yunlong
>
> On 8/21/14, 1:38 PM, Roland Schulz wrote:
>> Hi,
>>
>> please don't use gmx-developers for user questions. Feel free to
>> use it if you want to fix the problem, and have questions about
>> implementation details.
>>
>> Please provide more details: How large is your system? How much
>> memory does a node have? On how many nodes are you trying to run? How
>> many MPI ranks do you have per node?
>>
>> Roland
>>
>> On Thu, Aug 21, 2014 at 12:21 PM, Yunlong Liu <yliu120 at jh.edu> wrote:
>>
>> Hi Gromacs Developers,
>>
>> I found something about dynamic load balancing really
>> interesting. I am running my simulation on the Stampede
>> supercomputer, whose nodes have 16 physical cores (really
>> 16 Intel Xeon cores on one node) and an NVIDIA Tesla K20m
>> GPU attached.
>>
>> When I use only the CPUs, I turn on dynamic load
>> balancing with -dlb yes. It seems to work really well: the
>> load imbalance is only 1~2%, which improves performance
>> by 5~7%. But when I run in GPU-CPU hybrid mode (GPU node,
>> 16 CPUs and 1 GPU), dynamic load balancing kicks in because
>> the imbalance goes up to ~50% instantly after startup. Then
>> the system reports a fail-to-allocate-memory error:
>>
>> NOTE: Turning on dynamic load balancing
>>
>>
>> -------------------------------------------------------
>> Program mdrun_mpi, VERSION 5.0
>> Source code file: /home1/03002/yliu120/build/gromacs-5.0/src/gromacs/utility/smalloc.c, line: 226
>>
>> Fatal error:
>> Not enough memory. Failed to realloc 1020720 bytes for dest->a, dest->a=d5800030
>> (called from file /home1/03002/yliu120/build/gromacs-5.0/src/gromacs/mdlib/domdec_top.c, line 1061)
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>> : Cannot allocate memory
>> Error on rank 0, will try to stop all ranks
>> Halting parallel program mdrun_mpi on CPU 0 out of 4
>>
>> gcq#274: "I Feel a Great Disturbance in the Force" (The Emperor Strikes Back)
>>
>> [cli_0]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 112839) exited with status 255
>> TACC: MPI job exited with code: 1
>>
>> TACC: Shutdown complete. Exiting.
>>
>> So I manually turned off dynamic load balancing with -dlb
>> no. The simulation then runs through, albeit with very high load
>> imbalance, for example:
>>
>> DD step 139999 load imb.: force 51.3%
>>
>>            Step           Time         Lambda
>>          140000      280.00000        0.00000
>>
>>    Energies (kJ/mol)
>>             U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
>>     4.88709e+04    1.21990e+04    2.99128e+03   -1.46719e+03    1.98569e+04
>>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)    Coul. recip.
>>     2.54663e+05    4.05141e+05   -3.16020e+04   -3.75610e+06    2.24819e+04
>>       Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
>>    -3.02297e+06    6.15217e+05   -2.40775e+06    3.09312e+02   -2.17704e+02
>>  Pressure (bar)   Constr. rmsd
>>    -3.39003e+01    3.10750e-05
>>
>> DD step 149999 load imb.: force 60.8%
>>
>>            Step           Time         Lambda
>>          150000      300.00000        0.00000
>>
>>    Energies (kJ/mol)
>>             U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
>>     4.96380e+04    1.21010e+04    2.99986e+03   -1.51918e+03    1.97542e+04
>>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)    Coul. recip.
>>     2.54305e+05    4.06024e+05   -3.15801e+04   -3.75534e+06    2.24001e+04
>>       Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
>>    -3.02121e+06    6.17009e+05   -2.40420e+06    3.10213e+02   -2.17403e+02
>>  Pressure (bar)   Constr. rmsd
>>    -1.40623e+00    3.16495e-05
>>
>> I think this high load imbalance costs more than 20%
>> of the performance, but at least it lets the simulation
>> run. So the problem I would like to report is: when
>> running a simulation in GPU-CPU hybrid mode with very few
>> GPUs, dynamic load balancing causes domain
>> decomposition problems (failure to allocate memory). I don't
>> know whether there is currently any solution to this problem,
>> or whether anything could be improved.
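>>
>> For reference, the two runs differ only in the -dlb flag; a sketch of the
>> invocations (the -deffnm file name is illustrative, not my actual file):
>>
>> # CPU-only run: forcing dynamic load balancing on works fine (~1-2% imbalance)
>> ibrun mdrun_mpi -deffnm protein_md -dlb yes
>>
>> # GPU-CPU hybrid run: forcing it off avoids the realloc crash
>> ibrun mdrun_mpi -deffnm protein_md -dlb no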
>>
>> Yunlong
>>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
--
========================================
Yunlong Liu, PhD Candidate
Computational Biology and Biophysics
Department of Biophysics and Biophysical Chemistry
School of Medicine, The Johns Hopkins University
Email: yliu120 at jhmi.edu
Address: 725 N Wolfe St, WBSB RM 601, 21205
========================================