[gmx-users] [gmx-developers] About dynamic load balancing
Yunlong Liu
yliu120 at jhmi.edu
Mon Aug 25 06:08:30 CEST 2014
Hi Szilard,
I would like to send you the log file, and I really need your help. Please trust me that I have tested this many times: when I turn on DLB, the GPU nodes report a cannot-allocate-memory error and all MPI processes are shut down. I have to tolerate the large load imbalance (50%) to run my simulations. I wish I could figure out a way to make my simulations run on GPUs with better performance.
Where can I post the log file? If I paste it here, it will be really long.
Yunlong
> On Aug 24, 2014, at 2:20 PM, "Szilárd Páll" <pall.szilard at gmail.com> wrote:
>
>> On Thu, Aug 21, 2014 at 8:25 PM, Yunlong Liu <yliu120 at jh.edu> wrote:
>> Hi Roland,
>>
>> I just compiled the latest gromacs-5.0 version, released on June 29th. I will
>> recompile it as you suggested, using those flags. It also seems that the high
>> load imbalance doesn't affect the performance much, which is weird.
>
> How did you draw that conclusion? Please show us log files of the
> respective runs; that will help to assess what is going on.
>
> --
> Szilárd
>
>> Thank you.
>> Yunlong
>>
>>> On 8/21/14, 2:13 PM, Roland Schulz wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> On Thu, Aug 21, 2014 at 1:56 PM, Yunlong Liu <yliu120 at jh.edu> wrote:
>>>
>>> Hi Roland,
>>>
>>> The problem I am posting is not caused by trivial errors (like not
>>> having enough memory); I think it is a real bug inside the
>>> GROMACS GPU support code.
>>>
>>> It is unlikely to be a trivial error, because otherwise someone else would
>>> have noticed. You could try the release-5-0 branch from git, but I'm not
>>> aware of any bug fixes related to memory allocation.
>>> The memory allocation which triggers the error isn't itself the problem; the
>>> printed size is reasonable. You could recompile with PRINT_ALLOC_KB (add
>>> -DPRINT_ALLOC_KB to CMAKE_C_FLAGS) and rerun the simulation. It might tell
>>> you where the unusually large memory allocations happen.
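>>>
>>> For example, getting the branch and recompiling with that flag could look
>>> roughly like this (the paths and the other CMake options here are only
>>> placeholders; match them to your original build configuration):
>>>
>>> git clone git://git.gromacs.org/gromacs.git
>>> cd gromacs && git checkout release-5-0
>>> mkdir build && cd build
>>> cmake .. -DGMX_MPI=ON -DGMX_GPU=ON \
>>>     -DCMAKE_C_FLAGS="-DPRINT_ALLOC_KB" \
>>>     -DCMAKE_INSTALL_PREFIX=$WORK/gromacs-5-debug
>>> make -j 8 && make install
>>>
>>> With that define, mdrun should print a note for large allocations, which may
>>> help narrow down where the memory is going.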
>>>
>>> PS: Please don't reply to an individual Gromacs developer. Keep all
>>> conversation on the gmx-users list.
>>>
>>> Roland
>>>
>>> That is why I posted this problem to the developer
>>> mailing list.
>>>
>>> My system contains ~240,000 atoms. It is a rather big protein. The
>>> memory information of the node is:
>>>
>>> top - 12:46:59 up 15 days, 22:18, 1 user, load average: 1.13, 6.27, 11.28
>>> Tasks: 510 total, 2 running, 508 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 6.3%us, 0.0%sy, 0.0%ni, 93.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>> Mem: 32815324k total, 4983916k used, 27831408k free, 7984k buffers
>>> Swap: 4194296k total, 0k used, 4194296k free, 700588k cached
>>>
>>> I am running the simulation on 2 nodes, with 4 MPI ranks in total and 8
>>> OpenMP threads per rank. I list the information on their CPU and GPU here,
>>> and the launch command after it:
>>>
>>> c442-702.stampede(1)$ nvidia-smi
>>> Thu Aug 21 12:46:17 2014
>>> +------------------------------------------------------+
>>> | NVIDIA-SMI 331.67     Driver Version: 331.67         |
>>> |-------------------------------+----------------------+----------------------+
>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>> |===============================+======================+======================|
>>> |   0  Tesla K20m          Off  | 0000:03:00.0     Off |                    0 |
>>> | N/A   22C    P0    46W / 225W |    172MiB /  4799MiB |      0%      Default |
>>> +-------------------------------+----------------------+----------------------+
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Compute processes:                                               GPU Memory |
>>> |  GPU       PID  Process name                                     Usage      |
>>> |=============================================================================|
>>> |    0    113588  /work/03002/yliu120/gromacs-5/bin/mdrun_mpi           77MiB |
>>> |    0    113589  /work/03002/yliu120/gromacs-5/bin/mdrun_mpi           77MiB |
>>> +-----------------------------------------------------------------------------+
>>>
>>> c442-702.stampede(4)$ lscpu
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Byte Order: Little Endian
>>> CPU(s): 16
>>> On-line CPU(s) list: 0-15
>>> Thread(s) per core: 1
>>> Core(s) per socket: 8
>>> Socket(s): 2
>>> NUMA node(s): 2
>>> Vendor ID: GenuineIntel
>>> CPU family: 6
>>> Model: 45
>>> Stepping: 7
>>> CPU MHz: 2701.000
>>> BogoMIPS: 5399.22
>>> Virtualization: VT-x
>>> L1d cache: 32K
>>> L1i cache: 32K
>>> L2 cache: 256K
>>> L3 cache: 20480K
>>> NUMA node0 CPU(s): 0-7
>>> NUMA node1 CPU(s): 8-15
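>>>
>>> For completeness, the job is launched roughly like this (the -deffnm file
>>> name is only a placeholder; ibrun starts the 4 MPI ranks requested in the
>>> job script, 2 per node):
>>>
>>> ibrun mdrun_mpi -ntomp 8 -gpu_id 00 -dlb no -deffnm md
>>>
>>> where -gpu_id 00 maps both ranks on a node onto GPU 0 and -ntomp 8 gives
>>> each rank 8 OpenMP threads.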
>>>
>>> I hope this information will help. Thank you.
>>>
>>> Yunlong
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On 8/21/14, 1:38 PM, Roland Schulz wrote:
>>>>
>>>> Hi,
>>>>
>>>> please don't use gmx-developers for user questions. Feel free to
>>>> use it if you want to fix the problem, and have questions about
>>>> implementation details.
>>>>
>>>> Please provide more details: How large is your system? How much
>>>> memory does a node have? On how many nodes are you trying to run? How
>>>> many MPI ranks do you have per node?
>>>>
>>>> Roland
>>>>
>>>> On Thu, Aug 21, 2014 at 12:21 PM, Yunlong Liu <yliu120 at jh.edu> wrote:
>>>>
>>>> Hi Gromacs Developers,
>>>>
>>>> I found something about dynamic load balancing really
>>>> interesting. I am running my simulations on the Stampede
>>>> supercomputer, whose nodes have 16 physical cores (really 16
>>>> Intel Xeon cores on one node) and an NVIDIA Tesla K20m GPU
>>>> attached.
>>>>
>>>> When I use only the CPUs, I turn on dynamic load balancing with
>>>> -dlb yes, and it seems to work really well: the load imbalance is
>>>> only 1-2%, which improves performance by 5-7%. But when I run on
>>>> the CPU-GPU hybrid (a GPU node with 16 CPU cores and 1 GPU),
>>>> dynamic load balancing kicks in because the imbalance goes up to
>>>> ~50% right after start-up. Then the system reports a
>>>> fail-to-allocate-memory error:
>>>>
>>>> NOTE: Turning on dynamic load balancing
>>>>
>>>>
>>>> -------------------------------------------------------
>>>> Program mdrun_mpi, VERSION 5.0
>>>> Source code file: /home1/03002/yliu120/build/gromacs-5.0/src/gromacs/utility/smalloc.c, line: 226
>>>>
>>>> Fatal error:
>>>> Not enough memory. Failed to realloc 1020720 bytes for dest->a, dest->a=d5800030
>>>> (called from file /home1/03002/yliu120/build/gromacs-5.0/src/gromacs/mdlib/domdec_top.c, line 1061)
>>>> For more information and tips for troubleshooting, please check the GROMACS
>>>> website at http://www.gromacs.org/Documentation/Errors
>>>> -------------------------------------------------------
>>>> : Cannot allocate memory
>>>> Error on rank 0, will try to stop all ranks
>>>> Halting parallel program mdrun_mpi on CPU 0 out of 4
>>>>
>>>> gcq#274: "I Feel a Great Disturbance in the Force" (The Emperor Strikes Back)
>>>>
>>>> [cli_0]: aborting job:
>>>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
>>>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
>>>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
>>>> [c442-702.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 112839) exited with status 255
>>>> TACC: MPI job exited with code: 1
>>>>
>>>> TACC: Shutdown complete. Exiting.
>>>>
>>>> So I manually turned off dynamic load balancing with -dlb no. The
>>>> simulation then goes through, but with a very high load imbalance,
>>>> like:
>>>>
>>>> DD step 139999 load imb.: force 51.3%
>>>>
>>>> Step Time Lambda
>>>> 140000 280.00000 0.00000
>>>>
>>>> Energies (kJ/mol)
>>>>            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
>>>>    4.88709e+04    1.21990e+04    2.99128e+03   -1.46719e+03    1.98569e+04
>>>>     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)    Coul. recip.
>>>>    2.54663e+05    4.05141e+05   -3.16020e+04   -3.75610e+06    2.24819e+04
>>>>      Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
>>>>   -3.02297e+06    6.15217e+05   -2.40775e+06    3.09312e+02   -2.17704e+02
>>>> Pressure (bar)   Constr. rmsd
>>>>   -3.39003e+01    3.10750e-05
>>>>
>>>> DD step 149999 load imb.: force 60.8%
>>>>
>>>> Step Time Lambda
>>>> 150000 300.00000 0.00000
>>>>
>>>> Energies (kJ/mol)
>>>>            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
>>>>    4.96380e+04    1.21010e+04    2.99986e+03   -1.51918e+03    1.97542e+04
>>>>     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)    Coul. recip.
>>>>    2.54305e+05    4.06024e+05   -3.15801e+04   -3.75534e+06    2.24001e+04
>>>>      Potential    Kinetic En.   Total Energy    Temperature  Pres. DC (bar)
>>>>   -3.02121e+06    6.17009e+05   -2.40420e+06    3.10213e+02   -2.17403e+02
>>>> Pressure (bar)   Constr. rmsd
>>>>   -1.40623e+00    3.16495e-05
>>>>
>>>> I think this high load imbalance costs more than 20% of the
>>>> performance, but at least it lets the simulation run. So the
>>>> problem I would like to report is that, when running CPU-GPU
>>>> hybrid simulations with very few GPUs, dynamic load balancing
>>>> causes domain decomposition problems (fail-to-allocate-memory).
>>>> I don't know whether there is currently any solution to this
>>>> problem, or anything that could be improved.
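>>>>
>>>> For reference, a launch along these lines reproduces both cases (file
>>>> names are placeholders):
>>>>
>>>> ibrun mdrun_mpi -ntomp 8 -gpu_id 00 -deffnm md -dlb no   # runs, ~50% imbalance
>>>> ibrun mdrun_mpi -ntomp 8 -gpu_id 00 -deffnm md -dlb yes  # fails with the memory error above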
>>>>
>>>> Yunlong
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ========================================
>>>> Yunlong Liu, PhD Candidate
>>>> Computational Biology and Biophysics
>>>> Department of Biophysics and Biophysical Chemistry
>>>> School of Medicine, The Johns Hopkins University
>>>> Email: yliu120 at jhmi.edu
>>>>
>>>> Address: 725 N Wolfe St, WBSB RM 601, 21205
>>>> ========================================
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>>>> 865-241-1537, ORNL PO BOX 2008 MS6309
>>>
>>>
>>> --
>>> ========================================
>>> Yunlong Liu, PhD Candidate
>>> Computational Biology and Biophysics
>>> Department of Biophysics and Biophysical Chemistry
>>> School of Medicine, The Johns Hopkins University
>>> Email: yliu120 at jhmi.edu
>>>
>>> Address: 725 N Wolfe St, WBSB RM 601, 21205
>>> ========================================
>>>
>>>
>>>
>>>
>>> --
>>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>>> 865-241-1537, ORNL PO BOX 2008 MS6309
>>
>>
>> --
>>
>> ========================================
>> Yunlong Liu, PhD Candidate
>> Computational Biology and Biophysics
>> Department of Biophysics and Biophysical Chemistry
>> School of Medicine, The Johns Hopkins University
>> Email: yliu120 at jhmi.edu
>> Address: 725 N Wolfe St, WBSB RM 601, 21205
>> ========================================
>>