[gmx-users] strange GPU load distribution

Justin Lemkul jalemkul at vt.edu
Mon May 7 00:03:54 CEST 2018

On 5/6/18 5:51 PM, Alex wrote:
> Unfortunately, we're still bogged down when the EM runs (example 
> below) start -- CPU usage by these jobs is initially low, while their 
> PIDs show up in nvidia-smi. After about a minute all goes back to 
> normal. Because the user is doing it frequently (scripted), everything 
> is slowed down by a large factor. Interestingly, we have another user 
> utilizing a GPU with another MD package (LAMMPS) and that GPU is never 
> touched by these EM jobs.
> Any ideas will be greatly appreciated.

Thinking out loud - a run that explicitly requests CPU-only execution might 
still try to detect the GPU if mdrun is GPU-enabled. Is that a possibility, 
including any latency in detecting the device? Have you tested whether an 
mdrun binary built with GPU support disabled (-DGMX_GPU=OFF) leaves the GPUs 
untouched when running the same command?
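
One way to run that comparison, as a rough sketch rather than a tested tool: list the PIDs that nvidia-smi reports as compute processes, then ask the kernel for the command line of each one, and flag any that were started with "-nb cpu". The function names below are hypothetical, the /proc lookup assumes Linux, and the substring match on "-nb cpu" is only a heuristic.

```python
import subprocess
from pathlib import Path


def query_compute_apps():
    """Return nvidia-smi's list of compute processes as CSV text."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_memory' rows into (pid, name, mem) tuples."""
    apps = []
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank or unexpected lines
        apps.append((int(fields[0]), fields[1], fields[2]))
    return apps


def read_cmdline(pid):
    """Return the command line the kernel recorded for pid (Linux /proc)."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:
        return ""  # process already exited or is not readable
    # Arguments are NUL-separated in /proc/<pid>/cmdline.
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def cpu_only_jobs_on_gpu(apps, lookup=read_cmdline):
    """Return (pid, cmdline) for GPU-visible processes started with '-nb cpu'."""
    hits = []
    for pid, _name, _mem in apps:
        cmdline = lookup(pid)
        if "-nb cpu" in cmdline:
            hits.append((pid, cmdline))
    return hits
```

Calling cpu_only_jobs_on_gpu(parse_compute_apps(query_compute_apps())) during the first minute of an EM run, once with the GPU-enabled binary and once with the -DGMX_GPU=OFF build, should show whether the CPU-only jobs appear on the GPUs in only one of the two cases.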


> Thanks,
> Alex
>> 60432 pts/8    Dl+ 0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 
>> -nb cpu -pme cpu -deffnm em_steep
>> On 4/27/2018 2:16 PM, Mark Abraham wrote:
>>> Hi,
>>> What you think was run isn't nearly as useful when troubleshooting as
>>> asking the kernel what is actually running.
>>> Mark
>>> On Fri, Apr 27, 2018, 21:59 Alex <nedomacho at gmail.com> wrote:
>>>> Mark, I copied the exact command line from the script, right above the
>>>> mdp file. It is literally how the script calls mdrun in this case:
>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>> On 4/27/2018 1:52 PM, Mark Abraham wrote:
>>>>> Group cutoff scheme can never run on a gpu, so none of that should
>>>>> matter.
>>>>> Use ps and find out what the command lines were.
>>>>> Mark
>>>>> On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
>>>>>> Update: we're basically removing commands one by one from the script
>>>>>> that submits the jobs causing the issue. The culprit is both EM and
>>>>>> the MD run: and GPUs are being affected _before_ MD starts loading
>>>>>> the CPU, i.e. this is the initial setting up of the EM run -- CPU
>>>>>> load is near zero, nvidia-smi reports the mess. I wonder if this is
>>>>>> in any way related to that timing test we were failing a while back.
>>>>>> mdrun call and mdp below, though I suspect they have nothing to do
>>>>>> with what is happening. Any help will be very highly appreciated.
>>>>>> Alex
>>>>>> ***
>>>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>>>> mdp:
>>>>>> ; Run control
>>>>>> integrator               = md-vv       ; Velocity Verlet
>>>>>> tinit                    = 0
>>>>>> dt                       = 0.002
>>>>>> nsteps                   = 500000    ; 1 ns
>>>>>> nstcomm                  = 100
>>>>>> ; Output control
>>>>>> nstxout                  = 50000
>>>>>> nstvout                  = 50000
>>>>>> nstfout                  = 0
>>>>>> nstlog                   = 50000
>>>>>> nstenergy                = 50000
>>>>>> nstxout-compressed       = 0
>>>>>> ; Neighborsearching and short-range nonbonded interactions
>>>>>> cutoff-scheme            = group
>>>>>> nstlist                  = 10
>>>>>> ns_type                  = grid
>>>>>> pbc                      = xyz
>>>>>> rlist                    = 1.4
>>>>>> ; Electrostatics
>>>>>> coulombtype              = cutoff
>>>>>> rcoulomb                 = 1.4
>>>>>> ; van der Waals
>>>>>> vdwtype                  = user
>>>>>> vdw-modifier             = none
>>>>>> rvdw                     = 1.4
>>>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>>>> DispCorr                  = EnerPres
>>>>>> ; Spacing for the PME/PPPM FFT grid
>>>>>> fourierspacing           = 0.12
>>>>>> ; EWALD/PME/PPPM parameters
>>>>>> pme_order                = 6
>>>>>> ewald_rtol               = 1e-06
>>>>>> epsilon_surface          = 0
>>>>>> ; Temperature coupling
>>>>>> Tcoupl                   = nose-hoover
>>>>>> tc_grps                  = system
>>>>>> tau_t                    = 1.0
>>>>>> ref_t                    = some_temperature
>>>>>> ; Pressure coupling is off for NVT
>>>>>> Pcoupl                   = No
>>>>>> tau_p                    = 0.5
>>>>>> compressibility          = 4.5e-05
>>>>>> ref_p                    = 1.0
>>>>>> ; options for bonds
>>>>>> constraints              = all-bonds
>>>>>> constraint_algorithm     = lincs
>>>>>> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
>>>>>>> As I said, only two users, and nvidia-smi shows the process name.
>>>>>>> We're investigating and it does appear that it is EM that uses cutoff
>>>>>>> electrostatics and as a result the user did not bother with -pme cpu
>>>>>>> in the mdrun call. What would be the correct way to enforce cpu-only
>>>>>>> mdrun when coulombtype = cutoff?
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>>>>>>>> No.
>>>>>>>> Look at the processes that are running, e.g. with top or ps. Either
>>>>>>>> old simulations or another user is running.
>>>>>>>> Mark
>>>>>>>> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
>>>>>>>>> Strange. There are only two people using this machine, myself being
>>>>>>>>> one of them, and the other person specifically forces -nb cpu -pme
>>>>>>>>> cpu in his calls to mdrun. Are any other GMX utilities (e.g.
>>>>>>>>> insert-molecules, grompp, or energy) trying to use GPUs?
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>> On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>>>>>>>>>> The second column is PIDs so there is a whole lot more going on
>>>>>>>>>> there than just a single simulation, single rank using two GPUs.
>>>>>>>>>> That would be one PID and two entries for the two GPUs. Are you
>>>>>>>>>> sure you're not running other processes?
>>>>>>>>>> -- 
>>>>>>>>>> Szilárd
>>>>>>>>>> On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24
>>>>>>>>>>> -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
>>>>>>>>>>> Once in a while the simulation slows down and nvidia-smi reports
>>>>>>>>>>> something like this:
>>>>>>>>>>> |    1     12981      C   gmx                                          175MiB |
>>>>>>>>>>> |    2     12981      C   gmx                                          217MiB |
>>>>>>>>>>> |    2     13083      C   gmx                                          161MiB |
>>>>>>>>>>> |    2     13086      C   gmx                                          159MiB |
>>>>>>>>>>> |    2     13089      C   gmx                                          139MiB |
>>>>>>>>>>> |    2     13093      C   gmx                                          163MiB |
>>>>>>>>>>> |    2     13096      C   gmx                                           11MiB |
>>>>>>>>>>> |    2     13099      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13102      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13106      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13109      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13112      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13115      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13119      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13122      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13125      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13128      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13131      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13134      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13138      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13141      C   gmx                                            8MiB |
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> Then goes back to the expected load. Is this normal?
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Alex
>>>>>>>>>>> -- 
>>>>>>>>>>> Gromacs Users mailing list
>>>>>>>>>>> * Please search the archive at
>>>>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List
>>>>>>>>>>> before posting!
>>>>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>>>> * For (un)subscribe requests visit
>>>>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>>>>> or send a mail to gmx-users-request at gromacs.org.


Justin A. Lemkul, Ph.D.
Assistant Professor
Virginia Tech Department of Biochemistry

303 Engel Hall
340 West Campus Dr.
Blacksburg, VA 24061

jalemkul at vt.edu | (540) 231-3129

