[gmx-users] strange GPU load distribution
Justin Lemkul
jalemkul at vt.edu
Mon May 7 00:03:54 CEST 2018
On 5/6/18 5:51 PM, Alex wrote:
> Unfortunately, we're still bogged down when the EM runs (example
> below) start -- CPU usage by these jobs is initially low, while their
> PIDs show up in nvidia-smi. After about a minute all goes back to
> normal. Because the user is doing it frequently (scripted), everything
> is slowed down by a large factor. Interestingly, we have another user
> utilizing a GPU with another MD package (LAMMPS) and that GPU is never
> touched by these EM jobs.
>
> Any ideas will be greatly appreciated.
>
Thinking out loud - a run that explicitly calls for only the CPU to be
used might still be trying to detect a GPU if mdrun is GPU-enabled. Is
that a possibility, including any latency in detecting the device? Have
you tested whether an mdrun binary built with GPU support explicitly
disabled (-DGMX_GPU=OFF) leaves the GPUs untouched when running the
same command?
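
A cheaper test than rebuilding, assuming the GPUs are NVIDIA devices
used through CUDA: start the same command with CUDA_VISIBLE_DEVICES set
to an empty string, which should make the CUDA runtime report no devices
to that process. A rough Python sketch follows; the helper names are
hypothetical, and the command is the EM call from the ps listing quoted
below. If the PIDs stop appearing in nvidia-smi, detection is the likely
culprit.

```python
import os
import shutil
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value is assumed to hide every CUDA device from the child.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_without_gpus(cmd):
    """Run cmd with no CUDA devices visible; returns the exit code."""
    return subprocess.run(cmd, env=cpu_only_env(), check=False).returncode


if __name__ == "__main__" and shutil.which("gmx"):
    # Same EM call as in the ps output, now with the GPUs hidden.
    run_without_gpus(["gmx", "mdrun", "-table", "../../../tab_it.xvg",
                      "-nt", "1", "-nb", "cpu", "-pme", "cpu",
                      "-deffnm", "em_steep"])
```

Only the child process is affected, so other users' GPU jobs keep
running as before.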
-Justin
> Thanks,
>
> Alex
>
>
>>   PID TTY   STAT TIME COMMAND
>> 60432 pts/8 Dl+  0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep
>>
>>
>>
>
>
>> On 4/27/2018 2:16 PM, Mark Abraham wrote:
>>> Hi,
>>>
>>> What you think was run isn't nearly as useful when troubleshooting as
>>> asking the kernel what is actually running.
>>>
>>> Mark
>>>
>>>
>>> On Fri, Apr 27, 2018, 21:59 Alex <nedomacho at gmail.com> wrote:
>>>
>>>> Mark, I copied the exact command line from the script, right above the
>>>> mdp file. It is literally how the script calls mdrun in this case:
>>>>
>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>>
>>>>
>>>> On 4/27/2018 1:52 PM, Mark Abraham wrote:
>>>>> Group cutoff scheme can never run on a gpu, so none of that should matter.
>>>>> Use ps and find out what the command lines were.
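
The ps check can be scripted so it runs alongside the job submission
script. A minimal sketch in Python, assuming a Linux ps that accepts
"-eo pid,args"; the function names are hypothetical.

```python
import subprocess


def parse_ps(text, needle="gmx"):
    """Return (pid, command line) pairs from `ps -eo pid,args` output
    whose command line contains needle."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        # The header line is skipped because its first field is not a number.
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            rows.append((int(parts[0]), parts[1]))
    return rows


def running_gmx():
    """Ask the kernel (via ps) which gmx command lines are actually running."""
    out = subprocess.run(["ps", "-eo", "pid,args"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)
```

Printing running_gmx() while nvidia-smi shows the extra entries gives
the real command lines for those PIDs, rather than what the script is
believed to have launched.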
>>>>>
>>>>> Mark
>>>>>
>>>>> On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
>>>>>
>>>>>> Update: we're basically removing commands one by one from the script that
>>>>>> submits the jobs causing the issue. The culprit is both EM and the MD run:
>>>>>> and GPUs are being affected _before_ MD starts loading the CPU, i.e. this
>>>>>> is the initial setting up of the EM run -- CPU load is near zero,
>>>>>> nvidia-smi reports the mess. I wonder if this is in any way related to that
>>>>>> timing test we were failing a while back.
>>>>>> mdrun call and mdp below, though I suspect they have nothing to do with
>>>>>> what is happening. Any help will be very highly appreciated.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>>>>
>>>>>> mdp:
>>>>>>
>>>>>> ; Run control
>>>>>> integrator = md-vv ; Velocity Verlet
>>>>>> tinit = 0
>>>>>> dt = 0.002
>>>>>> nsteps = 500000 ; 1 ns
>>>>>> nstcomm = 100
>>>>>> ; Output control
>>>>>> nstxout = 50000
>>>>>> nstvout = 50000
>>>>>> nstfout = 0
>>>>>> nstlog = 50000
>>>>>> nstenergy = 50000
>>>>>> nstxout-compressed = 0
>>>>>> ; Neighborsearching and short-range nonbonded interactions
>>>>>> cutoff-scheme = group
>>>>>> nstlist = 10
>>>>>> ns_type = grid
>>>>>> pbc = xyz
>>>>>> rlist = 1.4
>>>>>> ; Electrostatics
>>>>>> coulombtype = cutoff
>>>>>> rcoulomb = 1.4
>>>>>> ; van der Waals
>>>>>> vdwtype = user
>>>>>> vdw-modifier = none
>>>>>> rvdw = 1.4
>>>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>>>> DispCorr = EnerPres
>>>>>> ; Spacing for the PME/PPPM FFT grid
>>>>>> fourierspacing = 0.12
>>>>>> ; EWALD/PME/PPPM parameters
>>>>>> pme_order = 6
>>>>>> ewald_rtol = 1e-06
>>>>>> epsilon_surface = 0
>>>>>> ; Temperature coupling
>>>>>> Tcoupl = nose-hoover
>>>>>> tc_grps = system
>>>>>> tau_t = 1.0
>>>>>> ref_t = some_temperature
>>>>>> ; Pressure coupling is off for NVT
>>>>>> Pcoupl = No
>>>>>> tau_p = 0.5
>>>>>> compressibility = 4.5e-05
>>>>>> ref_p = 1.0
>>>>>> ; options for bonds
>>>>>> constraints = all-bonds
>>>>>> constraint_algorithm = lincs
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
>>>>>>
>>>>>>> As I said, only two users, and nvidia-smi shows the process name. We're
>>>>>>> investigating and it does appear that it is EM that uses cutoff
>>>>>>> electrostatics and as a result the user did not bother with -pme cpu in the
>>>>>>> mdrun call. What would be the correct way to enforce cpu-only mdrun when
>>>>>>> coulombtype = cutoff?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>>>>>>>
>>>>>>>> No.
>>>>>>>>
>>>>>>>> Look at the processes that are running, e.g. with top or ps. Either old
>>>>>>>> simulations or another user is running.
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Strange. There are only two people using this machine, myself being one of
>>>>>>>>> them, and the other person specifically forces -nb cpu -pme cpu in his
>>>>>>>>> calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp,
>>>>>>>>> or energy) trying to use GPUs?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The second column is PIDs so there is a whole lot more going on there than
>>>>>>>>>> just a single simulation, single rank using two GPUs. That would be one PID
>>>>>>>>>> and two entries for the two GPUs. Are you sure you're not running other
>>>>>>>>>> processes?
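
That reading of the table can be checked mechanically by grouping the
nvidia-smi process rows by PID. A small Python sketch, assuming each row
is on one line in the form "| GPU PID Type Name Memory |" as in the
listing quoted further down; the function name is hypothetical.

```python
import re
from collections import defaultdict

# One process row: | GPU  PID  Type  Name  <n>MiB |
ROW = re.compile(r"\|\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s*\|")


def gpus_by_pid(text):
    """Map each PID to the sorted list of GPU indices it appears on."""
    seen = defaultdict(set)
    for match in ROW.finditer(text):
        gpu, pid = int(match.group(1)), int(match.group(2))
        seen[pid].add(gpu)
    return {pid: sorted(gpus) for pid, gpus in seen.items()}
```

For the quoted listing this would show PID 12981 on GPUs 1 and 2 (the
expected single simulation) and every other PID on GPU 2 only, which is
what points to separate processes.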
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Szilárd
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4
>>>>>>>>>>> -npme 1 -pme gpu -nb gpu -gputasks 1122
>>>>>>>>>>>
>>>>>>>>>>> Once in a while the simulation slows down and nvidia-smi reports something
>>>>>>>>>>> like this:
>>>>>>>>>>>
>>>>>>>>>>> |    1     12981      C   gmx                                          175MiB |
>>>>>>>>>>> |    2     12981      C   gmx                                          217MiB |
>>>>>>>>>>> |    2     13083      C   gmx                                          161MiB |
>>>>>>>>>>> |    2     13086      C   gmx                                          159MiB |
>>>>>>>>>>> |    2     13089      C   gmx                                          139MiB |
>>>>>>>>>>> |    2     13093      C   gmx                                          163MiB |
>>>>>>>>>>> |    2     13096      C   gmx                                           11MiB |
>>>>>>>>>>> |    2     13099      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13102      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13106      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13109      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13112      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13115      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13119      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13122      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13125      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13128      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13131      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13134      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13138      C   gmx                                            8MiB |
>>>>>>>>>>> |    2     13141      C   gmx                                            8MiB |
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>> Then goes back to the expected load. Is this normal?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Alex
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Gromacs Users mailing list
>>>>>>>>>>>
>>>>>>>>>>> * Please search the archive at
>>>>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>>>>>>>>>>
>>>>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>>>>
>>>>>>>>>>> * For (un)subscribe requests visit
>>>>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>>>>>>>>>> send a mail to gmx-users-request at gromacs.org.
>>
>
--
==================================================
Justin A. Lemkul, Ph.D.
Assistant Professor
Virginia Tech Department of Biochemistry
303 Engel Hall
340 West Campus Dr.
Blacksburg, VA 24061
jalemkul at vt.edu | (540) 231-3129
http://www.thelemkullab.com
==================================================