[gmx-users] strange GPU load distribution
Alex
nedomacho at gmail.com
Sun May 6 23:52:00 CEST 2018
Unfortunately, we're still bogged down when EM runs (example below) start:
CPU usage by these jobs is initially low, while their PIDs show up in
nvidia-smi. After about a minute everything goes back to normal. Because
the user launches these runs frequently (scripted), everything is slowed
down by a large factor. Interestingly, another user drives a GPU with a
different MD package (LAMMPS), and that GPU is never touched by these EM jobs.
Any ideas will be greatly appreciated.
Thanks,
Alex
> PID   TTY     STAT  TIME  COMMAND
>
> 60432 pts/8   Dl+   0:01  gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep
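One way to keep these scripted EM jobs off the GPUs entirely (a minimal sketch, assuming a CUDA build; the file names are the ones from the command above) is to hide the devices from the process so mdrun never opens a GPU context:

  # Hide all CUDA devices from this job; mdrun then cannot touch a GPU at all
  CUDA_VISIBLE_DEVICES="" gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep

  # Alternatively, GROMACS's own environment variable disables GPU detection
  GMX_DISABLE_GPU_DETECTION=1 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep

Either variant should keep the PID from appearing in nvidia-smi even while the run is setting up.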
>
>
>
> On 4/27/2018 2:16 PM, Mark Abraham wrote:
>> Hi,
>>
>> What you think was run isn't nearly as useful when troubleshooting as
>> asking the kernel what is actually running.
>>
>> Mark
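A quick way to do that (a minimal sketch; the grep pattern is only an illustration) is to ask ps for the full command line of every running gmx process:

  # List PID, parent, CPU usage and full arguments of all gmx processes
  ps -eo pid,ppid,pcpu,args | grep '[g]mx'

The bracketed pattern keeps grep itself out of the output.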
>>
>>
>> On Fri, Apr 27, 2018, 21:59 Alex <nedomacho at gmail.com> wrote:
>>
>>> Mark, I copied the exact command line from the script, right above the
>>> mdp file. It is literally how the script calls mdrun in this case:
>>>
>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>
>>>
>>> On 4/27/2018 1:52 PM, Mark Abraham wrote:
>>>> The group cutoff scheme can never run on a GPU, so none of that should
>>> matter.
>>>> Use ps and find out what the command lines were.
>>>>
>>>> Mark
>>>>
>>>> On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
>>>>
>>>>> Update: we're removing commands one by one from the script that submits
>>>>> the jobs causing the issue. The culprits are both the EM and the MD runs,
>>>>> and the GPUs are affected _before_ MD starts loading the CPU, i.e. during
>>>>> the initial setup of the EM run -- CPU load is near zero while nvidia-smi
>>>>> reports the mess. I wonder if this is in any way related to that timing
>>>>> test we were failing a while back.
>>>>> The mdrun call and mdp are below, though I suspect they have nothing to do
>>>>> with what is happening. Any help will be very highly appreciated.
>>>>>
>>>>> Alex
>>>>>
>>>>> ***
>>>>>
>>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>>>
>>>>> mdp:
>>>>>
>>>>> ; Run control
>>>>> integrator = md-vv ; Velocity Verlet
>>>>> tinit = 0
>>>>> dt = 0.002
>>>>> nsteps = 500000 ; 1 ns
>>>>> nstcomm = 100
>>>>> ; Output control
>>>>> nstxout = 50000
>>>>> nstvout = 50000
>>>>> nstfout = 0
>>>>> nstlog = 50000
>>>>> nstenergy = 50000
>>>>> nstxout-compressed = 0
>>>>> ; Neighborsearching and short-range nonbonded interactions
>>>>> cutoff-scheme = group
>>>>> nstlist = 10
>>>>> ns_type = grid
>>>>> pbc = xyz
>>>>> rlist = 1.4
>>>>> ; Electrostatics
>>>>> coulombtype = cutoff
>>>>> rcoulomb = 1.4
>>>>> ; van der Waals
>>>>> vdwtype = user
>>>>> vdw-modifier = none
>>>>> rvdw = 1.4
>>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>>> DispCorr = EnerPres
>>>>> ; Spacing for the PME/PPPM FFT grid
>>>>> fourierspacing = 0.12
>>>>> ; EWALD/PME/PPPM parameters
>>>>> pme_order = 6
>>>>> ewald_rtol = 1e-06
>>>>> epsilon_surface = 0
>>>>> ; Temperature coupling
>>>>> Tcoupl = nose-hoover
>>>>> tc_grps = system
>>>>> tau_t = 1.0
>>>>> ref_t = some_temperature
>>>>> ; Pressure coupling is off for NVT
>>>>> Pcoupl = No
>>>>> tau_p = 0.5
>>>>> compressibility = 4.5e-05
>>>>> ref_p = 1.0
>>>>> ; options for bonds
>>>>> constraints = all-bonds
>>>>> constraint_algorithm = lincs
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
>>>>>
>>>>>> As I said, only two users, and nvidia-smi shows the process name. We're
>>>>>> investigating and it does appear that it is EM that uses cutoff
>>>>>> electrostatics and as a result the user did not bother with -pme cpu in
>>>>>> the mdrun call. What would be the correct way to enforce cpu-only mdrun
>>>>>> when coulombtype = cutoff?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
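For the flag-based route, a minimal sketch (GROMACS 2018 syntax; em_steep is the deffnm seen earlier in this thread) is simply to pin both offloadable tasks to the CPU:

  # Keep short-range nonbonded and PME work on the CPU for this EM run
  gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm em_steep

With cutoff electrostatics and the group scheme there is no PME task to offload anyway, but spelling out both assignments makes the intent explicit; hiding the devices, as sketched near the top of this page, is the stronger guarantee.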
>>>>>>
>>>>>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham
>>>>>> <mark.j.abraham at gmail.com> wrote:
>>>>>>
>>>>>>> No.
>>>>>>>
>>>>>>> Look at the processes that are running, e.g. with top or ps. Either old
>>>>>>> simulations or another user is running.
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
>>>>>>>
>>>>>>>> Strange. There are only two people using this machine, myself being
>>>>>>>> one of them, and the other person specifically forces -nb cpu -pme cpu
>>>>>>>> in his calls to mdrun. Are any other GMX utilities (e.g.
>>>>>>>> insert-molecules, grompp, or energy) trying to use GPUs?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll
>>>>>>>> <pall.szilard at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The second column is PIDs, so there is a whole lot more going on there
>>>>>>>>> than just a single simulation, single rank using two GPUs. That would
>>>>>>>>> be one PID and two entries for the two GPUs. Are you sure you're not
>>>>>>>>> running other processes?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Szilárd
>>>>>>>>>
>>>>>>>>> On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24
>>>>>>>>>> -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
>>>>>>>>>>
>>>>>>>>>> Once in a while the simulation slows down and nvidia-smi reports
>>>>>>>>>> something like this:
>>>>>>>>>>
>>>>>>>>>> |    1     12981      C   gmx                                175MiB |
>>>>>>>>>> |    2     12981      C   gmx                                217MiB |
>>>>>>>>>> |    2     13083      C   gmx                                161MiB |
>>>>>>>>>> |    2     13086      C   gmx                                159MiB |
>>>>>>>>>> |    2     13089      C   gmx                                139MiB |
>>>>>>>>>> |    2     13093      C   gmx                                163MiB |
>>>>>>>>>> |    2     13096      C   gmx                                 11MiB |
>>>>>>>>>> |    2     13099      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13102      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13106      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13109      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13112      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13115      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13119      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13122      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13125      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13128      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13131      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13134      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13138      C   gmx                                  8MiB |
>>>>>>>>>> |    2     13141      C   gmx                                  8MiB |
>>>>>>>>>> +-------------------------------------------------------------------+
>>>>>>>>>>
>>>>>>>>>> Then it goes back to the expected load. Is this normal?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Alex
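One way to catch what is actually holding GPU contexts when this happens (a sketch; check nvidia-smi --help-query-compute-apps for the exact field names supported by your driver) is to log the compute processes once per second:

  # Print GPU, PID, process name and memory of every compute process, looping every second
  nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name,used_memory --format=csv -l 1

Cross-referencing those PIDs against the ps output above shows which mdrun invocations are involved.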
>>>>>>>>>>