[gmx-users] strange GPU load distribution

Alex nedomacho at gmail.com
Mon Apr 30 19:11:01 CEST 2018


Hi Mark,

We checked, and one example is below.

Thanks,

Alex

PID TTY      STAT   TIME COMMAND

60432 pts/8    Dl+    0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep
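
For completeness, a sketch of a fuller check -- one way to list every
running gmx process with its complete command line, owner, and state
(the format specifiers below assume a GNU-style ps):

# PID, owner, process state, elapsed time, full command line
ps -eo pid,user,stat,etime,args | grep '[g]mx'

The '[g]mx' pattern keeps the grep process itself out of the listing.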



On 4/27/2018 2:16 PM, Mark Abraham wrote:
> Hi,
>
> What you think was run isn't nearly as useful when troubleshooting as
> asking the kernel what is actually running.
>
> Mark
>
>
> On Fri, Apr 27, 2018, 21:59 Alex <nedomacho at gmail.com> wrote:
>
>> Mark, I copied the exact command line from the script, right above the
>> mdp file. It is literally how the script calls mdrun in this case:
>>
>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>
>>
>> On 4/27/2018 1:52 PM, Mark Abraham wrote:
>>> The group cutoff scheme can never run on a GPU, so none of that should
>>> matter.
>>> Use ps and find out what the command lines were.
>>>
>>> Mark
>>>
>>> On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
>>>
>>>> Update: we've been removing commands one by one from the script that
>>>> submits the jobs causing the issue. The culprits are both the EM and
>>>> the MD runs, and the GPUs are affected _before_ MD starts loading the
>>>> CPU, i.e. during the initial setup of the EM run -- CPU load is near
>>>> zero while nvidia-smi reports the mess. I wonder if this is in any
>>>> way related to the timing test we were failing a while back.
>>>> The mdrun call and mdp file are below, though I suspect they have
>>>> nothing to do with what is happening. Any help will be very highly
>>>> appreciated.
>>>>
>>>> Alex
>>>>
>>>> ***
>>>>
>>>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>>>>
>>>> mdp:
>>>>
>>>> ; Run control
>>>> integrator               = md-vv       ; Velocity Verlet
>>>> tinit                    = 0
>>>> dt                       = 0.002
>>>> nsteps                   = 500000    ; 1 ns
>>>> nstcomm                  = 100
>>>> ; Output control
>>>> nstxout                  = 50000
>>>> nstvout                  = 50000
>>>> nstfout                  = 0
>>>> nstlog                   = 50000
>>>> nstenergy                = 50000
>>>> nstxout-compressed       = 0
>>>> ; Neighborsearching and short-range nonbonded interactions
>>>> cutoff-scheme            = group
>>>> nstlist                  = 10
>>>> ns_type                  = grid
>>>> pbc                      = xyz
>>>> rlist                    = 1.4
>>>> ; Electrostatics
>>>> coulombtype              = cutoff
>>>> rcoulomb                 = 1.4
>>>> ; van der Waals
>>>> vdwtype                  = user
>>>> vdw-modifier             = none
>>>> rvdw                     = 1.4
>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>> DispCorr                  = EnerPres
>>>> ; Spacing for the PME/PPPM FFT grid
>>>> fourierspacing           = 0.12
>>>> ; EWALD/PME/PPPM parameters
>>>> pme_order                = 6
>>>> ewald_rtol               = 1e-06
>>>> epsilon_surface          = 0
>>>> ; Temperature coupling
>>>> Tcoupl                   = nose-hoover
>>>> tc_grps                  = system
>>>> tau_t                    = 1.0
>>>> ref_t                    = some_temperature
>>>> ; Pressure coupling is off for NVT
>>>> Pcoupl                   = No
>>>> tau_p                    = 0.5
>>>> compressibility          = 4.5e-05
>>>> ref_p                    = 1.0
>>>> ; options for bonds
>>>> constraints              = all-bonds
>>>> constraint_algorithm     = lincs
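
(Aside: one way to confirm from the run itself whether a GPU was used is
the hardware detection and task-assignment report near the top of the
mdrun log. A sketch, assuming the usual log naming from -deffnm:

grep -i 'gpu' em_steep.log

For a genuinely CPU-only run, the task-assignment lines should show the
nonbonded and PME work mapped to the CPU.)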
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
>>>>
>>>>> As I said, there are only two users, and nvidia-smi shows the process
>>>>> name. We're investigating, and it does appear that it is the EM run
>>>>> that uses cutoff electrostatics, and as a result the user did not
>>>>> bother with -pme cpu in the mdrun call. What would be the correct way
>>>>> to enforce a CPU-only mdrun when coulombtype = cutoff?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alex
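
(On enforcing a CPU-only mdrun: in GROMACS 2018, forcing both the
short-range nonbonded and the PME tasks onto the CPU should suffice, e.g.

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm em_steep

and, as a belt-and-braces sketch, GPU detection can be disabled
altogether through an environment variable -- assuming the build honors
it:

GMX_DISABLE_GPU_DETECTION=1 gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm em_steep

The em_steep name is only illustrative here.)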
>>>>>
>>>>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham
>>>>> <mark.j.abraham at gmail.com> wrote:
>>>>>
>>>>>> No.
>>>>>>
>>>>>> Look at the processes that are actually running, e.g. with top or
>>>>>> ps. Either old simulations are still running or another user is.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
>>>>>>
>>>>>>> Strange. There are only two people using this machine, myself being
>>>>>>> one of them, and the other person specifically forces -nb cpu -pme
>>>>>>> cpu in his calls to mdrun. Are any other GMX utilities (e.g.
>>>>>>> insert-molecules, grompp, or energy) trying to use GPUs?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll
>>>>>>> <pall.szilard at gmail.com> wrote:
>>>>>>>
>>>>>>>> The second column lists PIDs, so there is a whole lot more going
>>>>>>>> on there than just a single simulation with a single rank using
>>>>>>>> two GPUs. That would be one PID and two entries, one for each GPU.
>>>>>>>> Are you sure you're not running other processes?
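
(To illustrate: for a single run like the one quoted below, with
-ntmpi 4 -npme 1 -gputasks 1122, the four GPU tasks (three PP plus one
PME) map in order to GPU IDs 1, 1, 2, 2 -- so nvidia-smi should show
exactly one gmx PID, with one entry per GPU, roughly:

|    1     12981      C   gmx                            175MiB |
|    2     12981      C   gmx                            217MiB |

Memory figures are illustrative. Anything beyond those two entries is
another gmx process.)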
>>>>>>>>
>>>>>>>> --
>>>>>>>> Szilárd
>>>>>>>>
>>>>>>>> On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24
>>>>>>>>> -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
>>>>>>>>>
>>>>>>>>> Once in a while the simulation slows down and nvidia-smi reports
>>>>>>>>> something like this:
>>>>>>>>>
>>>>>>>>> |    1     12981      C   gmx                            175MiB |
>>>>>>>>> |    2     12981      C   gmx                            217MiB |
>>>>>>>>> |    2     13083      C   gmx                            161MiB |
>>>>>>>>> |    2     13086      C   gmx                            159MiB |
>>>>>>>>> |    2     13089      C   gmx                            139MiB |
>>>>>>>>> |    2     13093      C   gmx                            163MiB |
>>>>>>>>> |    2     13096      C   gmx                             11MiB |
>>>>>>>>> |    2     13099      C   gmx                              8MiB |
>>>>>>>>> |    2     13102      C   gmx                              8MiB |
>>>>>>>>> |    2     13106      C   gmx                              8MiB |
>>>>>>>>> |    2     13109      C   gmx                              8MiB |
>>>>>>>>> |    2     13112      C   gmx                              8MiB |
>>>>>>>>> |    2     13115      C   gmx                              8MiB |
>>>>>>>>> |    2     13119      C   gmx                              8MiB |
>>>>>>>>> |    2     13122      C   gmx                              8MiB |
>>>>>>>>> |    2     13125      C   gmx                              8MiB |
>>>>>>>>> |    2     13128      C   gmx                              8MiB |
>>>>>>>>> |    2     13131      C   gmx                              8MiB |
>>>>>>>>> |    2     13134      C   gmx                              8MiB |
>>>>>>>>> |    2     13138      C   gmx                              8MiB |
>>>>>>>>> |    2     13141      C   gmx                              8MiB |
>>>>>>>>> +-----------------------------------------------------------------+
>>>>>>>>>
>>>>>>>>> Then it goes back to the expected load. Is this normal?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>


