[gmx-users] strange GPU load distribution

Mark Abraham mark.j.abraham at gmail.com
Fri Apr 27 22:16:15 CEST 2018


Hi,

What you think was run isn't nearly as useful when troubleshooting as
asking the kernel what is actually running.
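
For example (just a sketch; the PIDs below are the ones from your earlier
nvidia-smi listing, so substitute whatever it reports at the time), list the
compute processes and then ask ps for their full command lines:

nvidia-smi --query-compute-apps=pid,process_name,used_gpu_memory --format=csv
ps -o pid,user,etime,args -p 12981,13083   # PIDs taken from your listing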

Mark


On Fri, Apr 27, 2018, 21:59 Alex <nedomacho at gmail.com> wrote:

> Mark, I copied the exact command line from the script, right above the
> mdp file. It is literally how the script calls mdrun in this case:
>
> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>
>
> On 4/27/2018 1:52 PM, Mark Abraham wrote:
> > The group cutoff scheme can never run on a GPU, so none of that should
> > matter.
> > Use ps and find out what the command lines were.
> >
> > Mark
> >
> > On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
> >
> >> Update: we're removing commands one by one from the script that submits
> >> the jobs causing the issue. The culprits are both the EM and the MD
> >> runs, and the GPUs are affected _before_ MD starts loading the CPU,
> >> i.e. during the initial setup of the EM run -- CPU load is near zero,
> >> yet nvidia-smi reports the mess. I wonder if this is in any way related
> >> to that timing test we were failing a while back.
> >> The mdrun call and mdp are below, though I suspect they have nothing to
> >> do with what is happening. Any help will be very highly appreciated.
> >>
> >> Alex
> >>
> >> ***
> >>
> >> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
> >>
> >> mdp:
> >>
> >> ; Run control
> >> integrator               = md-vv       ; Velocity Verlet
> >> tinit                    = 0
> >> dt                       = 0.002
> >> nsteps                   = 500000    ; 1 ns
> >> nstcomm                  = 100
> >> ; Output control
> >> nstxout                  = 50000
> >> nstvout                  = 50000
> >> nstfout                  = 0
> >> nstlog                   = 50000
> >> nstenergy                = 50000
> >> nstxout-compressed       = 0
> >> ; Neighborsearching and short-range nonbonded interactions
> >> cutoff-scheme            = group
> >> nstlist                  = 10
> >> ns_type                  = grid
> >> pbc                      = xyz
> >> rlist                    = 1.4
> >> ; Electrostatics
> >> coulombtype              = cutoff
> >> rcoulomb                 = 1.4
> >> ; van der Waals
> >> vdwtype                  = user
> >> vdw-modifier             = none
> >> rvdw                     = 1.4
> >> ; Apply long range dispersion corrections for Energy and Pressure
> >> DispCorr                  = EnerPres
> >> ; Spacing for the PME/PPPM FFT grid
> >> fourierspacing           = 0.12
> >> ; EWALD/PME/PPPM parameters
> >> pme_order                = 6
> >> ewald_rtol               = 1e-06
> >> epsilon_surface          = 0
> >> ; Temperature coupling
> >> Tcoupl                   = nose-hoover
> >> tc_grps                  = system
> >> tau_t                    = 1.0
> >> ref_t                    = some_temperature
> >> ; Pressure coupling is off for NVT
> >> Pcoupl                   = No
> >> tau_p                    = 0.5
> >> compressibility          = 4.5e-05
> >> ref_p                    = 1.0
> >> ; options for bonds
> >> constraints              = all-bonds
> >> constraint_algorithm     = lincs
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
> >>
> >>> As I said, there are only two users, and nvidia-smi shows the process
> >>> name. We're investigating, and it does appear that the EM step uses
> >>> cutoff electrostatics, so the user did not bother with -pme cpu in the
> >>> mdrun call. What would be the correct way to enforce a CPU-only mdrun
> >>> when coulombtype = cutoff?
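> >>>
> >>> Would something along these lines be enough (just a sketch, and I'm not
> >>> sure -pme cpu even matters when there is no PME), or is there a cleaner
> >>> way, e.g. the GMX_DISABLE_GPU_DETECTION environment variable, if I
> >>> remember its name correctly?
> >>>
> >>> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm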
> >>>
> >>> Thanks,
> >>>
> >>> Alex
> >>>
> >>> On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abraham at gmail.com>
> >>> wrote:
> >>>
> >>>> No.
> >>>>
> >>>> Look at the processes that are running, e.g. with top or ps. Either
> >>>> old simulations are still running or another user is running jobs.
> >>>>
> >>>> Mark
> >>>>
> >>>> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
> >>>>
> >>>>> Strange. There are only two people using this machine, myself being
> >>>>> one of them, and the other person specifically forces -nb cpu -pme
> >>>>> cpu in his calls to mdrun. Are any other GMX utilities (e.g.
> >>>>> insert-molecules, grompp, or energy) trying to use GPUs?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>> On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szilard at gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> The second column shows PIDs, so there is a whole lot more going on
> >>>>>> there than just a single simulation with a single rank using two
> >>>>>> GPUs. That would be one PID with two entries, one per GPU. Are you
> >>>>>> sure you're not running other processes?
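> >>>>>>
> >>>>>> For a single thread-MPI process driving both GPUs I would expect
> >>>>>> something like the two lines below, same PID on both devices
> >>>>>> (memory figures only illustrative, taken from your listing):
> >>>>>>
> >>>>>> |    1     12981      C   gmx       175MiB |
> >>>>>> |    2     12981      C   gmx       217MiB |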
> >>>>>>
> >>>>>> --
> >>>>>> Szilárd
> >>>>>>
> >>>>>> On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24
> >>>>>>> -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
> >>>>>>>
> >>>>>>> Once in a while the simulation slows down and nvidia-smi reports
> >>>>>>> something like this:
> >>>>>>>
> >>>>>>> |    1     12981      C   gmx       175MiB |
> >>>>>>> |    2     12981      C   gmx       217MiB |
> >>>>>>> |    2     13083      C   gmx       161MiB |
> >>>>>>> |    2     13086      C   gmx       159MiB |
> >>>>>>> |    2     13089      C   gmx       139MiB |
> >>>>>>> |    2     13093      C   gmx       163MiB |
> >>>>>>> |    2     13096      C   gmx        11MiB |
> >>>>>>> |    2     13099      C   gmx         8MiB |
> >>>>>>> |    2     13102      C   gmx         8MiB |
> >>>>>>> |    2     13106      C   gmx         8MiB |
> >>>>>>> |    2     13109      C   gmx         8MiB |
> >>>>>>> |    2     13112      C   gmx         8MiB |
> >>>>>>> |    2     13115      C   gmx         8MiB |
> >>>>>>> |    2     13119      C   gmx         8MiB |
> >>>>>>> |    2     13122      C   gmx         8MiB |
> >>>>>>> |    2     13125      C   gmx         8MiB |
> >>>>>>> |    2     13128      C   gmx         8MiB |
> >>>>>>> |    2     13131      C   gmx         8MiB |
> >>>>>>> |    2     13134      C   gmx         8MiB |
> >>>>>>> |    2     13138      C   gmx         8MiB |
> >>>>>>> |    2     13141      C   gmx         8MiB |
> >>>>>>> +------------------------------------------+
> >>>>>>>
> >>>>>>> Then it goes back to the expected load. Is this normal?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Alex
> >>>>>>>

