[gmx-users] strange GPU load distribution
Mark Abraham
mark.j.abraham at gmail.com
Fri Apr 27 21:52:20 CEST 2018
The group cutoff scheme can never run on a GPU, so none of that should matter.
Use ps to find out what the command lines were.
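For example, something along these lines (a rough sketch; exact nvidia-smi
query fields may differ between driver versions):

# PIDs of the compute processes that nvidia-smi sees on the GPUs
nvidia-smi --query-compute-apps=pid,process_name --format=csv

# full command line for each of those PIDs
ps -o pid,args -p $(nvidia-smi --query-compute-apps=pid --format=csv,noheader | paste -s -d, -)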
Mark
On Fri, Apr 27, 2018, 21:37 Alex <nedomacho at gmail.com> wrote:
> Update: we're removing commands one by one from the script that submits
> the jobs causing the issue. The culprits are both EM and the MD run, and
> the GPUs are affected _before_ MD starts loading the CPU, i.e. during the
> initial setup of the EM run -- CPU load is near zero, yet nvidia-smi
> reports the mess. I wonder if this is in any way related to that timing
> test we were failing a while back.
> The mdrun call and mdp are below, though I suspect they have nothing to do
> with what is happening. Any help would be very much appreciated.
>
> Alex
>
> ***
>
> gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm
>
> mdp:
>
> ; Run control
> integrator = md-vv ; Velocity Verlet
> tinit = 0
> dt = 0.002
> nsteps = 500000 ; 1 ns
> nstcomm = 100
> ; Output control
> nstxout = 50000
> nstvout = 50000
> nstfout = 0
> nstlog = 50000
> nstenergy = 50000
> nstxout-compressed = 0
> ; Neighborsearching and short-range nonbonded interactions
> cutoff-scheme = group
> nstlist = 10
> ns_type = grid
> pbc = xyz
> rlist = 1.4
> ; Electrostatics
> coulombtype = cutoff
> rcoulomb = 1.4
> ; van der Waals
> vdwtype = user
> vdw-modifier = none
> rvdw = 1.4
> ; Apply long range dispersion corrections for Energy and Pressure
> DispCorr = EnerPres
> ; Spacing for the PME/PPPM FFT grid
> fourierspacing = 0.12
> ; EWALD/PME/PPPM parameters
> pme_order = 6
> ewald_rtol = 1e-06
> epsilon_surface = 0
> ; Temperature coupling
> Tcoupl = nose-hoover
> tc_grps = system
> tau_t = 1.0
> ref_t = some_temperature
> ; Pressure coupling is off for NVT
> Pcoupl = No
> tau_p = 0.5
> compressibility = 4.5e-05
> ref_p = 1.0
> ; options for bonds
> constraints = all-bonds
> constraint_algorithm = lincs
>
> On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedomacho at gmail.com> wrote:
>
> > As I said, only two users, and nvidia-smi shows the process name. We're
> > investigating, and it does appear that it is the EM step that uses cutoff
> > electrostatics, and as a result the user did not bother with -pme cpu in
> > the mdrun call. What would be the correct way to enforce a CPU-only mdrun
> > when coulombtype = cutoff?
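> >
> > E.g., would explicitly keeping both the nonbondeds and PME on the CPU be
> > enough here:
> >
> > gmx mdrun -nb cpu -pme cpu ...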
> >
> > Thanks,
> >
> > Alex
> >
> > On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >
> >> No.
> >>
> >> Look at the processes that are running, e.g. with top or ps. Either old
> >> simulations are still running, or another user is running something.
> >>
> >> Mark
> >>
> >> On Fri, Apr 27, 2018, 20:33 Alex <nedomacho at gmail.com> wrote:
> >>
> >> > Strange. There are only two people using this machine, myself being one
> >> > of them, and the other person specifically forces -nb cpu -pme cpu in
> >> > his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules,
> >> > grompp, or energy) trying to use GPUs?
> >> >
> >> > Thanks,
> >> >
> >> > Alex
> >> >
> >> > On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szilard at gmail.com>
> >> > wrote:
> >> >
> >> > > The second column shows PIDs, so there is a whole lot more going on
> >> > > there than just a single simulation with a single rank using two GPUs.
> >> > > That would be one PID with two entries, one for each GPU. Are you sure
> >> > > you're not running other processes?
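> >> > >
> >> > > For instance, pgrep would show how many separate gmx processes are
> >> > > actually alive on the node (a quick sketch, assuming a pgrep that
> >> > > supports -a for printing full command lines):
> >> > >
> >> > > pgrep -a gmx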
> >> > >
> >> > > --
> >> > > Szilárd
> >> > >
> >> > > On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedomacho at gmail.com> wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > I am running GMX 2018 with gmx mdrun -pinoffset 0 -pin on -nt 24
> >> > > > -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122
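> >> > > > i.e., as I understand the options, that is one process with four
> >> > > > thread-MPI ranks (3 PP + 1 PME), with its four GPU tasks mapped to
> >> > > > GPU ids 1,1,2,2.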
> >> > > >
> >> > > > Once in a while the simulation slows down and nvidia-smi reports
> >> > > > something like this:
> >> > > >
> >> > > > |    1     12981      C   gmx                              175MiB |
> >> > > > |    2     12981      C   gmx                              217MiB |
> >> > > > |    2     13083      C   gmx                              161MiB |
> >> > > > |    2     13086      C   gmx                              159MiB |
> >> > > > |    2     13089      C   gmx                              139MiB |
> >> > > > |    2     13093      C   gmx                              163MiB |
> >> > > > |    2     13096      C   gmx                               11MiB |
> >> > > > |    2     13099      C   gmx                                8MiB |
> >> > > > |    2     13102      C   gmx                                8MiB |
> >> > > > |    2     13106      C   gmx                                8MiB |
> >> > > > |    2     13109      C   gmx                                8MiB |
> >> > > > |    2     13112      C   gmx                                8MiB |
> >> > > > |    2     13115      C   gmx                                8MiB |
> >> > > > |    2     13119      C   gmx                                8MiB |
> >> > > > |    2     13122      C   gmx                                8MiB |
> >> > > > |    2     13125      C   gmx                                8MiB |
> >> > > > |    2     13128      C   gmx                                8MiB |
> >> > > > |    2     13131      C   gmx                                8MiB |
> >> > > > |    2     13134      C   gmx                                8MiB |
> >> > > > |    2     13138      C   gmx                                8MiB |
> >> > > > |    2     13141      C   gmx                                8MiB |
> >> > > > +-----------------------------------------------------------------+
> >> > > >
> >> > > > Then it goes back to the expected load. Is this normal?
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Alex
> >> > > >