[gmx-users] performance
gromacs query
gromacsquery at gmail.com
Thu Sep 14 13:02:35 CEST 2017
Hi Szilárd,
Here are my replies:
>> Did you run the "fast" single job on an otherwise empty node? That might
explain it as, when most of the CPU cores are left empty, modern CPUs
increase clocks (turbo boost) on the used cores higher than they could with
all cores busy.
Yes, the "fast" single job was on an empty node. Sorry, I don't quite follow
what you mean by 'modern CPUs increase clocks' -- do you mean the ns/day I
measured is inflated in that case?
>> and if you post an actual log I can certainly give more informed comments
Sure, if it's OK, can I post it to you off-list?
>> However, note that if you are sharing a node with others, if their jobs
are not correctly affinitized, those processes will affect the performance
of your job.
Yes, exactly. In this case I would need to set -pinoffset manually, but this
can be a bit frustrating if other GROMACS users on the node are not binding
their jobs :)
Would it be possible to handle this in the default pinning algorithm, though
I am unaware of other issues it might cause? Also, -multidir is sometimes
inconvenient: when a job crashes in the middle, automatic restart from the
cpt file is difficult.
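For reference, the kind of manual staggering I mean looks roughly like this (a sketch only; the four-jobs-of-10-cores split on a 40-core node is my assumption, and the actual mdrun launch line is commented out):

```shell
# Stagger -pinoffset so each of 4 identical jobs gets its own block of
# 10 cores on a 40-core node (core counts are assumptions).
CORES_PER_JOB=10
for i in 0 1 2 3; do
    offset=$((i * CORES_PER_JOB))
    echo "job $i -> -pinoffset $offset"
    # mpirun -np 4 gmx_mpi mdrun -deffnm test$i -ntomp 2 -pin on -pinoffset "$offset" &
done
```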
-J
On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll <pall.szilard at gmail.com>
wrote:
> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query <gromacsquery at gmail.com>
> wrote:
> > Hi Szilárd,
> >
> > Thanks again. I tried now with -multidir like this:
> >
> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
> >
> > So this runs 4 jobs on the same node, so each job gets np = 16/4 = 4 ranks,
> > and each job uses 2 GPUs. I now get much improved, and equal, performance
> > for each job (~220 ns/day), though still slightly less than a single
> > independent job (where I get 300 ns/day). I can live with that but -
>
> That is not normal and it is more likely to be a benchmarking
> discrepancy: you are likely not comparing apples to apples. Did you
> run the "fast" single job on an otherwise empty node? That might
> explain it as, when most of the CPU cores are left empty, modern CPUs
> increase clocks (turbo boost) on the used cores higher than they could
> with all cores busy.
>
> > Surprised: There are at most 40 cores and 8 GPUs per node, and thus my 4
> > jobs should consume all 8 GPUs.
>
> Note that even if those are 40 real cores (rather than 20 cores with
> HyperThreading), the current GROMACS release is unlikely to run
> efficiently with fewer than 6-8 cores per GPU. This will likely change
> with the next release.
>
> > So I am a bit surprised that the node on which my four jobs were running
> > was already occupied by another user's jobs, which I think should not
> > happen (maybe a slurm.conf admin issue?). Either some of my jobs should
> > have gone into the queue or run on another node if one was free.
>
> Sounds like a job scheduler issue (you can always check the detected
> hardware in the log) -- and if you post an actual log I can certainly
> give more informed comments.
>
> > What to do: Importantly, as an individual user I can submit a -multidir
> > job, but say, as is normally the case, there are many other unknown users
> > who each submit one or two jobs; in that case performance will be an
> > issue (which is equivalent to my case when I submit many jobs without
> > -multi/-multidir).
>
> Not sure I follow: if you always have a number of similar runs to do,
> submit them together and benefit from not having to do manual hardware
> assignment. Otherwise, if your cluster relies on node sharing, you
> will have to make sure that you specify the affinity/binding
> arguments to your job scheduler correctly (or work around it with manual
> offset calculation). However, note that if you are sharing a node with
> others and their jobs are not correctly affinitized, those processes
> will affect the performance of your job.
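To illustrate the scheduler route described above, a SLURM batch script requesting per-rank core binding might look like this (a sketch only; the partition-independent resource counts are placeholders matching the 4-rank/2-GPU runs in this thread, not a tested configuration):

```shell
#!/bin/bash
#SBATCH --ntasks=4          # 4 MPI ranks for one mdrun job
#SBATCH --cpus-per-task=2   # matches -ntomp 2
#SBATCH --gres=gpu:2        # placeholder: 2 GPUs for this job

# Ask SLURM to bind each rank to its own cores; mdrun respects the
# affinity mask it inherits.
srun --cpu-bind=cores gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
```

This binds at process granularity only; per-thread placement is still left to mdrun.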
>
> > I think still they will need -pinoffset. Could you
> > please suggest what best can be done in such case?
>
> See above.
>
> Cheers,
> --
> Szilárd
>
> >
> > -Jiom
> >
> >
> >
> >
> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll <pall.szilard at gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> First off, have you considered option 2), using multi-sim? That would
> >> let you avoid having to set offsets manually. Can you not
> >> submit your jobs such that you fill at least a node?
> >>
> >> How many threads/cores does your node have? Can you share log files?
> >>
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >>
> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <gromacsquery at gmail.com>
> >> wrote:
> >> > Hi Szilárd,
> >> >
> >> > Sorry, I was a bit quick to say it's working with pinoffset. I just
> >> > submitted four identical jobs (2 GPUs, 4 ranks each) on the same node
> >> > with -pin on and different -pinoffset values of 0, 5, 10, 15 (the
> >> > numbers should be fine as there are 40 cores on the node). I still
> >> > don't get the performance expected from a single independent job (all
> >> > jobs variably below 50% of it). Now I am wondering if it is still
> >> > related to overlapping cores, as -pin on should lock the cores for
> >> > each job.
> >> >
> >> > -J
> >> >
> >> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <gromacsquery at gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Szilárd,
> >> >>
> >> >> Thanks, option 3 was in my mind, but now I need to figure out how :)
> >> >> Manually setting pinoffset seems to be working in some quick tests.
> >> >> I think option 1 would require asking the admin, but I can try option 3
> >> >> myself. Since there are other users from different places who may not
> >> >> bother with option 3, I think I would need to ask the admin to enforce
> >> >> option 1, but before that I will try option 3.
> >> >>
> >> >> JIom
> >> >>
> >> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <pall.szilard at gmail.com>
> >> >> wrote:
> >> >>
> >> >>> J,
> >> >>>
> >> >>> You have a few options:
> >> >>>
> >> >>> * Use SLURM to assign not only the set of GPUs, but also the correct
> >> >>> set of CPU cores to each mdrun process. If you do so, mdrun will
> >> >>> respect the affinity mask it inherits, and your two mdrun jobs
> >> >>> should run on the right sets of cores. This has the drawback
> >> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
> >> >>> application thread to a core/hardware thread (which is what mdrun
> >> >>> does), only a process to a group of cores/hw threads, which can
> >> >>> sometimes lead to performance loss. (You might be able to compensate
> >> >>> using some OpenMP library environment variables, though.)
> >> >>>
> >> >>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
> >> >>> node or multiple across nodes) and benefit from the rank/thread to
> >> >>> core/hw thread assignment, which is also supported across the multiple
> >> >>> simulations that are part of a multi-run; e.g.:
> >> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
> >> >>> will launch 4 ranks and start 4 simulations, one in each of the four
> >> >>> directories passed.
> >> >>>
> >> >>> * Write a wrapper script around gmx mdrun which will be what you
> >> >>> launch with SLURM; you can then inspect the node and decide what
> >> >>> pinoffset value to pass to your mdrun launch command.
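A minimal sketch of such a wrapper could look like the following (the 10-cores-per-job block size and the "gmx_mpi mdrun" process pattern are assumptions, and the final launch line is commented out):

```shell
# Wrapper sketch: count mdrun processes already running on this node and
# pick the next free core block for -pinoffset (block size is an assumption).
CORES_PER_JOB=10
# pgrep -c prints the number of matching processes (0 when none match).
n_running=$(pgrep -c -f "gmx_mpi mdrun" || true)
n_running=${n_running:-0}
offset=$((n_running * CORES_PER_JOB))
echo "detected $n_running running mdrun job(s); using -pinoffset $offset"
# exec mpirun -np 4 gmx_mpi mdrun -deffnm "$1" -ntomp 2 -pin on -pinoffset "$offset"
```

This only sees jobs already started, so jobs launched in the same instant could still race for the same block.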
> >> >>>
> >> >>>
> >> >>> I hope one of these will deliver the desired results :)
> >> >>>
> >> >>> Cheers,
> >> >>> --
> >> >>> Szilárd
> >> >>>
> >> >>>
> >> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <gromacsquery at gmail.com>
> >> >>> wrote:
> >> >>> > Hi Szilárd,
> >> >>> >
> >> >>> > Thanks for your reply. This is useful, but now I am thinking that,
> >> >>> > because slurm launches jobs in an automated way, it is not really in
> >> >>> > my control to choose the node. So the following things can happen;
> >> >>> > say for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
> >> >>> >
> >> >>> > - if they run on the same node, this is good
> >> >>> > - if the jobs run on different nodes (partially occupied or free),
> >> >>> > the chosen pinoffsets may not make sense, as I don't know what
> >> >>> > pinoffset I would need to set
> >> >>> > - if I have to submit many jobs together and slurm itself chooses
> >> >>> > different/same nodes, then I think it is difficult to define a
> >> >>> > pinoffset.
> >> >>> >
> >> >>> > -
> >> >>> > J
> >> >>> >
> >> >>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <pall.szilard at gmail.com>
> >> >>> > wrote:
> >> >>> >
> >> >>> >> My guess is that the two jobs are using the same cores -- either
> all
> >> >>> >> cores/threads or only half of them, but the same set.
> >> >>> >>
> >> >>> >> You should use -pinoffset; see:
> >> >>> >>
> >> >>> >> - Docs and example:
> >> >>> >> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
> >> >>> >>
> >> >>> >> - More explanation of the thread-pinning behavior on the old website:
> >> >>> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
> >> >>> >>
> >> >>> >> Cheers,
> >> >>> >> --
> >> >>> >> Szilárd
> >> >>> >>
> >> >>> >>
> >> >>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <gromacsquery at gmail.com>
> >> >>> >> wrote:
> >> >>> >> > Sorry, forgot to add: we thought the two jobs were using the same
> >> >>> >> > GPU ids, but CUDA_VISIBLE_DEVICES shows the two jobs are using
> >> >>> >> > different ids (0,1 and 2,3).
> >> >>> >> >
> >> >>> >> > -
> >> >>> >> > J
> >> >>> >> >
> >> >>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <gromacsquery at gmail.com>
> >> >>> >> > wrote:
> >> >>> >> >
> >> >>> >> >> Hi All,
> >> >>> >> >>
> >> >>> >> >> I have some issues with GROMACS performance. There are many
> >> >>> >> >> nodes, each with a number of GPUs, and batch processing is
> >> >>> >> >> controlled by slurm. I get good performance with some settings
> >> >>> >> >> of the number of GPUs and ranks, but when I submit the same job
> >> >>> >> >> twice on the same node, the performance drops drastically, e.g.:
> >> >>> >> >>
> >> >>> >> >> With 2 GPUs I get 300 ns/day when there is no other job running
> >> >>> >> >> on the node. When I submit the same job twice on the same node at
> >> >>> >> >> the same time, I get only 17 ns/day for both jobs. I am using this:
> >> >>> >> >>
> >> >>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
> >> >>> >> >>
> >> >>> >> >> Any suggestions highly appreciated.
> >> >>> >> >>
> >> >>> >> >> Thanks
> >> >>> >> >>
> >> >>> >> >> Jiom
> >> >>> >> >>
> >> >>> >> > --
> >> >>> >> > Gromacs Users mailing list
> >> >>> >> >
> >> >>> >> > * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> >> >>> >> >
> >> >>> >> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> >>> >> >
> >> >>> >> > * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.
More information about the gromacs.org_gmx-users
mailing list