[gmx-users] performance

Szilárd Páll pall.szilard at gmail.com
Mon Sep 18 15:57:16 CEST 2017


On Thu, Sep 14, 2017 at 1:02 PM, gromacs query <gromacsquery at gmail.com> wrote:
> Hi Szilárd,
>
> Here are my replies:
>
>>> Did you run the "fast" single job on an otherwise empty node? That might
>>> explain it as, when most of the CPU cores are left empty, modern CPUs
>>> increase clocks (turbo boost) on the used cores higher than they could with
>>> all cores busy.
>
> Yes, the "fast" single job was on an empty node. Sorry, I don't get it when
> you say 'modern CPUs increase clocks'; do you mean the ns/day I get is pseudo
> in that case?

It's called dynamic voltage and frequency scaling (DVFS), or Turbo Boost on
Intel CPUs. Here are some pointers:
https://en.wikipedia.org/wiki/Dynamic_frequency_scaling
https://en.wikipedia.org/wiki/Intel_Turbo_Boost
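
If you want to see the effect yourself, you can watch the per-core clock
frequencies on the node while a job is running; on Linux something as simple
as the following works (turbostat or cpupower, if installed, give more detail):

  watch -n 1 "grep 'cpu MHz' /proc/cpuinfo"

With only a few cores busy you will typically see them sitting at a higher
frequency than when all cores are loaded.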

>>> and if you post an actual log I can certainly give more informed comments
>
> Sure, if it's ok, can I post it off the mailing list to you?

Please use an online file-sharing service of your liking so everyone
has access to the information referred to here.

>>> However, note that if you are sharing a node with others, if their jobs
>>> are not correctly affinitized, those processes will affect the performance
>>> of your job.
>
> Yes, exactly. In this case I would need to manually set -pinoffset, but this
> can be a bit frustrating if other GROMACS users are not binding :)
> Would it be possible to fix this in the default algorithm, though I am
> unaware of other issues it might cause?

No, there is no issue on the GROMACS side to fix. This is an issue
that the job scheduler/you as a user need to deal with to avoid the
pitfalls and performance cliffs inherent in node sharing.
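
For example, a job script along these lines (a rough sketch, assuming SLURM
with the GPUs configured as a "gpu" GRES and srun launching the MPI ranks;
adjust task/core counts and names to your site) keeps each job on its own
cores:

  #!/bin/bash
  #SBATCH --ntasks=4
  #SBATCH --cpus-per-task=2
  #SBATCH --gres=gpu:2
  srun --cpu-bind=cores gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12

With binding requested from the scheduler, two such jobs sharing a node
should each get a disjoint set of cores (though, as noted earlier in this
thread, the scheduler typically binds processes to groups of cores rather
than individual threads).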

> Also, -multidir is sometimes not convenient: when a job crashes in the
> middle, an automatic restart from the cpt file would be difficult.

Let me answer that separately to emphasize a few technical issues.

Cheers,
--
Szilárd

> -J
>
>
> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query <gromacsquery at gmail.com>
>> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks again. I tried now with -multidir like this:
>> >
>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
>> >
>> > So this runs 4 jobs on the same node, so for each job np is 16/4, and each
>> > job uses 2 GPUs. I now get quite improved and equal performance for each
>> > job (~220 ns/day), though still slightly less than a single independent
>> > job (where I get 300 ns/day). I can live with that but -
>>
>> That is not normal and it is more likely to be a benchmarking
>> discrepancy: you are likely not comparing apples to apples. Did you
>> run the "fast" single job on an otherwise empty node? That might
>> explain it as, when most of the CPU cores are left empty, modern CPUs
>> increase clocks (turbo boost) on the used cores higher than they could
>> with all cores busy.
>>
>> > Surprised: there are at most 40 cores and 8 GPUs per node, and thus my 4
>> > jobs should consume 8 GPUs.
>>
>> Note that even if those are 40 real cores (rather than 20 cores with
>> HyperThreading), the current GROMACS release is unlikely to run
>> efficiently with fewer than 6-8 cores per GPU. This will likely change
>> with the next release.
>>
>> > So I am a bit surprised that the same node on which my four jobs were
>> > running was already occupied with jobs by some other user, which I think
>> > should not happen (maybe a slurm.conf admin issue?). Either some of my
>> > jobs should have gone into the queue or run on another node if one was
>> > free.
>>
>> Sounds like a job scheduler issue (you can always check the detected
>> hardware in the log) -- and if you post an actual log I can certainly
>> give more informed comments.
>>
>> > What to do: importantly though, as an individual user I can submit a
>> > -multidir job, but let's say, as is normally the case, there are many
>> > other unknown users who submit one or two jobs; in that case performance
>> > will be an issue (which is equivalent to my case when I submit many jobs
>> > without -multi/-multidir).
>>
>> Not sure I follow: if you always have a number of similar runs to do,
>> submit them together and benefit from not having to do manual hardware
>> assignment. Otherwise, if your cluster relies on node sharing, you
>> will have to make sure that you correctly specify the affinity/binding
>> arguments to your job scheduler (or work around it with manual offset
>> calculation). However, note that if you are sharing a node with
>> others and their jobs are not correctly affinitized, those processes
>> will affect the performance of your job.
>>
>> > I think they will still need -pinoffset. Could you
>> > please suggest what is best to do in such a case?
>>
>> See above.
>>
>> Cheers,
>> --
>> Szilárd
>>
>> >
>> > -Jiom
>> >
>> >
>> >
>> >
>> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll <pall.szilard at gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> First off, have you considered option 2), using multi-sim? That would
>> >> save you from having to manually set offsets. Can you not
>> >> submit your jobs such that you fill at least a node?
>> >>
>> >> How many threads/cores does your node have? Can you share log files?
>> >>
>> >> Cheers,
>> >> --
>> >> Szilárd
>> >>
>> >>
>> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <gromacsquery at gmail.com>
>> >> wrote:
>> >> > Hi Szilárd,
>> >> >
>> >> > Sorry, I was a bit quick to say it's working with -pinoffset. I just
>> >> > submitted four identical jobs (2 GPUs, 4 nprocs) on the same node with
>> >> > -pin on and -pinoffset set to 0, 5, 10, 15 (the numbers should be fine
>> >> > as there are 40 cores on the node). Still I don't get the same
>> >> > performance (all variably less than 50%) as expected from a single
>> >> > independent job. Now I am wondering if it's still related to overlap of
>> >> > cores, as -pin on should lock the cores for the same job.
>> >> >
>> >> > -J
>> >> >
>> >> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <gromacsquery at gmail.com> wrote:
>> >> >
>> >> >> Hi Szilárd,
>> >> >>
>> >> >> Thanks, option 3 was on my mind but now I need to figure out how :)
>> >> >> Manually fixing -pinoffset seems to be working for now in some quick
>> >> >> tests. I think option 1 would require asking the admin, but I can try
>> >> >> option 3 myself. As there are other users from different places who may
>> >> >> not bother using option 3, I think I would need to ask the admin to
>> >> >> enforce option 1, but before that I will try option 3.
>> >> >>
>> >> >> JIom
>> >> >>
>> >> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>> >> >>
>> >> >>> J,
>> >> >>>
>> >> >>> You have a few options:
>> >> >>>
>> >> >>> * Use SLURM to assign not only the set of GPUs, but also the correct
>> >> >>> set of CPU cores to each mdrun process. If you do so, mdrun will
>> >> >>> respect the affinity mask it inherits and your two mdrun jobs
>> >> >>> should be running on the right set of cores. This has the drawback
>> >> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>> >> >>> application thread to a core/hardware thread (which is what mdrun
>> >> >>> does), only a process to a group of cores/hw threads, which can
>> >> >>> sometimes lead to performance loss. (You might be able to compensate
>> >> >>> using some OpenMP library environment variables, though.)
>> >> >>>
>> >> >>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
>> >> >>> node or multiple across nodes) and benefit from the rank/thread to
>> >> >>> core/hw thread assignment, which is also supported across the multiple
>> >> >>> simulations that are part of a multi-run; e.g.:
>> >> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>> >> >>> will launch 4 ranks and start 4 simulations, one in each of the four
>> >> >>> directories passed.
>> >> >>>
>> >> >>> * Write a wrapper script around gmx mdrun which will be what you
>> >> >>> launch with SLURM; you can then inspect the node and decide what
>> >> >>> pinoffset value to pass to your mdrun launch command.
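>> >> >>>
>> >> >>> As a very rough, untested sketch of such a wrapper (the thread count
>> >> >>> per job and the process-counting logic are assumptions you would
>> >> >>> adapt to your setup; with multi-rank MPI jobs you would also divide
>> >> >>> by the number of ranks per job):
>> >> >>>
>> >> >>> #!/bin/bash
>> >> >>> # pick a pin offset based on how many mdrun processes already run here
>> >> >>> THREADS_PER_JOB=10
>> >> >>> NJOBS=$(pgrep -c -f "gmx_mpi mdrun")
>> >> >>> OFFSET=$(( NJOBS * THREADS_PER_JOB ))
>> >> >>> exec gmx_mpi mdrun -pin on -pinstride 1 -pinoffset "$OFFSET" "$@"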
>> >> >>>
>> >> >>>
>> >> >>> I hope one of these will deliver the desired results :)
>> >> >>>
>> >> >>> Cheers,
>> >> >>> --
>> >> >>> Szilárd
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <gromacsquery at gmail.com> wrote:
>> >> >>> > Hi Szilárd,
>> >> >>> >
>> >> >>> > Thanks for your reply. This is useful, but now I am thinking: because
>> >> >>> > slurm launches jobs in an automated way, it is not really in my
>> >> >>> > control to choose the node. So the following things can happen; say
>> >> >>> > for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
>> >> >>> >
>> >> >>> > - if they are running on the same node, this is good
>> >> >>> > - if the jobs run on different nodes (partially occupied or free),
>> >> >>> >   the chosen pinoffsets may or may not make sense, as I don't know
>> >> >>> >   what pinoffset I would need to set
>> >> >>> > - if I have to submit many jobs together and slurm itself chooses
>> >> >>> >   different/the same nodes, then I think it is difficult to define
>> >> >>> >   the pinoffset.
>> >> >>> >
>> >> >>> > -
>> >> >>> > J
>> >> >>> >
>> >> >>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>> >> >>> >
>> >> >>> >> My guess is that the two jobs are using the same cores -- either
>> >> >>> >> all cores/threads or only half of them, but the same set.
>> >> >>> >>
>> >> >>> >> You should use -pinoffset; see:
>> >> >>> >>
>> >> >>> >> - Docs and example:
>> >> >>> >> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>> >> >>> >>
>> >> >>> >> - More explanation on the thread-pinning behavior on the old website:
>> >> >>> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
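>> >> >>> >>
>> >> >>> >> For example, on a 40-core node, two concurrent 20-thread jobs could
>> >> >>> >> be pinned to disjoint cores roughly like this (illustrative values
>> >> >>> >> only; keep the rest of your usual mdrun options):
>> >> >>> >>
>> >> >>> >> gmx mdrun -ntomp 20 -pin on -pinoffset 0  ...
>> >> >>> >> gmx mdrun -ntomp 20 -pin on -pinoffset 20 ...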
>> >> >>> >>
>> >> >>> >> Cheers,
>> >> >>> >> --
>> >> >>> >> Szilárd
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <gromacsquery at gmail.com> wrote:
>> >> >>> >> > Sorry, forgot to add: we thought the two jobs were using the same
>> >> >>> >> > GPU ids, but the CUDA visible devices show that the two jobs are
>> >> >>> >> > using different ids (0,1 and 2,3).
>> >> >>> >> >
>> >> >>> >> > -
>> >> >>> >> > J
>> >> >>> >> >
>> >> >>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <gromacsquery at gmail.com> wrote:
>> >> >>> >> >
>> >> >>> >> >> Hi All,
>> >> >>> >> >>
>> >> >>> >> >> I have some issues with GROMACS performance. There are many nodes,
>> >> >>> >> >> each node has a number of GPUs, and the batch process is controlled
>> >> >>> >> >> by slurm. Although I get good performance with some settings of the
>> >> >>> >> >> number of GPUs and nprocs, when I submit the same job twice on the
>> >> >>> >> >> same node the performance is reduced drastically, e.g.
>> >> >>> >> >>
>> >> >>> >> >> For 2 GPUs I get 300 ns per day when there is no other job running
>> >> >>> >> >> on the node. When I submit the same job twice on the same node at
>> >> >>> >> >> the same time, I get only 17 ns/day for both jobs. I am using this:
>> >> >>> >> >>
>> >> >>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>> >> >>> >> >>
>> >> >>> >> >> Any suggestions highly appreciated.
>> >> >>> >> >>
>> >> >>> >> >> Thanks
>> >> >>> >> >>
>> >> >>> >> >> Jiom
>> >> >>> >> >>