[gmx-users] performance

gromacs query gromacsquery at gmail.com
Fri Sep 15 01:06:30 CEST 2017


Hi Szilárd,

Sorry this discussion is getting long.
I finally got an empty node and ran some serious tests, especially
considering your first point (discrepancies when benchmarking jobs running
on an empty node vs. an occupied node). I tested both ways.

I ran the following cases (a single job vs. two jobs, for both 2 GPUs + 4
procs and 4 GPUs + 16 procs). Happy to send log files.

The pinoffset results are surprising (test cases 4 and 8 below), though for
case 8 the log file shows "WARNING: Requested offset too large for available
cores" [this should not be an issue, as the first job binds its cores].

As suggested, defining the affinity via -pinoffset "manually" (in practice
with a script) should help, but these results are quite variable. I am a bit
lost now as to the best practice when nodes are shared among different
users; -multidir can be tricky in such a case (if the other GROMACS users
are not using the -multidir option!).


Sr. no.  Each job: 2 GPUs, 4 procs                            Performance (ns/day)
1        only one job                                          345
2        two identical jobs together (without -pin on)         16.1 and 15.9
3        two identical jobs together (no -pin on, -multidir)   178 and 191
4        two identical jobs together (-pin on, -pinoffset 0/5) 160 and 301

Sr. no.  Each job: 4 GPUs, 16 procs                             Performance (ns/day)
5        only one job                                            694
6        two identical jobs together (without -pin on)           340 and 350
7        two identical jobs together (no -pin on, -multidir)     346 and 344
8        two identical jobs together (-pin on, -pinoffset 0/17)  204 and 546
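Since setting -pinoffset "manually" in practice means computing it in a
submit wrapper, the arithmetic can be sketched as below. This is only a
sketch under my own assumptions: the helper name is illustrative, and it
assumes each job pins a contiguous block of nranks x ntomp hardware threads,
which need not match how a given site lays out its jobs.

```shell
#!/bin/bash
# Hypothetical wrapper helper: derive a -pinoffset from a 0-based per-node
# job index, assuming each job occupies job_index * threads_per_job
# consecutive hardware threads before it.
compute_pinoffset() {
    local job_index=$1        # 0-based index of this job on the node
    local threads_per_job=$2  # nranks * ntomp used by each job
    echo $(( job_index * threads_per_job ))
}

# Example: the second of several 4-rank x 2-thread jobs on one node
# (4 * 2 = 8 threads per job), so the second job starts at offset 8:
# offset=$(compute_pinoffset 1 8)
# mpirun -np 4 gmx_mpi mdrun -deffnm job2 -ntomp 2 -pin on -pinoffset "$offset"
compute_pinoffset 1 8   # prints 8
```

A real wrapper would also have to discover the job index, e.g. by counting
already-running mdrun processes on the node, which is exactly the fragile
part when other users on the node do not pin their jobs.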


On Thu, Sep 14, 2017 at 12:02 PM, gromacs query <gromacsquery at gmail.com>
wrote:

> Hi Szilárd,
>
> Here are my replies:
>
> >> Did you run the "fast" single job on an otherwise empty node? That
> might explain it as, when most of the CPU cores are left empty, modern CPUs
> increase clocks (turbo boost) on the used cores higher than they could with
> all cores busy.
>
> Yes, the "fast" single job was on an empty node. Sorry, I don't follow when
> you say 'modern CPUs increase clocks'; do you mean the ns/day I get in that
> case is misleading?
>
> >> and if you post an actual log I can certainly give more informed
> comments
>
> Sure; if it's OK, can I send it to you off-list?
>
> >> However, note that if you are sharing a node with others, if their jobs
> are not correctly affinitized, those processes will affect the performance
> of your job.
>
> Yes, exactly. In this case I would need to set -pinoffset manually, but
> this can be a bit frustrating if other GROMACS users are not binding :)
> Would it be possible to fix this in the default algorithm, though I am
> unaware of other issues it might cause? Also, -multidir is sometimes not
> convenient: when a job crashes in the middle, an automatic restart from the
> cpt file would be difficult.
>
> -J
>
>
> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query <gromacsquery at gmail.com>
>> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks again. I tried now with -multidir like this:
>> >
>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
>> >
>> > So this runs 4 jobs on the same node, so for each job np = 16/4, and
>> > each job uses 2 GPUs. I now get much improved, and equal, performance
>> > for each job (~220 ns/day), though still slightly less than a single
>> > independent job (where I get 300 ns/day). I can live with that, but -
>>
>> That is not normal, and it is more likely a benchmarking discrepancy:
>> you are likely not comparing apples to apples. Did you run the "fast"
>> single job on an otherwise empty node? That might explain it as, when
>> most of the CPU cores are left empty, modern CPUs increase clocks (turbo
>> boost) on the used cores higher than they could with all cores busy.
>>
>> > Surprised: there are at most 40 cores and 8 GPUs per node, and thus my
>> > 4 jobs should consume all 8 GPUs.
>>
>> Note that even if those are 40 real cores (rather than 20 cores with
>> HyperThreading), the current GROMACS release is unlikely to run
>> efficiently with fewer than 6-8 cores per GPU. This will likely change
>> with the next release.
>>
>> > So I am a bit surprised by the fact that the same node on which my four
>> > jobs were running was already occupied with jobs by some other user,
>> > which I think should not happen (maybe a slurm.config admin issue?).
>> > Either some of my jobs should have been queued, or they should have run
>> > on another node if one was free.
>>
>> Sounds like a job scheduler issue (you can always check in the log the
>> detected hardware) -- and if you post an actual log I can certainly
>> give more informed comments.
>>
>> > What to do: importantly, though, as an individual user I can submit a
>> > -multidir job, but let's say (as is normally the case) there are many
>> > other unknown users who submit one or two jobs each; in that case
>> > performance will be an issue (which is equivalent to my case when I
>> > submit many jobs without -multi/-multidir).
>>
>> Not sure I follow: if you always have a number of similar runs to do,
>> submit them together and benefit from not having to do manual hardware
>> assignment. Otherwise, if your cluster relies on node sharing, you will
>> have to make sure that you correctly specify the affinity/binding
>> arguments to your job scheduler (or work around it with manual offset
>> calculation). However, note that if you are sharing a node with others
>> and their jobs are not correctly affinitized, those processes will
>> affect the performance of your job.
>>
>> > I think they will still need -pinoffset. Could you please suggest what
>> > best can be done in such a case?
>>
>> See above.
>>
>> Cheers,
>> --
>> Szilárd
>>
>> >
>> > -Jiom
>> >
>> >
>> >
>> >
>> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll <pall.szilard at gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> First off, have you considered option 2), using multi-sim? That would
>> >> save you from having to set offsets manually. Can you not submit your
>> >> jobs such that you fill at least a node?
>> >>
>> >> How many threads/cores does your node have? Can you share log files?
>> >>
>> >> Cheers,
>> >> --
>> >> Szilárd
>> >>
>> >>
>> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <gromacsquery at gmail.com>
>> >> wrote:
>> >> > Hi Szilárd,
>> >> >
>> >> > Sorry, I was a bit quick to say it's working with -pinoffset. I just
>> >> > submitted four identical jobs (2 GPUs, 4 procs) on the same node with
>> >> > -pin on and -pinoffset set to 0, 5, 10, 15 (the numbers should be
>> >> > fine, as there are 40 cores on the node). I still don't get the same
>> >> > performance as a single independent job (all jobs variably below 50%
>> >> > of it). Now I am wondering whether it is still related to an overlap
>> >> > of cores, as -pin on should lock the cores for each job.
>> >> >
>> >> > -J
>> >> >
>> >> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <gromacsquery at gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Szilárd,
>> >> >>
>> >> >> Thanks, option 3 was on my mind, but now I need to figure out how :)
>> >> >> Manually fixing -pinoffset seems to be working, based on some quick
>> >> >> tests. I think option 1 would require asking the admin, but I can
>> >> >> try option 3 myself. As there are other users from different places
>> >> >> who may not bother with option 3, I think I would need to ask the
>> >> >> admin to enforce option 1, but before that I will try option 3.
>> >> >>
>> >> >> JIom
>> >> >>
>> >> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <pall.szilard at gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> J,
>> >> >>>
>> >> >>> You have a few options:
>> >> >>>
>> >> >>> * Use SLURM to assign not only the set of GPUs, but also the
>> correct
>> >> >>> set of CPU cores to each mdrun process. If you do so, mdrun will
>> >> >>> respect the affinity mask it will inherit and your two mdrun jobs
>> >> >>> should be running on the right set of cores. This has the drawback
>> >> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>> >> >>> application thread to a core/hardware thread (which is what mdrun
>> >> >>> does), only a process to a group of cores/hw threads, which can
>> >> >>> sometimes lead to performance loss. (You might be able to
>> >> >>> compensate using some OpenMP library environment variables, though.)
>> >> >>>
>> >> >>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
>> >> >>> node or multiple across nodes) and benefit from the rank/thread to
>> >> >>> core/hw thread assignment that is also supported across the
>> >> >>> multiple simulations of a multi-run; e.g.:
>> >> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>> >> >>> will launch 4 ranks and start one simulation in each of the four
>> >> >>> directories passed.
>> >> >>>
>> >> >>> * Write a wrapper script around gmx mdrun which will be what you
>> >> >>> launch with SLURM; you can then inspect the node and decide what
>> >> >>> pinoffset value to pass to your mdrun launch command.
>> >> >>>
>> >> >>>
>> >> >>> I hope one of these will deliver the desired results :)
>> >> >>>
>> >> >>> Cheers,
>> >> >>> --
>> >> >>> Szilárd
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <gromacsquery at gmail.com>
>> >> >>> wrote:
>> >> >>> > Hi Szilárd,
>> >> >>> >
>> >> >>> > Thanks for your reply. This is useful, but now I am thinking
>> >> >>> > that, because slurm launches jobs in an automated way, choosing
>> >> >>> > the node is not really in my control. So the following things can
>> >> >>> > happen; say for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
>> >> >>> >
>> >> >>> > - if they run on the same node, this is good
>> >> >>> > - if the jobs run on different nodes (partially occupied or free),
>> >> >>> > the chosen pinoffsets may not make sense, as I don't know what
>> >> >>> > pinoffset I would need to set
>> >> >>> > - if I have to submit many jobs together and slurm itself chooses
>> >> >>> > different/same nodes, then I think it is difficult to define the
>> >> >>> > pinoffset.
>> >> >>> >
>> >> >>> > -
>> >> >>> > J
>> >> >>> >
>> >> >>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <pall.szilard at gmail.com>
>> >> >>> > wrote:
>> >> >>> >
>> >> >>> >> My guess is that the two jobs are using the same cores --
>> either all
>> >> >>> >> cores/threads or only half of them, but the same set.
>> >> >>> >>
>> >> >>> >> You should use -pinoffset; see:
>> >> >>> >>
>> >> >>> >> - Docs and example:
>> >> >>> >> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>> >> >>> >>
>> >> >>> >> - More explanation of the thread-pinning behavior on the old website:
>> >> >>> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
>> >> >>> >>
>> >> >>> >> Cheers,
>> >> >>> >> --
>> >> >>> >> Szilárd
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <gromacsquery at gmail.com>
>> >> >>> >> wrote:
>> >> >>> >> > Sorry, forgot to add: we thought the two jobs were using the
>> >> >>> >> > same GPU ids, but CUDA_VISIBLE_DEVICES shows the two jobs are
>> >> >>> >> > using different ids (0,1 and 2,3).
>> >> >>> >> >
>> >> >>> >> > -
>> >> >>> >> > J
>> >> >>> >> >
>> >> >>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <gromacsquery at gmail.com>
>> >> >>> >> > wrote:
>> >> >>> >> >
>> >> >>> >> >> Hi All,
>> >> >>> >> >>
>> >> >>> >> >> I have some issues with GROMACS performance. There are many
>> >> >>> >> >> nodes, each with a number of GPUs, and the batch system is
>> >> >>> >> >> controlled by slurm. Although I get good performance with some
>> >> >>> >> >> settings for the number of GPUs and procs, when I submit the
>> >> >>> >> >> same job twice on the same node the performance drops
>> >> >>> >> >> drastically, e.g.:
>> >> >>> >> >>
>> >> >>> >> >> For 2 GPUs I get 300 ns/day when there is no other job running
>> >> >>> >> >> on the node. When I submit the same job twice on the same node
>> >> >>> >> >> at the same time, I get only 17 ns/day for each of the jobs. I
>> >> >>> >> >> am using this:
>> >> >>> >> >>
>> >> >>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>> >> >>> >> >>
>> >> >>> >> >> Any suggestions highly appreciated.
>> >> >>> >> >>
>> >> >>> >> >> Thanks
>> >> >>> >> >>
>> >> >>> >> >> Jiom
>> >> >>> >> >>
>> >> >>> >> > --
>> >> >>> >> > Gromacs Users mailing list
>> >> >>> >> >
>> >> >>> >> > * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>> >> >>> >> >
>> >> >>> >> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> >> >>> >> >
>> >> >>> >> > * For (un)subscribe requests visit
>> >> >>> >> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.
>> >> >>
>> >> >>


More information about the gromacs.org_gmx-users mailing list