[gmx-users] performance

Szilárd Páll pall.szilard at gmail.com
Mon Sep 18 16:12:02 CEST 2017


On Fri, Sep 15, 2017 at 1:06 AM, gromacs query <gromacsquery at gmail.com> wrote:
> Hi Szilárd,
>
> Sorry this discussion is going on for so long.
> Finally I got one node empty and did some serious tests, especially
> considering your first point (discrepancies in benchmarking when comparing
> jobs running on an empty node vs. an occupied node). I tested both ways.
>
> I ran the following cases (a single job vs. two jobs, for 2 GPUs + 4 procs
> and also for 4 GPUs + 16 procs). Happy to send the log files.

Please do share them; it's hard to assess what's going on without them.

> The pinoffset results are surprising (4th and 8th test cases below), though
> for case 8 I get in the log file a "WARNING: Requested offset too large for
> available cores" [this should not be an issue, as the first job binds its cores]

That means the offsets are not set correctly.
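
For reference, a minimal sketch of what consistent offsets could look like for
the 2-GPU case, assuming each job uses 4 ranks x 2 OpenMP threads, that logical
cores are numbered contiguously from 0 with no HyperThreading, and that GPU
selection is already handled by SLURM/CUDA_VISIBLE_DEVICES; the job1/job2 names
are just placeholders. The second job's offset is simply the number of hardware
threads the first job pins, i.e. 4 x 2 = 8:

  # job 1: pins its 4x2 threads to logical cores 0-7
  mpirun -np 4 gmx_mpi mdrun -deffnm job1 -ntomp 2 -pin on -pinstride 1 -pinoffset 0
  # job 2: starts pinning right after job 1, i.e. at logical core 8
  mpirun -np 4 gmx_mpi mdrun -deffnm job2 -ntomp 2 -pin on -pinstride 1 -pinoffset 8

The same arithmetic (offset = ranks x threads of the jobs already pinned) is
what the offsets in cases 4 and 8 of the quoted table below would need to follow.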

> As suggested, defining affinity should help, with pinoffset set 'manually'
> (in practice with a script), but these results are quite variable. I am a bit
> lost now: what should be the best practice when nodes are shared among
> different users? -multidir can be tricky in such a case (if other GROMACS
> users are not using the multidir option!).

I suggest fixing the above issue first. I don't fully understand what the
descriptions below mean; please be more specific about the details or share
the logs.
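
Regarding nodes shared with other users: one way to make the manual-offset
route less fragile is the wrapper-script option from the earlier discussion
quoted below. A rough, untested sketch is given here; it assumes that every
GROMACS job on the node is started through this same wrapper, uses the same
-ntomp, and is pinned contiguously from offset 0, and the counting heuristic
and the script/variable names are only illustrative:

  #!/bin/bash
  # Hypothetical mdrun wrapper: derive -pinoffset from the mdrun ranks
  # already running on this node. Usage: mdrun_wrapper.sh <deffnm>
  NRANKS=4     # MPI ranks for this job
  NTOMP=2      # OpenMP threads per rank

  # Each gmx_mpi rank already running has pinned NTOMP hardware threads,
  # so the next free hardware-thread index is (running ranks) * NTOMP.
  RUNNING_RANKS=$(pgrep -c -x gmx_mpi)
  OFFSET=$(( RUNNING_RANKS * NTOMP ))

  exec mpirun -np "$NRANKS" gmx_mpi mdrun -deffnm "$1" \
       -ntomp "$NTOMP" -pin on -pinstride 1 -pinoffset "$OFFSET"

Of course, this breaks down as soon as someone else starts unpinned jobs on
the same node, which is exactly the situation you describe.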

>
> Sr. no.  Each job: 2 GPUs, 4 procs                                 Performance (ns/day)
> 1        only one job                                              345
> 2        two same jobs together (without pin on)                   16.1 and 15.9
> 3        two same jobs together (without pin on, with -multidir)   178 and 191
> 4        two same jobs together (pin on, pinoffset at 0 and 5)     160 and 301
>
> Sr. no.  Each job: 4 GPUs, 16 procs                                Performance (ns/day)
> 5        only one job                                              694
> 6        two same jobs together (without pin on)                   340 and 350
> 7        two same jobs together (without pin on, with -multidir)   346 and 344
> 8        two same jobs together (pin on, pinoffset at 0 and 17)    204 and 546
>
>
> On Thu, Sep 14, 2017 at 12:02 PM, gromacs query <gromacsquery at gmail.com>
> wrote:
>
>> Hi Szilárd,
>>
>> Here are my replies:
>>
>> >> Did you run the "fast" single job on an otherwise empty node? That
>> might explain it: when most of the CPU cores are left empty, modern CPUs
>> raise the clocks (turbo boost) on the used cores higher than they can with
>> all cores busy.
>>
>> Yes, the "fast" single job was on an empty node. Sorry, I don't quite get
>> what you mean by 'modern CPUs increase clocks'; do you mean the ns/day I get
>> in that case is not representative?
>>
>> >> and if you post an actual log I can certainly give more informed
>> comments
>>
>> Sure, if it's OK, can I post it to you off-list?
>>
>> >> However, note that if you are sharing a node with others and their jobs
>> are not correctly affinitized, those processes will affect the performance
>> of your job.
>>
>> Yes, exactly. In this case I would need to set pinoffset manually, but this
>> can be a bit frustrating if other GROMACS users are not binding :)
>> Would it be possible to fix this in the default algorithm, though I am
>> unaware of other issues it might cause? Also, multidir is sometimes not
>> convenient: when a job crashes in the middle, an automatic restart from the
>> cpt file would be difficult.
>>
>> -J
>>
>>
>> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll <pall.szilard at gmail.com>
>> wrote:
>>
>>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query <gromacsquery at gmail.com>
>>> wrote:
>>> > Hi Szilárd,
>>> >
>>> > Thanks again. I have now tried -multidir, like this:
>>> >
>>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
>>> >
>>> > So this runs 4 jobs on the same node, so each job gets np = 16/4 = 4
>>> > ranks, and each job uses 2 GPUs. I now get much improved and equal
>>> > performance for each job (~220 ns/day), though still less than a single
>>> > independent job (where I get 300 ns/day). I can live with that but -
>>>
>>> That is not normal; it is more likely a benchmarking discrepancy:
>>> you are likely not comparing apples to apples. Did you run the "fast"
>>> single job on an otherwise empty node? That might explain it: when most
>>> of the CPU cores are left empty, modern CPUs raise the clocks (turbo
>>> boost) on the used cores higher than they can with all cores busy.
>>>
>>> > Surprised: there are at most 40 cores and 8 GPUs per node, and thus my 4
>>> > jobs should consume all 8 GPUs.
>>>
>>> Note that even if those are 40 real cores (rather than 20 cores with
>>> HyperThreading), the current GROMACS release is unlikely to run
>>> efficiently unless it has at least 6-8 cores per GPU. This will likely
>>> change with the next release.
>>>
>>> > So I am a bit surprised that the node on which my four jobs were running
>>> > was already occupied with jobs by some other user, which I think should
>>> > not happen (maybe a SLURM config/admin issue?). Either some of my jobs
>>> > should have gone into the queue or run on another node if one was free.
>>>
>>> Sounds like a job scheduler issue (you can always check the detected
>>> hardware in the log) -- and if you post an actual log I can certainly
>>> give more informed comments.
>>>
>>> > What to do: importantly, as an individual user I can submit a -multidir
>>> > job, but let's say (which is normally the case) there are many other
>>> > unknown users who submit one or two jobs; in that case performance will
>>> > be an issue (which is equivalent to my case when I submit many jobs
>>> > without -multi/-multidir).
>>>
>>> Not sure I follow: if you always have a number of similar runs to do,
>>> submit them together and benefit from not having to do manual hardware
>>> assignment. Otherwise, if your cluster relies on node sharing, you will
>>> have to make sure that you specify the affinity/binding arguments to your
>>> job scheduler correctly (or work around it with manual offset
>>> calculation). However, note that if you are sharing a node with others
>>> and their jobs are not correctly affinitized, those processes will affect
>>> the performance of your job.
>>>
>>> > I think they will still need -pinoffset. Could you please suggest
>>> > what best can be done in such a case?
>>>
>>> See above.
>>>
>>> Cheers,
>>> --
>>> Szilárd
>>>
>>> >
>>> > -Jiom
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll <pall.szilard at gmail.com>
>>> > wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> First off, have you considered option 2), using multi-sim? That would
>>> >> save you from having to set offsets manually. Can you not
>>> >> submit your jobs such that you fill at least a node?
>>> >>
>>> >> How many threads/cores does your node have? Can you share log files?
>>> >>
>>> >> Cheers,
>>> >> --
>>> >> Szilárd
>>> >>
>>> >>
>>> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <gromacsquery at gmail.com> wrote:
>>> >> > Hi Szilárd,
>>> >> >
>>> >> > Sorry, I was a bit quick to say it's working with pinoffset. I just
>>> >> > submitted four identical jobs (2 GPUs, 4 procs) on the same node with
>>> >> > -pin on and -pinoffset set to 0, 5, 10, 15 (the numbers should be fine,
>>> >> > as there are 40 cores on the node). Still, I don't get the same
>>> >> > performance as expected from a single independent job (all variably
>>> >> > less than 50% of it). Now I am wondering if it is still related to an
>>> >> > overlap of cores, as -pin on should lock the cores for the same job.
>>> >> >
>>> >> > -J
>>> >> >
>>> >> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <gromacsquery at gmail.com> wrote:
>>> >> >
>>> >> >> Hi Szilárd,
>>> >> >>
>>> >> >> Thanks, option 3 was on my mind, but now I need to figure out how :)
>>> >> >> Manually fixing pinoffset seems to work for now in some quick tests.
>>> >> >> I think option 1 would require asking the admin, but I can try option
>>> >> >> 3 myself. As there are other users from different places who may not
>>> >> >> bother using option 3, I think I would need to ask the admin to force
>>> >> >> option 1, but before that I will try option 3.
>>> >> >>
>>> >> >> JIom
>>> >> >>
>>> >> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>>> >> >>
>>> >> >>> J,
>>> >> >>>
>>> >> >>> You have a few options:
>>> >> >>>
>>> >> >>> * Use SLURM to assign not only the set of GPUs, but also the
>>> >> >>> correct set of CPU cores to each mdrun process. If you do so, mdrun
>>> >> >>> will respect the affinity mask it inherits and your two mdrun jobs
>>> >> >>> should be running on the right sets of cores. This has the drawback
>>> >> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>>> >> >>> application thread to a core/hardware thread (which is what mdrun
>>> >> >>> does), only a process to a group of cores/hw threads, which can
>>> >> >>> sometimes lead to performance loss. (You might be able to
>>> >> >>> compensate using some OpenMP library environment variables, though.)
>>> >> >>>
>>> >> >>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
>>> >> >>> node or multiple across nodes) and benefit from the rank/thread to
>>> >> >>> core/hw thread assignment that is also supported across the multiple
>>> >> >>> simulations that are part of a multi-run; e.g.:
>>> >> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>>> >> >>> will launch 4 ranks and start 4 simulations, one in each of the four
>>> >> >>> directories passed.
>>> >> >>>
>>> >> >>> * Write a wrapper script around gmx mdrun which will be what you
>>> >> >>> launch with SLURM; you can then inspect the node and decide what
>>> >> >>> pinoffset value to pass to your mdrun launch command.
>>> >> >>>
>>> >> >>>
>>> >> >>> I hope one of these will deliver the desired results :)
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>> --
>>> >> >>> Szilárd
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <gromacsquery at gmail.com> wrote:
>>> >> >>> > Hi Szilárd,
>>> >> >>> >
>>> >> >>> > Thanks for your reply. This is useful, but now I am thinking that
>>> >> >>> > because SLURM launches jobs in an automated way, it is not really
>>> >> >>> > in my control which node is chosen. So the following things can
>>> >> >>> > happen; say for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
>>> >> >>> >
>>> >> >>> > - if they are running on the same node, this is good
>>> >> >>> > - if the jobs run on different nodes (partially occupied or free),
>>> >> >>> >   the chosen pinoffsets may or may not make sense, as I don't know
>>> >> >>> >   what pinoffset I would need to set
>>> >> >>> > - if I have to submit many jobs together and SLURM itself chooses
>>> >> >>> >   different/same nodes, then I think it is difficult to define the
>>> >> >>> >   pinoffset.
>>> >> >>> >
>>> >> >>> > -
>>> >> >>> > J
>>> >> >>> >
>>> >> >>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>>> >> >>> >
>>> >> >>> >> My guess is that the two jobs are using the same cores --
>>> either all
>>> >> >>> >> cores/threads or only half of them, but the same set.
>>> >> >>> >>
>>> >> >>> >> You should use -pinoffset; see:
>>> >> >>> >>
>>> >> >>> >> - Docs and example:
>>> >> >>> >> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>>> >> >>> >>
>>> >> >>> >> - More explanation on the thread pinning behavior on the old website:
>>> >> >>> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
>>> >> >>> >>
>>> >> >>> >> Cheers,
>>> >> >>> >> --
>>> >> >>> >> Szilárd
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <gromacsquery at gmail.com> wrote:
>>> >> >>> >> > Sorry, I forgot to add: we thought the two jobs were using the
>>> >> >>> >> > same GPU ids, but CUDA_VISIBLE_DEVICES shows the two jobs are
>>> >> >>> >> > using different ids (0,1 and 2,3)
>>> >> >>> >> >
>>> >> >>> >> > -
>>> >> >>> >> > J
>>> >> >>> >> >
>>> >> >>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <gromacsquery at gmail.com> wrote:
>>> >> >>> >> >
>>> >> >>> >> >> Hi All,
>>> >> >>> >> >>
>>> >> >>> >> >> I have some issues with GROMACS performance. There are many
>>> >> >>> >> >> nodes, each node has a number of GPUs, and the batch system is
>>> >> >>> >> >> controlled by SLURM. I get good performance with some settings
>>> >> >>> >> >> of the number of GPUs and nprocs, but when I submit the same
>>> >> >>> >> >> job twice on the same node the performance is reduced
>>> >> >>> >> >> drastically, e.g.:
>>> >> >>> >> >>
>>> >> >>> >> >> With 2 GPUs I get 300 ns per day when there is no other job
>>> >> >>> >> >> running on the node. When I submit the same job twice on the
>>> >> >>> >> >> same node at the same time, I get only 17 ns/day for both jobs.
>>> >> >>> >> >> I am using this:
>>> >> >>> >> >>
>>> >> >>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>>> >> >>> >> >>
>>> >> >>> >> >> Any suggestions highly appreciated.
>>> >> >>> >> >>
>>> >> >>> >> >> Thanks
>>> >> >>> >> >>
>>> >> >>> >> >> Jiom
>>> >> >>> >> >>