[gmx-users] Re: Gromacs-4.6 on two Titans GPUs
Mark Abraham
mark.j.abraham at gmail.com
Thu Nov 7 09:53:56 CET 2013
First, there is no value in ascribing problems to the hardware if the
simulation setup is not yet balanced, or not large enough to provide enough
atoms and a long enough rlist to saturate the GPUs, etc. Look at the log
files and see what complaints mdrun makes about things like PME load
balance, and at the times reported for the different components of the
simulation, because these must differ between the two runs you report.
diff -y -W 160 *log | less is your friend. There is some (non-GPU-specific)
background information in part 5 here:
http://www.gromacs.org/Documentation/Tutorials/GROMACS_USA_Workshop_and_Conference_2013/Topology_preparation%2c_%22What's_in_a_log_file%22%2c_basic_performance_improvements%3a_Mark_Abraham%2c_Session_1A
(though I recommend the PDF version).
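
For example, a minimal shell sketch (the log file names are placeholders for
your own two runs):

    # Compare the two runs' logs side by side, 160 columns wide
    diff -y -W 160 md_1gpu.log md_2gpu.log | less
    # Or pull out just the performance NOTEs mdrun printed in each run
    grep -A3 "NOTE:" md_1gpu.log md_2gpu.log
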
Mark
On Thu, Nov 7, 2013 at 6:34 AM, James Starlight <jmsstarlight at gmail.com> wrote:
> I've come to the conclusion that simulations with 1 or 2 GPUs give me the
> same performance:
>
> mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
>
> mdrun -ntmpi 2 -ntomp 6 -gpu_id 0 -v -deffnm md_CaM_test
>
> Could this be due to too few CPU cores, or is additional RAM needed (this
> system has 32 GB)? Or are some extra options needed in the config?
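
As a sketch only, not something I have benchmarked on your hardware: with the
12 threads your current command uses and two Titans, you could also try
sharing each GPU between two ranks, along the lines Szilard suggests further
down:

    # current: one rank per GPU, 6 OpenMP threads each
    mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
    # alternative: two ranks per GPU, 3 OpenMP threads each
    mdrun -ntmpi 4 -ntomp 3 -gpu_id 0011 -v -deffnm md_CaM_test

and then compare the resulting log files as above.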
>
> James
>
>
>
>
> 2013/11/6 Richard Broadbent <richard.broadbent09 at imperial.ac.uk>
>
> > Hi Dwey,
> >
> >
> > On 05/11/13 22:00, Dwey Kauffman wrote:
> >
> >> Hi Szilard,
> >>
> >> Thanks for your suggestions. I am indeed aware of this page. On an
> >> 8-core AMD with 1 GPU, I am very happy with its performance; see below.
> >> My intention is to obtain an even better one because we have multiple
> >> nodes.
> >>
> >> ### 8-core AMD with 1 GPU
> >> Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
> >> For optimal performance this ratio should be close to 1!
> >>
> >>
> >> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> >>       performance loss, consider using a shorter cut-off and a finer
> >>       PME grid.
> >>
> >>                Core t (s)   Wall t (s)      (%)
> >>        Time:   216205.510    27036.812    799.7
> >>                          7h30:36
> >>                  (ns/day)    (hour/ns)
> >> Performance:       31.956        0.751
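
On the imbalance NOTE above: mdrun 4.6 normally rebalances this itself at run
time by scaling the cut-off and PME grid (the -tunepme behaviour, on by
default), so a quick check, with placeholder file names, is whether the
reported ratio actually improved:

    # The GPU/CPU force-time ratio mdrun reports; closer to 1 is better
    grep "Force evaluation time GPU/CPU" md_1gpu.log md_2gpu.log
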
> >>
> >> ### 8-core AMD with 2 GPUs
> >>
> >>                Core t (s)   Wall t (s)      (%)
> >>        Time:   178961.450    22398.880    799.0
> >>                          6h13:18
> >>                  (ns/day)    (hour/ns)
> >> Performance:       38.573        0.622
> >> Finished mdrun on node 0 Sat Jul 13 09:24:39 2013
> >>
> >>
> > I'm almost certain that Szilard meant the lines above this that give the
> > breakdown of where the time is spent in the simulation.
> >
> > Richard
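
(That breakdown is the cycle and time accounting table near the end of
md.log. As a sketch, with a placeholder file name, something like

    # Print the per-task timing table from the end of the log
    grep -A 25 "C Y C L E" md_CaM_test.log

will pull it out, and that is the part worth pasting when asking about
performance.)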
> >
> >
> >>> However, in your case I suspect that the
> >>> bottleneck is multi-threaded scaling on the AMD CPUs and you should
> >>> probably decrease the number of threads per MPI rank and share GPUs
> >>> between 2-4 ranks.
> >>>
> >>
> >>
> >> OK, but can you give an example of an mdrun command for an 8-core AMD
> >> with 2 GPUs? I will try to run it again.
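
A minimal sketch for that hardware, untested by me, following Szilard's
suggestion of sharing GPUs between ranks: four thread-MPI ranks of two OpenMP
threads each, with each GPU shared by two ranks (the run name is a
placeholder):

    mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011 -v -deffnm md_test

where the -gpu_id string maps the PP ranks on the node to GPU ids in order.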
> >>
> >>
> >>> Regarding scaling across nodes, you can't expect much from gigabit
> >>> ethernet - especially not from the cheaper cards/switches, in my
> >>> experience even reaction field runs don't scale across nodes with 10G
> >>> ethernet if you have more than 4-6 ranks per node trying to
> >>> communicate (let alone with PME). However, on infiniband clusters we
> >>> have seen scaling to 100 atoms/core (at peak).
> >>>
> >>
> >> From your comments, it sounds like a cluster of AMD CPUs is difficult
> >> to scale across nodes in our current setup.
> >>
> >> Let's assume we install Infiniband (20 or 40 Gb/s) in the same system
> >> of 16 nodes of 8-core AMD with 1 GPU only. Considering the same AMD
> >> system, what is a good way to obtain better performance when we run a
> >> task across nodes? In other words, what does mdrun_mpi look like?
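
As a rough sketch only (the exact mpirun options depend on your MPI stack and
hostfile, and the run name is a placeholder): with 16 Infiniband-connected
nodes of 8 cores and 1 GPU each, a natural starting point is one rank per
node using all 8 cores and that node's GPU:

    mpirun -np 16 mdrun_mpi -ntomp 8 -gpu_id 0 -deffnm md_run

then compare against two ranks per node sharing the GPU (-np 32 -ntomp 4
-gpu_id 00), watching the PME load-balance notes in the log each time.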
> >>
> >> Thanks,
> >> Dwey
> >>
> >>
> >>
> >>