mdrun on 8-core AMD + GTX TITAN (was: Re: [gmx-users] Re: Gromacs-4.6 on two Titans GPUs)
Szilárd Páll
pall.szilard at gmail.com
Thu Nov 7 22:12:25 CET 2013
Let's not hijack James' thread as your hardware is different from his.
On Tue, Nov 5, 2013 at 11:00 PM, Dwey Kauffman <mpi566 at gmail.com> wrote:
> Hi Szilard,
>
> Thanks for your suggestions. I am indeed aware of this page. On an 8-core
> AMD with 1 GPU, I am very happy with its performance. See below. My
Actually, I was jumping to conclusions too early: as you mentioned an AMD
"cluster", I assumed you must have 12-16-core Opteron CPUs. If you
have an 8-core (desktop?) AMD CPU, then you may not need to run more
than one rank per GPU.
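In that case (just a sketch, assuming a single node and the GPU reported as
id 0), the plain single-rank launch
    mdrun -ntomp 8 -gpu_id 0
i.e. one rank with 8 OpenMP threads driving the one GPU, is usually the
right starting point.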
> intention is to obtain an even better one because we have multiple nodes.
Btw, I'm not sure it's an economically viable solution to install an
Infiniband network - especially if you have desktop-class machines.
Such a network will end up costing >$500 per machine just for a single
network card, let alone cabling and switches.
>
> ### 8-core AMD with 1 GPU
> Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
> For optimal performance this ratio should be close to 1!
>
>
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> performance loss, consider using a shorter cut-off and a finer PME
> grid.
>
>                Core t (s)   Wall t (s)     (%)
>        Time:   216205.510    27036.812    799.7
>                          7h30:36
>                  (ns/day)    (hour/ns)
> Performance:       31.956        0.751
>
> ### 8-core AMD with 2 GPUs
>
>                Core t (s)   Wall t (s)     (%)
>        Time:   178961.450    22398.880    799.0
>                          6h13:18
>                  (ns/day)    (hour/ns)
> Performance:       38.573        0.622
> Finished mdrun on node 0 Sat Jul 13 09:24:39 2013
>
Indeed, as Richard pointed out, I was asking for *full* logs; these
summaries can't tell much. The table above that summary, entitled "R E A L
C Y C L E   A N D   T I M E   A C C O U N T I N G", together with the
other information reported across the log file, is what I need to make
an assessment of your simulations' performance.
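(Side note: assuming the log file is called md.log - a made-up name here -
you can pull that table out quickly with something like
    grep -B 2 -A 25 "R E A L   C Y C L E" md.log
though sending the complete log file is still the most useful.)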
>>However, in your case I suspect that the
>>bottleneck is multi-threaded scaling on the AMD CPUs and you should
>>probably decrease the number of threads per MPI rank and share GPUs
>>between 2-4 ranks.
>
>
> OK, but can you give an example of an mdrun command for an 8-core AMD with
> 2 GPUs?
> I will try to run it again.
You could try running
mpirun -np 4 mdrun -ntomp 2 -gpu_id 0011
but I suspect this won't help, because your scaling issue likely lies elsewhere.
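(If that mdrun binary is not MPI-enabled, the single-node thread-MPI
equivalent would be roughly
    mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011
i.e. 4 thread-MPI ranks with 2 OpenMP threads each, ranks 0 and 1 sharing
GPU 0 and ranks 2 and 3 sharing GPU 1.)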
>
>
>>Regarding scaling across nodes, you can't expect much from gigabit
>>ethernet - especially not from the cheaper cards/switches, in my
>>experience even reaction field runs don't scale across nodes with 10G
>>ethernet if you have more than 4-6 ranks per node trying to
>>communicate (let alone with PME). However, on infiniband clusters we
>>have seen scaling to 100 atoms/core (at peak).
>
> From your comments, it sounds like a cluster of AMD CPUs is difficult to
> scale across nodes in our current setup.
>
> Let's assume we install Infiniband (20 or 40 Gb/s) in the same system of 16
> nodes, each with an 8-core AMD CPU and only 1 GPU. Considering the same AMD
> system, what is a good way to obtain better performance when we run a task
> across nodes? In other words, what does the mdrun_mpi command look like?
>
> Thanks,
> Dwey
>
>
>
>