[gmx-users] GPU and MPI
Carsten Kutzner
ckutzne at gwdg.de
Fri Aug 29 17:16:00 CEST 2014
Hi Dawei,
On 29 Aug 2014, at 16:52, Da-Wei Li <lidawei at gmail.com> wrote:
> Dear Carsten
>
> Thanks for the clarification. Here are my benchmarks for a small protein
> system (18k atoms).
>
> (1) 1 node (12 cores/node, no GPU): 50 ns/day
> (2) 2 nodes (12 cores/node, no GPU): 80 ns/day
> (3) 1 node (12 cores/node, 2 K40 GPUs/node): 100 ns/day
> (4) 2 nodes (12 cores/node, 2 K40 GPUs/node): 40 ns/day
>
>
> I sent out this question because benchmark 4 above is very suspicious.
Indeed, if you get 80 ns/day without GPUs, then it should not be less
with GPUs. For how many time steps do you run each of the
benchmarks? Do you use the -resethway command-line switch to mdrun
to disregard the first half of the run (where initialization and load
balancing are done; you don't want to count that in a benchmark)?
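For example, a short benchmark run could look like this (a sketch only;
-nsteps, -resethway, and -noconfout are standard mdrun options, but check
which flags your GROMACS version provides):

mpiexec -npernode 2 -np 4 mdrun_mpi -ntomp 6 -nsteps 5000 -resethway -noconfout

This runs 5000 MD steps and resets the performance counters half-way through,
so only the second, load-balanced half of the run enters the ns/day figure.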
Carsten
> But I agree that the size of my system may play a role.
>
> best,
>
> dawei
>
>
> On Fri, Aug 29, 2014 at 10:36 AM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>
>> Hi Dawei,
>>
>> the mapping of GPUs to PP ranks is printed for the master node only,
>> but if that node reports two GPUs, then the PP ranks on the other nodes
>> will also use two GPUs each (or an error is reported).
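>> If you want to make the mapping explicit, you can pass the GPU ids per
>> node with mdrun's -gpu_id option; a sketch, assuming two PP ranks per
>> node and GPUs 0 and 1 present on each node:
>>
>> mpiexec -npernode 2 -np 4 mdrun_mpi -ntomp 6 -gpu_id 01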
>>
>> The scaling will also depend on your system size; if it is too small,
>> you might be better off using a single node.
>>
>> Carsten
>>
>>
>> On 29 Aug 2014, at 16:24, Da-Wei Li <lidawei at gmail.com> wrote:
>>
>>> Dear users,
>>>
>>> I recently tried to run GROMACS on two nodes, each of which has 12 cores
>>> and 2 GPUs. The nodes are connected with InfiniBand, and scaling is
>>> pretty good when no GPU is involved.
>>>
>>> My command is like this:
>>>
>>> mpiexec -npernode 2 -np 4 mdrun_mpi -ntomp 6
>>>
>>>
>>> However, it looks like GROMACS only detected 2 GPUs on node 0 and then
>>> skipped node 1. Part of the output looks like this:
>>>
>>>
>>> ************************
>>> Using 4 MPI processes
>>> Using 6 OpenMP threads per MPI process
>>>
>>> 2 GPUs detected on host n0316.ten:
>>>   #0: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
>>>   #1: NVIDIA Tesla M2070, compute cap.: 2.0, ECC: yes, stat: compatible
>>>
>>> 2 GPUs user-selected for this run.
>>> Mapping of GPUs to the 2 PP ranks in this node: #0, #1
>>> ****************************
>>>
>>>
>>> The performance is only about 40% of the run where I use only 1 node
>>> (12 cores + 2 GPUs).
>>>
>>>
>>> Did I miss something?
>>>
>>>
>>> thanks.
>>>
>>>
>>> dawei
--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa