[gmx-users] GPU performance

Szilárd Páll szilard.pall at cbr.su.se
Wed Apr 10 11:58:09 CEST 2013


On Wed, Apr 10, 2013 at 3:34 AM, Benjamin Bobay <bgbobay at ncsu.edu> wrote:

> Szilárd -
>
> First, many thanks for the reply.
>
> Second, I am glad that I am not crazy.
>
> Ok so based on your suggestions, I think I know what the problem is/was.
> There was a sander process running on 1 of the CPUs.  Clearly GROMACS was
> trying to use 4 with "Using 4 OpenMP thread". I just did not catch that.
> Sorry! Rookie mistake.
>
> Which I guess leads me to my next question (sorry if it's too naive):
>
> (1) When running GROMACS (or, I guess, any other CUDA-based program), it's
> best to have all the CPUs free, right? I guess based on my results I have
> pretty much answered that question.  Although I thought that as long as I
> have one CPU available to run the GPU it would be good: would setting
> "-ntmpi 1 -ntomp 1" help or would I take a major hit in ns/day as well?
>

This behavior is not specific to GROMACS or CUDA-accelerated codes; it
applies to all compute-intensive codes that expect to be running "alone" on
the set of CPU cores they are started on. As you could see in the output,
mdrun automatically detected that you have 4 CPU cores and, as Mark said, it
tries to use all of them alongside the GPU. As one of the cores was busy, you
ended up in a situation in which four threads of mdrun plus the
(presumably) one thread of sander were competing for four cores. This is
made even worse by the fact that, when using a full machine, mdrun locks its
threads to physical cores to prevent the OS from moving them around (such
migration can cause performance loss).
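
To make the arithmetic of that contention concrete, here is a minimal
sketch. The function name and the idea of a single "load ratio" are purely
illustrative (mdrun computes nothing of the sort), and it assumes every
thread is compute-bound and wants a full core to itself.

```python
def load_ratio(n_cores, threads_per_process):
    """Compute-bound threads per core; a value above 1.0 means oversubscription."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return sum(threads_per_process.values()) / n_cores


# The situation described above: four mdrun threads plus (presumably)
# one sander thread on a four-core machine.
ratio = load_ratio(4, {"mdrun": 4, "sander": 1})
print(f"{ratio:.2f} threads per core")
```

Any value above 1.0 means that at least one core has to time-share between
threads.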

Secondly, using a single core with a GPU will not give very good
performance in GROMACS. The current GROMACS acceleration expects to run on
a couple of CPU cores together with a GPU - which is the typical CPU-GPU
balance of the hardware that most clusters (1 GPU/socket) as well as many
home users (1-2 GPUs for 4-8 CPU cores) would have.


>
> If I try the benchmarks again just to see (for fun) with "Using 4 OpenMP
> thread", under top I have - so I think the CPU is fine :
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 24791 bobayb    20   0 48.3g  51m 7576 R 299.1  0.2  11:32.90 mdrun
>

Nope, that just means, roughly speaking, that sander is probably fully
using one core and the four threads of mdrun are "crammed" onto the
remaining three cores - which is bad.
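
As a rough check on that reading of the top output, the sketch below
converts the %CPU figure into "cores per thread". It assumes top is in its
default mode, where %CPU is summed over all threads of a process and 100
corresponds to one fully used core; the function name is made up for
illustration.

```python
def cores_per_thread(top_cpu_percent, n_threads):
    """Average fraction of a core each thread gets, from top's per-process %CPU."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return (top_cpu_percent / 100.0) / n_threads


# 299.1 %CPU for a process running four OpenMP threads.
share = cores_per_thread(299.1, 4)
print(f"each thread gets about {share:.2f} of a core")
```

Four threads running unhindered on four free cores would show close to 400
%CPU, i.e. about one full core per thread.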

However, you can simply run mdrun using three threads, which will run fine
alongside sander. Whether this will be efficient or not, you'll have to see.
Note that if some other program is using the GPU as well, don't expect full
performance - but the difference will be much smaller than in the case
of oversubscribed CPU cores.
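
A minimal sketch of picking that thread count, assuming you know how many
cores other jobs are keeping busy. The helper name is hypothetical; only
the -ntmpi and -ntomp options come from this thread, and a real command
line would also need your usual input/output options.

```python
import os


def mdrun_args(busy_cores, total_cores=None):
    """Build an mdrun argument list that leaves the busy cores alone."""
    if total_cores is None:
        # Cores visible to the OS; fall back to 1 if this cannot be determined.
        total_cores = os.cpu_count() or 1
    free_cores = total_cores - busy_cores
    if free_cores < 1:
        raise ValueError("no free cores left for mdrun")
    # One thread-MPI rank, with one OpenMP thread per free core.
    return ["mdrun", "-ntmpi", "1", "-ntomp", str(free_cores)]


# Four cores, one of them occupied by sander.
print(" ".join(mdrun_args(busy_cores=1, total_cores=4)))
```

For the machine discussed here this yields "-ntmpi 1 -ntomp 3".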

Cheers,
--
Szilárd


>
> When I have a chance (after this sander run is done - hopefully soon) I can
> try the benchmarks again.
>
> Thanks again for the help!
>
> Ben


