[gmx-users] GTX 960 vs Tesla K40
Alex
nedomacho at gmail.com
Sun Jun 24 01:23:56 CEST 2018
Hi Szilárd,
Thanks for the suggestion on removing that separate PME rank: 113 ns/day
instead of 90 ns/day. ;) This is running on pretty much a piece of
garbage, and that compares to 320 ns/day on a much more powerful box
with four GPUs.
I am fine with the general concept of ranks as units of execution; what
I am not comfortable with is how to select, e.g., the number of threads
per rank depending on the system size, or whether to use a separate PME
rank at all. Say I build a system that is 3-4 times larger in XY: do I
keep all the mdrun scripts as they are, or do I go through the tuning
again for every new system?
Some sort of guideline with examples would be nice, or some automation
on the mdrun side, or maybe a web form that asks about system size and
the number of GPUs/CPU cores and spits out a starting point for an
"optimal" set of mdrun options. I am mostly learning this by varying
things (e.g. offloading or not offloading the PME FFTs with -pmefft, or
trying your suggestions), but a deep understanding is still lacking.
Alex
On 6/21/2018 10:02 AM, Szilárd Páll wrote:
> On Mon, Jun 18, 2018 at 11:35 PM Alex <nedomacho at gmail.com> wrote:
>
>> Persistence is enabled so I don't have to overclock again.
>
> Sure, makes sense. Note that strictly speaking this is not an "overclock",
> but a manual "boost clock" (to use the terminology CPU vendors use). Consumer
> GPUs automatically scale their clock speeds above their nominal/base clock
> (just as CPUs do), but Tesla GPUs don't; instead they leave that choice to
> the user (or put the burden on the user, if we want to look at it differently).
>
>
>> To be honest, I
>> am still not entirely comfortable with the notion of ranks, after reading
>> the acceleration document a bunch of times.
>
> Feel free to ask if you need clarification.
> Briefly: ranks are the execution units, typically MPI processes, that tasks
> get assigned to when decomposing work across multiple compute units (nodes,
> processors). In general, data or tasks can be decomposed (also called
> data-/task-parallelization), and GROMACS does employ both, the former for
> the spatial domain decomposition, the latter for offloading PME work to a
> subset of the ranks.
>
>
>> Parts of log file below and I
>> will obviously appreciate suggestions/clarifications:
>>
> In the future, please share the full log by uploading it somewhere.
>
>
>> Command line:
>> gmx mdrun -nt 4 -ntmpi 2 -npme 1 -pme gpu -nb gpu -s run_unstretch.tpr -o
>> traj_unstretch.trr -g md.log -c unstretched.gro
>>
> As noted before, I doubt that you benefit from using a separate PME rank
> with a single GPU.
>
> I suggest that instead you simply run:
> gmx mdrun -ntmpi 1 -pme gpu -nb gpu
> optionally, you can pass -ntomp 4, but that's the default so it's not
> needed.
>
>
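Following up on the rank question, here is how I now read the two layouts
discussed above; this is my paraphrase using standard mdrun options, so
please correct me if I have it wrong:

# what I was running: 2 thread-MPI ranks with 2 OpenMP threads each
# (4 threads total from -nt 4), one of the ranks dedicated to PME
gmx mdrun -nt 4 -ntmpi 2 -npme 1 -pme gpu -nb gpu
# the suggestion: a single thread-MPI rank with 4 OpenMP threads that
# offloads both the nonbonded and the PME work to the one GPU
gmx mdrun -ntmpi 1 -ntomp 4 -pme gpu -nb gpu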