[gmx-users] GROMACS performance issues on POWER9/V100 node
Jonathan D. Halverson
halverson at Princeton.EDU
Mon Apr 27 16:26:25 CEST 2020
Hi Szilárd,
Our OS is RHEL 7.6.
Thank you for your test results. It's nice to see consistent results on a POWER9 system.
Your suggestion of allocating the whole node was a good one. I did this in two ways: first by bypassing the Slurm scheduler entirely (ssh-ing to an empty node and running the benchmark there), and second through Slurm with the --exclusive directive (which allocates the entire node independent of the job size). In both cases, using 32 hardware threads and one V100 GPU for ADH (PME, cubic, 40k steps), performance was about 132 ns/day, significantly better than the 90 ns/day reported earlier (without --exclusive). Links to the md.log files are below. Here is the Slurm script with --exclusive:
--------------------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH --job-name=gmx # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=32 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=8G # memory per node (4G per cpu-core is default)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1 # number of gpus per node
#SBATCH --exclusive # TASK AFFINITIES SET CORRECTLY BUT ENTIRE NODE ALLOCATED TO JOB
module purge
module load cudatoolkit/10.2
BCH=../adh_cubic
gmx grompp -f $BCH/pme_verlet.mdp -c $BCH/conf.gro -p $BCH/topol.top -o bench.tpr
srun gmx mdrun -nsteps 40000 -pin on -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -s bench.tpr
--------------------------------------------------------------------------------------------------
Here are the log files:
md.log with --exclusive:
https://github.com/jdh4/running_gromacs/blob/master/03_benchmarks/md.log.with-exclusive
md.log without --exclusive:
https://github.com/jdh4/running_gromacs/blob/master/03_benchmarks/md.log.without-exclusive
Szilárd, what is your reading of these two files?
This is a shared cluster, so I can't use --exclusive for all jobs. Our nodes have four GPUs and 128 hardware threads (SMT4, so 32 cores across two sockets). Any thoughts on how to make a job behave as if it were run with --exclusive? The task affinities are apparently not being set properly otherwise.
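For reference, here is a minimal diagnostic sketch (assuming numactl and the standard /proc interface are available on the compute nodes) that should show which hardware threads Slurm/cgroups actually hand a non-exclusive job step, before mdrun even starts:
--------------------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00
# Print the hardware threads this step is allowed to run on, plus the NUMA binding
srun bash -c 'grep Cpus_allowed_list /proc/self/status; numactl --show'
--------------------------------------------------------------------------------------------------
If the allowed list turns out to be a scattered set of SMT siblings rather than whole cores, that would explain why mdrun can't lay out its threads sensibly without --exclusive.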
To work around this I experimented with srun's --cpu-bind settings. When --exclusive is not used, I see a slight performance gain with --cpu-bind=cores:
srun --cpu-bind=cores gmx mdrun -nsteps 40000 -pin on -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -s bench.tpr
Even with --cpu-bind=cores, md.log still reports "NOTE: Thread affinity was not set" and performance remains poor.
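One thing I haven't tried yet (a sketch only; the stride value is an assumption based on our SMT4 layout, and I don't know whether mdrun will honor it once Slurm has already restricted the CPU set) is giving mdrun an explicit pin offset and stride instead of letting it auto-select:
srun --cpu-bind=cores gmx mdrun -nsteps 40000 -pin on -pinoffset 0 -pinstride 4 -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -s bench.tpr
With SMT4 and 128 hardware threads, -pinstride 4 would place one OpenMP thread per physical core rather than packing SMT siblings.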
I believe the --exclusive result suggests that the failed hardware unit test can be ignored.
Here's a bit about our Slurm configuration:
$ grep -i affinity /etc/slurm/slurm.conf
TaskPlugin=affinity,cgroup
ldd shows that gmx is linked against libhwloc.so.5.
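Since mdrun relies on hwloc for topology detection, it may also be worth comparing what hwloc itself reports from inside a non-exclusive job versus an exclusive one (a sketch; assumes the hwloc command-line tools are installed on the compute nodes):
srun --cpu-bind=cores bash -c 'hwloc-bind --get; lstopo-no-graphics --no-io'
If a NUMA node is missing from that output, as the failed unit test hints, the problem is below GROMACS.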
I have not heard from my contact at ORNL. All I can find online is that they offer GROMACS 5.1 (https://www.olcf.ornl.gov/software_package/gromacs/) and apparently nothing special is done about thread affinities.
Jon
________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
Sent: Friday, April 24, 2020 6:06 PM
To: Discussion list for GROMACS users <gmx-users at gromacs.org>
Cc: gromacs.org_gmx-users at maillist.sys.kth.se <gromacs.org_gmx-users at maillist.sys.kth.se>
Subject: Re: [gmx-users] GROMACS performance issues on POWER9/V100 node
Hi,
Affinity settings on the Talos II with Ubuntu 18.04 (kernel 5.0) work fine.
I get threads pinned where they should be (hwloc confirmed) and consistent
results. I also get reasonable thread placement even without pinning (i.e.
the kernel scatters first until #threads <= #hwthreads). I see only a minor
penalty to not pinning -- not too surprising given that I have a single
NUMA node and the kernel is doing its job.
Here are my quick test results (Performance in ns/day) from an 8-core Talos II POWER9 + a GPU, using the adh_cubic input:
$ grep Perf *.log
test_1x1_rep1.log:Performance: 16.617
test_1x1_rep2.log:Performance: 16.479
test_1x1_rep3.log:Performance: 16.520
test_1x2_rep1.log:Performance: 32.034
test_1x2_rep2.log:Performance: 32.389
test_1x2_rep3.log:Performance: 32.340
test_1x4_rep1.log:Performance: 62.341
test_1x4_rep2.log:Performance: 62.569
test_1x4_rep3.log:Performance: 62.476
test_1x8_rep1.log:Performance: 97.049
test_1x8_rep2.log:Performance: 96.653
test_1x8_rep3.log:Performance: 96.889
This seems to point towards some issue with the OS or setup on the IBM
machines you have -- and the unit test error may be one of the symptoms of
it (as it suggests something is off with the hardware topology and a NUMA
node is missing from it). I'd still suggest checking whether a full node
allocation, with all threads, memory, etc. passed to the job, results in
successful affinity settings i) in mdrun and ii) in some other tool.
Please update this thread if you have further findings.
Cheers,
--
Szilárd
On Fri, Apr 24, 2020 at 10:52 PM Szilárd Páll <pall.szilard at gmail.com>
wrote:
>
> The following lines are found in md.log for the POWER9/V100 run:
>>
>> Overriding thread affinity set outside gmx mdrun
>> Pinning threads with an auto-selected logical core stride of 128
>> NOTE: Thread affinity was not set.
>>
>> The full md.log is available here:
>> https://github.com/jdh4/running_gromacs/blob/master/03_benchmarks/md.log
>
>
> I glanced over that at first, will see if I can reproduce it, though I
> only have access to a Raptor Talos, not an IBM machine with Ubuntu.
>
> What OS are you using?
>
>