[gmx-users] nvidia tesla p100

Irem Altan irem.altan at duke.edu
Mon Oct 31 23:48:00 CET 2016


+ echo 2= 2
2= 2

I mean, it does have two CPUs, with 16 cores each, so maybe that’s the
problem?


There's no detection in any of this. You chose a single node and two tasks
per node, so you're getting what you asked for. That's just probably not a
good thing to ask for.
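
In other words, the chain is roughly (taking the directives from your
earlier script):

#SBATCH -N 1 --tasks-per-node=2              # SLURM sets SLURM_NPROCS=2
mpirun -np $SLURM_NPROCS gmx_mpi mdrun ...   # so exactly 2 MPI ranks start

and mdrun simply runs with however many ranks mpirun hands it.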

I’m assigning two tasks because I was following an example submission script they provided. I assumed this was just because the node has 2 GPUs. Should this number be increased or decreased? If so, the highest it will allow me is 16, not 32, which leads me to think that they actually have 16 cores with hyperthreading. On their website, they report that the nodes have


  *   2 Intel Xeon v4 CPUs (16 cores, 2.1 GHz base frequency)

which I assumed meant 2x16, but maybe not.
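
I suppose I could check directly with something like

srun -p GPU_100-debug --gres=gpu:1 -N 1 -n 1 lscpu | grep -E 'Socket|Core|Thread'

(partition and gres names copied from my submission script; I'm not sure the debug partition allows an interactive srun like this), which should print the sockets, cores per socket, and threads per core on the node.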

So that's probably related to what that error message is actually
reporting, which is the range of hardware cores on which each thread might
run. See background at
http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html
If threads are allowed to move all over the place, then the memory cache is
trashed. Since MPI libraries tend to set affinity masks in response to job
schedulers and users, mdrun by default respects such masks if they are set.
A quick test is

mpirun -np 32 gmx_mpi mdrun -ntomp 1 -v -deffnm npt -pin on

which directs mdrun to do something we think is good, rather than leaving
you to work out how to do it with SLURM+MPI. It should be a dramatic
improvement, but the hint about using fewer ranks to get more threads per
rank is probably better still.
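
For example, something like

mpirun -np 2 gmx_mpi mdrun -ntomp 16 -gpu_id 01 -pin on -v -deffnm npt

i.e. one rank per GPU with 16 OpenMP threads each; whether 16 is the right
number depends on how many real cores the node turns out to have, and
-gpu_id just makes the rank-to-GPU mapping explicit.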


Well, that quick test did not result in any change in speed.

The best result I was able to get was with these settings:

#!/bin/bash
#SBATCH -N 1 --tasks-per-node=16
#SBATCH -t 00:30:00
#SBATCH -p GPU_100-debug --gres=gpu:2

# Setup the module command
set -x   # echo each command as it runs

module load gromacs/5.1.2

cd $SLURM_SUBMIT_DIR
echo "SLURM_NPROCS = $SLURM_NPROCS"
mpirun -np $SLURM_NPROCS gmx_mpi mdrun -ntomp 2 -v -deffnm npt

Even then, it’s slower than our local machine that has 24 cores and a single K20c. Is this normal?

(bridges)
               Core t (s)   Wall t (s)        (%)
       Time:    47432.357      741.131     6400.0
                 (ns/day)    (hour/ns)
Performance:      116.579        0.206



(local machine)
               Core t (s)   Wall t (s)        (%)
       Time:    12450.387      519.447     2396.9
                 (ns/day)    (hour/ns)
Performance:      166.331        0.144
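
If I read the log output correctly, the (%) column is just Core t divided by Wall t, i.e. how many threads' worth of CPU time accumulated per second of wall time:

awk 'BEGIN { printf "bridges: %.1f  local: %.1f\n", 47432.357/741.131, 12450.387/519.447 }'

which gives about 64 for the Bridges run and about 24 for the local one, so the two runs are not even using the same number of threads.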


Best,
Irem

