[gmx-users] nvidia tesla p100

Mark Abraham mark.j.abraham at gmail.com
Mon Oct 31 23:51:39 CET 2016


Hi,

OK, well we probably can't help further unless you post a series of
suitably named log files to a file-sharing service and share the links
(the list can't take attachments).

Mark

On Mon, Oct 31, 2016 at 11:48 PM Irem Altan <irem.altan at duke.edu> wrote:

> + echo 2= 2
> 2= 2
>
> I mean, it does have two CPUs, with 16 cores each, so maybe that’s the
> problem?
>
>
> There's no detection in any of this. You chose a single node and two tasks
> per node, so you're getting what you asked for. That's probably not a good
> thing to ask for.
>
> I’m assigning two tasks because I was following an example submission
> script they had. I assumed this was just because the node has 2 GPUs.
> Should this number be increased or decreased? The highest it will allow me
> is 16, not 32, which leads me to think that they actually have 16 cores
> with hyperthreading. On their website, they report that the nodes have
>
>
>   *   2 Intel Xeon v4 CPUs (16 cores, 2.1 GHz base frequency)
>
> which I assumed meant 2x16, but maybe not.
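Whether the limit of 16 means 16 physical cores with hyperthreading can be checked directly on a compute node. A minimal sketch using standard Linux tools (`lscpu` from util-linux, `getconf` from POSIX) — the field names are those printed by `lscpu`:

```shell
# Topology summary: sockets, cores per socket, threads per core
lscpu | grep -E '^(Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)'
# Total logical CPUs the OS sees (cores x threads per core)
getconf _NPROCESSORS_ONLN
```

If "Thread(s) per core" reports 2, then 32 logical CPUs would really be 16 physical cores with hyperthreading, which would explain the 16-task limit.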
>
> So that's probably related to the thing that error message is actually
> reporting, which is the range of hardware cores on which each thread might
> run. See background at
>
> http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html
> .
> If they're allowed to move all over the place, then the memory cache is
> thrashed. Since MPI libraries tend to set these in response to job
> schedulers and users, by default mdrun respects affinity masks if set. A
> quick test is
>
> mpirun -np 32 gmx_mpi mdrun -ntomp 1 -v -deffnm npt -pin on
>
> which directs mdrun to do something we think is good, rather than you
> working out how to do things with SLURM+MPI. Should be a dramatic
> improvement, but the hint about using fewer ranks to get more threads per
> rank is probably better still.
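For completeness, the fewer-ranks variant would look something like the line below: one MPI rank per GPU, with the remaining cores given to OpenMP threads. This is a sketch, not a tested command — it assumes a 2-GPU node with 32 hardware threads; `-ntomp`, `-pin`, and `-gpu_id` are standard mdrun options in this GROMACS series, but the right thread count depends on the actual node topology:

```shell
# One rank per GPU, 16 OpenMP threads per rank, threads pinned to cores;
# -gpu_id 01 maps GPU 0 to rank 0 and GPU 1 to rank 1 (assumed 2-GPU node)
mpirun -np 2 gmx_mpi mdrun -ntomp 16 -pin on -gpu_id 01 -v -deffnm npt
```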
>
>
> Well, that did not result in any change in the speed.
>
> The best result I was able to get was with these settings:
>
> #!/bin/bash
> #SBATCH -N 1 --tasks-per-node=16
> #SBATCH -t 00:30:00
> #SBATCH -p GPU_100-debug --gres=gpu:2
>
> # Setup the module command
> set -x
>
> module load gromacs/5.1.2
>
> cd $SLURM_SUBMIT_DIR
> echo "SLURM_NPROCS=" $SLURM_NPROCS
> mpirun -np $SLURM_NPROCS gmx_mpi mdrun -ntomp 2 -v -deffnm npt
>
> Even then, it’s slower than our local machine that has 24 cores and a
> single K20c. Is this normal?
>
> (bridges)
>                Core t (s)   Wall t (s)        (%)
>        Time:    47432.357      741.131     6400.0
>                  (ns/day)    (hour/ns)
> Performance:      116.579        0.206
>
>
>
> (local machine)
>                Core t (s)   Wall t (s)        (%)
>        Time:    12450.387      519.447     2396.9
>                  (ns/day)    (hour/ns)
> Performance:      166.331        0.144
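As a sanity check on the comparison: Performance (ns/day) × Wall t (s) / 86400 recovers the simulated length, which confirms both runs covered the same ~1 ns, so the wall times are directly comparable. A quick check with POSIX awk, using the numbers from the two tables above:

```shell
# Recover simulated length from each log: ns/day * wall seconds / 86400 s/day
awk 'BEGIN {
  printf "bridges: %.3f ns in %.0f s\n", 116.579 * 741.131 / 86400, 741.131
  printf "local:   %.3f ns in %.0f s\n", 166.331 * 519.447 / 86400, 519.447
}'
```

So the Bridges run took 741 s for the same 1 ns the local machine finished in 519 s — the GPU node really is about 1.4x slower here as configured.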
>
>
> Best,
> Irem
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
