[gmx-users] Problems running on multiple nodes

Mark Abraham mark.j.abraham at gmail.com
Mon Oct 23 23:14:49 CEST 2017


Hi,

As the message says, the CUDA driver installed on the compute node is older
than the CUDA runtime your GROMACS build expects, so GPU detection fails and
mdrun falls back to CPU kernels. Check your cluster's documentation for how
its GPU nodes are meant to be used.
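
If it helps, a quick way to see the mismatch on a compute node is to compare
the driver version nvidia-smi reports with the CUDA runtime the GROMACS
binary was built against (a rough sketch only; it assumes nvidia-smi is on
the path and that your gmx was built with GPU support, and the exact output
format varies by version):

  nvidia-smi | head -n 4                # installed NVIDIA driver version
  gmx --version 2>&1 | grep -i cuda     # GPU builds report the CUDA runtime they were compiled against

If the driver is older than the runtime requires, either the admins update
the driver or you use a GROMACS build made against an older CUDA toolkit.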

Also, you can't run across multiple nodes unless you compile GROMACS with
MPI support and launch it with e.g. mpirun or srun. Your -ntmpi 140 starts
thread-MPI ranks, which never leave a single node, which is why the log
reports running on 1 node and oversubscribing its 28 cores with 140 threads.
See the installation guide for details; a rough sketch of the usual setup is
below.
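
A minimal sketch, assuming the default gmx_mpi binary name and cmake option
of an MPI build (adapt it to whatever your gromacs-mvapich2 module actually
installs):

  # configure the build against the cluster's MPI library
  cmake .. -DGMX_MPI=ON

  # in the SLURM script, drop -ntmpi and let the launcher start one rank per task
  srun gmx_mpi mdrun -s md1micros.tpr -deffnm md1micros
  # or: mpirun -np $SLURM_NTASKS gmx_mpi mdrun -s md1micros.tpr -deffnm md1micros

With 140 real MPI ranks over 5 nodes you can keep the separate PME ranks
mdrun guessed for you, or tune the split later with gmx tune_pme.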

Mark

On Mon, Oct 23, 2017 at 11:08 PM Thanh Le <thanh.q.le at sjsu.edu> wrote:

> Hi everyone,
>
> This is my first time running GROMACS on multiple nodes, and I don't quite
> understand the output my run generates. Could you please take a look at the
> script and the output and tell me how to improve them?
>
> The HPC cluster I am currently using has 72 nodes; each node has 28 CPU cores.
>
> The script is:
>
> #!/bin/bash
>
> #SBATCH --job-name=Gromacs78
>
> #SBATCH -o Gromacs_result.out
>
> #SBATCH -n 140 -N 5
>
> #SBATCH --tasks-per-node=28
>
>
>
> module purge
>
> module load gromacs-mvapich2-2.2 mvapich2-2.2/gnu-4.8.5
>
> source /opt/gromacs/bin/GMXRC
>
> dm=/home/blustig/perl5/simulation/78
>
> dmdp=${dm}/mdpfiles
>
> vt=rna-protein
>
> dw=${dm}/${vt}
>
> mkdir ${dw}
>
> cd ${dw}
>
> ########### produce 100ns mdrun: 1st trajectory
>
> echo "0" > inputall
>
> trj=1
>
> let tm=trj*20
>
> vp=md_npt
>
> gmx trjconv -s md_npt.tpr -f md_npt.xtc -pbc mol -ur compact -o
> md_npt_trj20ps.gro < inputall
>
> gmx grompp -f ${dmdp}/md.mdp -c md_npt.gro -t md_npt.cpt -p
> rna-protein.top -n rna-protein.ndx -o md1micros.tpr -maxwarn 1
>
> gmx mdrun -ntmpi 140 -pin on -s md1micros.tpr -o md1micros.trr -e
> md1micros.edr -g md1micros.log -c md1micros.gro -x md1micros.xtc -cpo
> md1micros.cpt
>
>
>
> The output is:
>
> Back Off! I just backed up md1micros.log to ./#md1micros.log.14#
>
> NOTE: Error occurred during GPU detection:
>
>       CUDA driver version is insufficient for CUDA runtime version
>
>       Can not use GPU acceleration, will fall back to CPU kernels.
>
> Running on 1 node with total 28 cores, 28 logical cores, 0 compatible GPUs
>
> Hardware detected:
>
>   CPU info:
>
>     Vendor: Intel
>
>     Brand:  Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>
>     SIMD instructions most likely to fit this hardware: AVX2_256
>
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>
>   Hardware topology: Basic
>
> Reading file md1micros.tpr, VERSION 2016.3 (single precision)
>
> Changing nstlist from 10 to 25, rlist from 1.4 to 1.435
>
> Will use 120 particle-particle and 20 PME only ranks
>
> This is a guess, check the performance at the end of the log file
>
> Using 140 MPI threads
>
> Using 1 OpenMP thread per tMPI thread
>
> NOTE: Oversubscribing a CPU, will not pin threads.
>
> NOTE: Thread affinity setting failed. This can cause performance
> degradation.
>
>       If you think your settings are correct, ask on the gmx-users list.
>
> Back Off! I just backed up md1micros.xtc to ./#md1micros.xtc.12#
>
> Back Off! I just backed up md1micros.edr to ./#md1micros.edr.12#
>
> WARNING: This run will generate roughly 12227 Mb of data
>
> starting mdrun 'Protein in water'
>
> 500000000 steps, 1000000.0 ps.
>
> step 87500 Turning on dynamic load balancing, because the performance loss
> due to load imbalance is 2.2 %.
>
>
>
> I don’t understand why it is taking quite a long time to run.
>
> Any advice is greatly appreciated.
>
> Thanks,
>
> Thanh Le

