[gmx-users] How to initiate parallel run on GPU cluster

Mark Abraham mark.j.abraham at gmail.com
Thu Apr 7 15:32:58 CEST 2016


Hi,

As you can see in that log file, you are only getting a single rank, so
that's all GROMACS can use. You need to troubleshoot your use of PBS and
mpiexec so that you get four ranks, placed two on each node. We can't read
your cluster's docs or talk to your sysadmins :-)
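
A rough sketch of what the launch could look like (the -ppn processes-per-node
flag here is an assumption about your Hydra mpiexec, and -ntomp is reduced to 8
so that two ranks share the 16 cores on each node):

mpiexec.hydra -np 4 -ppn 2 -hostfile $PBS_NODEFILE \
    /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 8 -s equilibration3.tpr

If the placement works, the top of md.log should report 4 MPI processes with
8 OpenMP threads each, and nvidia-smi on each node should show two gmx_mpi
processes, one per GPU.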

(And while you're at it, get them to compile with something less
prehistoric than gcc 4.4. For many runs, you're likely to be bound by CPU
performance with two K40s per node, and more recent compilers will be
better...)

Mark

On Thu, Apr 7, 2016 at 3:16 PM Venkat Reddy <venkat4bt at gmail.com> wrote:

> Dear Szilárd Páll,
> Thanks for the response.
>
> My PBS script to launch the run is:
>
> #! /bin/bash
> #PBS -l cput=5000:00:00
> #PBS -l select=2:ncpus=16:ngpus=2
> #PBS -e errorfile.err
> #PBS -o logfile.log
> tpdir=`echo $PBS_JOBID | cut -f 1 -d .`
> tempdir=$HOME/work/job$tpdir
> mkdir -p $tempdir
> cd $tempdir
> cp -R $PBS_O_WORKDIR/* .
> mpiexec.hydra -np 4 -hostfile $PBS_NODEFILE /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 16 -s equilibration3.tpr
>
> Interestingly, I am using the same script to run CPU-only jobs, which run
> without any problems.
>
> Please check the generated log file here:
> https://www.dropbox.com/s/dtfsuh6dv635n6q/md.log?dl=0
>
>
>
>
> On Thu, Apr 7, 2016 at 6:18 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
> > On Thu, Apr 7, 2016 at 2:35 PM, Venkat Reddy <venkat4bt at gmail.com>
> > wrote:
> > > Thank you Mark for the quick response.
> > > I tried to change the -np option to 4. But it seems that mdrun is
> > > using only one GPU on a single node with four ranks. The nvidia-smi
> > > command shows
> > >
> > > |    0      7977    C   /Apps/gromacs512/bin/gmx_mpi          130MiB |
> > > |    0      7978    C   /Apps/gromacs512/bin/gmx_mpi          130MiB |
> > > |    0      7979    C   /Apps/gromacs512/bin/gmx_mpi          130MiB |
> > > |    0      7980    C   /Apps/gromacs512/bin/gmx_mpi          130MiB |
> >
> > No command line or log file shown, so there is nothing to comment on.
> >
> > Additionally, if all four ranks you requested are on the same node
> > rather than split over two nodes, that likely means you're using an
> > incorrect job script -- definitely not a GROMACS issue. Please make
> > sure you can launch an MPI "Hello world" program over multiple nodes
> > first.
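> >
> > A minimal version of that check (a sketch, reusing the same Hydra mpiexec
> > and hostfile as in your job script) is to launch hostname over the four
> > requested ranks; each node's name should appear twice:
> >
> > mpiexec.hydra -np 4 -hostfile $PBS_NODEFILE hostname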
> >
> >
> > > Also, the job folder has four backed-up copies of the same run.
> > >
> > > On Thu, Apr 7, 2016 at 5:07 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> mpiexec.hydra -np 1 asks for a single MPI rank, which is what you got.
> > >> But you need at least two, i.e. at least one on each node, and at least
> > >> four if you want to make use of the two GPUs on each of two nodes.
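> > >>
> > >> As a sketch of the arithmetic (assuming the nodes are otherwise idle):
> > >> 2 nodes x 2 GPUs means 4 PP ranks, i.e. 2 ranks per node, each driving
> > >> one GPU and 16 / 2 = 8 OpenMP threads. If you prefer the per-node GPU
> > >> mapping to be explicit rather than auto-selected, mdrun accepts it as:
> > >>
> > >> gmx_mpi mdrun -v -dlb yes -ntomp 8 -gpu_id 01 -s equilibration3.tpr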
> > >>
> > >> Mark
> > >>
> > >>
> > >> On Thu, Apr 7, 2016 at 1:14 PM Venkat Reddy <venkat4bt at gmail.com>
> > >> wrote:
> > >>
> > >> > Dear all,
> > >> >
> > >> > Please disregard my previous mail, which was incomplete.
> > >> >
> > >> > I am trying to execute mdrun on our GPU cluster with 7 nodes, where
> > >> > each node has 16 processors and two K40 GPU cards. I have no problem
> > >> > with mdrun on a single node. However, when I try to execute a parallel
> > >> > run on two nodes with the gmx_mpi executable (gromacs-5.1.2), the
> > >> > performance is very slow. When I logged into the individual nodes, I
> > >> > found that mdrun is not utilizing both GPUs. The generated log file
> > >> > shows the following message.
> > >> >
> > >> > Using 1 MPI process
> > >> > Using 16 OpenMP threads
> > >> >
> > >> > 2 compatible GPUs are present, with IDs 0,1
> > >> > 1 GPU auto-selected for this run.
> > >> > Mapping of GPU ID to the 1 PP rank in this node: 0
> > >> >
> > >> >
> > >> > NOTE: potentially sub-optimal launch configuration, gmx_mpi started with less
> > >> >       PP MPI process per node than GPUs available.
> > >> >       Each PP MPI process can use only one GPU, 1 GPU per node will be used.
> > >> >
> > >> > I read the manual and the instructions in
> > >> > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> > >> > to execute the parallel run, but I couldn't find the right flags to
> > >> > initiate it. Please help me in this aspect. The script I used to
> > >> > execute the parallel run is given below.
> > >> >
> > >> > #! /bin/bash
> > >> > #PBS -l cput=5000:00:00
> > >> > #PBS -l select=2:ncpus=16:ngpus=2
> > >> > #PBS -e errorfile.err
> > >> > #PBS -o logfile.log
> > >> > tpdir=`echo $PBS_JOBID | cut -f 1 -d .`
> > >> > tempdir=$HOME/work/job$tpdir
> > >> > mkdir -p $tempdir
> > >> > cd $tempdir
> > >> > cp -R $PBS_O_WORKDIR/* .
> > >> > mpiexec.hydra -np 1 -hostfile $PBS_NODEFILE /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 16 -s equilibration3.tpr
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > With Best Wishes
> > >> > Venkat Reddy Chirasani
> > >> > PhD student
> > >> > Laboratory of Computational Biophysics
> > >> > Department of Biotechnology
> > >> > IIT Madras
> > >> > Chennai
> > >> > INDIA-600036
> > >
> > >
> > >
> > > --
> > > With Best Wishes
> > > Venkat Reddy Chirasani
> > > PhD student
> > > Laboratory of Computational Biophysics
> > > Department of Biotechnology
> > > IIT Madras
> > > Chennai
> > > INDIA-600036
>
>
>
> --
> With Best Wishes
> Venkat Reddy Chirasani
> PhD student
> Laboratory of Computational Biophysics
> Department of Biotechnology
> IIT Madras
> Chennai
> INDIA-600036

