[gmx-users] How to initiate parallel run on GPU cluster
Venkat Reddy
venkat4bt at gmail.com
Thu Apr 7 15:39:21 CEST 2016
Thank you, Mark, for the suggestion. I will convey the message to our system
admin to solve the issue. Thanks again for the valuable inputs.
On Thu, Apr 7, 2016 at 7:02 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:
> Hi,
>
> As you can see in that log file, you are only getting a single rank, so
> that's all that GROMACS can use. You need to troubleshoot your use of PBS
> and mpiexec so that you get four ranks, placed two on each node. We can't
> read your cluster's docs or talk to your sysadmins :-)
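>
> For example, with Hydra you can pin the number of ranks per host with -ppn
> (a rough sketch, assuming your mpiexec.hydra supports that option; check
> your MPI's docs otherwise):
>
> mpiexec.hydra -np 4 -ppn 2 -hostfile $PBS_NODEFILE \
>     /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 8 -s equilibration3.tpr
>
> With two ranks per node, -ntomp 8 keeps the 16 cores per node exactly
> subscribed instead of oversubscribed.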
>
> (And while you're at it, get them to compile with something less
> prehistoric than gcc 4.4. For many runs, you're likely to be bound by CPU
> performance with two K40s per node, and more recent compilers will be
> better...)
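>
> (A rebuild against a newer toolchain only needs cmake pointed at it; a rough
> sketch, assuming a gcc 5.x module exposing gcc-5/g++-5 is installed:
>
> CC=gcc-5 CXX=g++-5 cmake .. -DGMX_MPI=ON -DGMX_GPU=ON
> make -j 16 && make install
>
> )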
>
> Mark
>
> On Thu, Apr 7, 2016 at 3:16 PM Venkat Reddy <venkat4bt at gmail.com> wrote:
>
> > Dear Szilárd Páll,
> > Thanks for the response.
> >
> > My PBS script to launch the run is:
> >
> > #! /bin/bash
> > #PBS -l cput=5000:00:00
> > #PBS -l select=2:ncpus=16:ngpus=2
> > #PBS -e errorfile.err
> > #PBS -o logfile.log
> > tpdir=`echo $PBS_JOBID | cut -f 1 -d .`
> > tempdir=$HOME/work/job$tpdir
> > mkdir -p $tempdir
> > cd $tempdir
> > cp -R $PBS_O_WORKDIR/* .
> > mpiexec.hydra -np 4 -hostfile $PBS_NODEFILE /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 16 -s equilibration3.tpr
> >
> > Interestingly, I am using the same script to run CPU-only jobs, which do
> > not cause any problems.
> >
> > Please check the generated log file here:
> > https://www.dropbox.com/s/dtfsuh6dv635n6q/md.log?dl=0
> >
> >
> >
> >
> > On Thu, Apr 7, 2016 at 6:18 PM, Szilárd Páll <pall.szilard at gmail.com>
> > wrote:
> >
> > > On Thu, Apr 7, 2016 at 2:35 PM, Venkat Reddy <venkat4bt at gmail.com> wrote:
> > > > Thank you Mark for the quick response.
> > > > I tried changing the -np option to 4, but it seems that mdrun is using
> > > > only one GPU on a single node with four ranks. The nvidia-smi output
> > > > shows:
> > > >
> > > > |    0      7977      C   /Apps/gromacs512/bin/gmx_mpi      130MiB |
> > > > |    0      7978      C   /Apps/gromacs512/bin/gmx_mpi      130MiB |
> > > > |    0      7979      C   /Apps/gromacs512/bin/gmx_mpi      130MiB |
> > > > |    0      7980      C   /Apps/gromacs512/bin/gmx_mpi      130MiB |
> > >
> > > No command line and no log file shown; nothing to comment on.
> > >
> > > Additionally, if all four ranks you requested are on the same node
> > > rather than split over two nodes, that likely means you're using an
> > > incorrect job script -- definitely not a GROMACS issue. Please make
> > > sure you can launch an MPI "Hello world" program over multiple nodes
> > > first.
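> > >
> > > A quick way to check that (sketch; adapt paths and modules to your
> > > cluster) is to push a trivial MPI program through the same PBS +
> > > mpiexec.hydra setup and confirm the ranks report two different hostnames:
> > >
> > > cat > hello.c << 'EOF'
> > > #include <mpi.h>
> > > #include <stdio.h>
> > > int main(int argc, char **argv) {
> > >     int rank, len;
> > >     char host[MPI_MAX_PROCESSOR_NAME];
> > >     MPI_Init(&argc, &argv);                /* start MPI              */
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank id within the job */
> > >     MPI_Get_processor_name(host, &len);    /* name of this node      */
> > >     printf("rank %d on %s\n", rank, host);
> > >     MPI_Finalize();
> > >     return 0;
> > > }
> > > EOF
> > > mpicc hello.c -o hello
> > > mpiexec.hydra -np 4 -hostfile $PBS_NODEFILE ./hello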
> > >
> > >
> > > > Also, the job folder has four backed-up copies of the same run.
> > > >
> > > > On Thu, Apr 7, 2016 at 5:07 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> mpiexec.hydra -np 1 asks for a single MPI rank, which is what you got.
> > > >> But you need at least two, i.e. at least one on each node, and at least
> > > >> four if you want to make use of the two GPUs on each of two nodes.
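> > > >>
> > > >> With four ranks over two nodes, each rank then drives one GPU and gets
> > > >> 8 OpenMP threads (16 cores / 2 ranks per node). Something along these
> > > >> lines should work once the ranks are placed correctly (untested sketch;
> > > >> -gpu_id 01 maps the two PP ranks on each node to GPUs 0 and 1, which
> > > >> mdrun normally works out by itself):
> > > >>
> > > >> gmx_mpi mdrun -v -dlb yes -ntomp 8 -gpu_id 01 -s equilibration3.tpr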
> > > >>
> > > >> Mark
> > > >>
> > > >>
> > > >> On Thu, Apr 7, 2016 at 1:14 PM Venkat Reddy <venkat4bt at gmail.com> wrote:
> > > >>
> > > >> > Dear all,
> > > >> >
> > > >> > Please disregard my previous mail, which was incomplete.
> > > >> >
> > > >> > I am trying to execute mdrun on our GPU cluster with 7 nodes, where
> > > >> > each node has 16 processors and two K40 GPU cards. I have no problem
> > > >> > with mdrun on a single node. However, when I try to execute a
> > > >> > parallel run on two nodes with the gmx_mpi executable (GROMACS
> > > >> > 5.1.2), the performance is very slow. When I logged into the
> > > >> > individual nodes, I found that mdrun is not utilizing both GPUs. The
> > > >> > generated log file shows the following message.
> > > >> >
> > > >> > Using 1 MPI process
> > > >> > Using 16 OpenMP threads
> > > >> >
> > > >> > 2 compatible GPUs are present, with IDs 0,1
> > > >> > 1 GPU auto-selected for this run.
> > > >> > Mapping of GPU ID to the 1 PP rank in this node: 0
> > > >> >
> > > >> >
> > > >> > NOTE: potentially sub-optimal launch configuration, gmx_mpi started
> > > >> >       with less PP MPI process per node than GPUs available.
> > > >> >       Each PP MPI process can use only one GPU, 1 GPU per node will
> > > >> >       be used.
> > > >> >
> > > >> > I read the manual and the instructions at
> > > >> > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> > > >> > on how to execute the parallel run, but I couldn't find the right
> > > >> > flags to initiate it. Please help me with this. The script I used to
> > > >> > execute the parallel run is given below.
> > > >> >
> > > >> > #! /bin/bash
> > > >> > #PBS -l cput=5000:00:00
> > > >> > #PBS -l select=2:ncpus=16:ngpus=2
> > > >> > #PBS -e errorfile.err
> > > >> > #PBS -o logfile.log
> > > >> > tpdir=`echo $PBS_JOBID | cut -f 1 -d .`
> > > >> > tempdir=$HOME/work/job$tpdir
> > > >> > mkdir -p $tempdir
> > > >> > cd $tempdir
> > > >> > cp -R $PBS_O_WORKDIR/* .
> > > >> > mpiexec.hydra -np 1 -hostfile $PBS_NODEFILE /Apps/gromacs512/bin/gmx_mpi mdrun -v -dlb yes -ntomp 16 -s equilibration3.tpr
>
--
With Best Wishes
Venkat Reddy Chirasani
PhD student
Laboratory of Computational Biophysics
Department of Biotechnology
IIT Madras
Chennai
INDIA-600036