[gmx-users] Working on a GPU cluster with GROMACS 5

Ebert Maximilian m.ebert at umontreal.ca
Wed Jan 7 15:06:57 CET 2015


Hi Jiri,

Yes, this seems to be the problem. Thank you very much. The GPUs on this cluster are in exclusive thread mode. I will ask the administrator whether we can change this.

Thank you very much!

Max

-----Original Message-----
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Jiri Kraus
Sent: Wednesday, 7 January 2015 14:58
To: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: Re: [gmx-users] Working on a GPU cluster with GROMACS 5

Hi Max,

In which compute mode are the GPUs running? To be able to share a GPU between multiple MPI ranks, you need either to use the Multi-Process Service (MPS, see [1]) or to let the GPUs run in the default compute mode (see [2]). You can query the compute mode with nvidia-smi -q -d COMPUTE (see the example output below) and change it with nvidia-smi -c DEFAULT. Changing the compute mode requires root.

Hope this helps

Jiri

[1] https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
[2] http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-modes 

Example output of nvidia-smi -q -d COMPUTE:
==============NVSMI LOG==============

Timestamp                           : Wed Jan  7 05:51:31 2015
Driver Version                      : 340.32

Attached GPUs                       : 6
GPU 0000:04:00.0
    Compute Mode                    : Default

GPU 0000:05:00.0
    Compute Mode                    : Default

GPU 0000:08:00.0
    Compute Mode                    : Default

GPU 0000:09:00.0
    Compute Mode                    : Default

GPU 0000:83:00.0
    Compute Mode                    : Default

GPU 0000:84:00.0
    Compute Mode                    : Default
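
To check many GPUs at once, the query output above can be filtered for devices that are not in Default mode. The following is only a sketch: the function name non_default_gpus is made up for illustration, it assumes the report layout shown in the example output (a "GPU <bus id>" line followed by a "Compute Mode : <value>" line), and the exact spelling of the non-default mode names may differ between driver versions.

```shell
#!/bin/sh
# Hypothetical helper: read the text printed by `nvidia-smi -q -d COMPUTE`
# on stdin and print every GPU whose compute mode is not "Default".
# Assumes the layout of the example output in this message.
non_default_gpus() {
    awk '
        /^GPU / { gpu = $2 }              # remember the PCI bus id of the current GPU
        /Compute Mode/ {
            mode = $0
            sub(/^[^:]*:[ \t]*/, "", mode)    # drop everything up to the first colon
            sub(/[ \t\r]+$/, "", mode)        # drop trailing whitespace
            if (mode != "Default") print gpu ": " mode
        }
    '
}

# Typical use on a compute node (not executed here):
#   nvidia-smi -q -d COMPUTE | non_default_gpus
```

If the function prints nothing, every GPU in the report is in Default mode; any line it prints names a GPU that will refuse a second process unless the mode is changed (which requires root) or MPS is used.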

> Message: 5
> Date: Wed, 7 Jan 2015 13:42:46 +0000
> From: Ebert Maximilian <m.ebert at umontreal.ca>
> To: "gmx-users at gromacs.org" <gmx-users at gromacs.org>
> Subject: Re: [gmx-users] Working on a GPU cluster with GROMACS 5
> 
> Hi Carsten,
> 
> thanks again for your reply. The way our cluster is set up, you 
> request GPUs, not CPUs, with the ppn option. Therefore, I put 4 
> there. But to rule out the possibility that someone else is actually using 
> the node, I requested 7 GPUs (so the entire node) but used -gpu_id to assign just the first 4 to GROMACS.
> I still get the same error. I also tried -gpu_id 00 or -gpu_id 4444 to 
> change the GPU and to use only a single GPU, but I always get:
> 
> NOTE: You assigned GPUs to multiple MPI processes.
> 
> -------------------------------------------------------
> Program gmx_mpi, VERSION 5.0.1
> Source code file: /RQusagers/rqchpbib/stubbsda/gromacs-5.0.1/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu, line: 61
> 
> Fatal error:
> cudaMallocHost of size 4 bytes failed: all CUDA-capable devices are 
> busy or unavailable
> 
> For more information and tips for troubleshooting, please check the 
> GROMACS website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
> 
> Error on rank 1, will try to stop all ranks
> Halting parallel program gmx_mpi on CPU 1 out of 4
> 
> -----Original Message-----
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Carsten Kutzner
> Sent: Wednesday, 7 January 2015 14:13
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Working on a GPU cluster with GROMACS 5
> 
> Hi Max,
> 
> On 07 Jan 2015, at 11:36, Ebert Maximilian <m.ebert at umontreal.ca> wrote:
> 
> > Hi Carsten,
> >
> > thanks for your answer. I tried what you described and it is basically
> > working, except for letting multiple MPI workers use one GPU. In my
> > setup I use 4 GPUs with 8 MPI workers, hence 8 CPUs, and 1 OpenMP
> > thread per worker. This is how I start GROMACS:
> >
> > mpirun -np 8 gmx_mpi mdrun -gpu_id 00112233 -v -x -deffnm run1ns -s ../run1ns.tpr
> >
> > and I submit this using:
> >
> > qsub -q @test -lnodes=1:ppn=4 -lwalltime=1:00:00 gromacs_run_gpu
> Why are you using ppn=4? Shouldn't that be 8?
> 
> >
> > Now I get the following errors (the output is longer, but to keep it
> > shorter I omitted the rest):
> >
> > Using 8 MPI processes
> > Using 1 OpenMP thread per MPI process
> >
> > 7 GPUs detected on host ngpu-a4-06:
> >  #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #2: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #3: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #4: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #5: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >  #6: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >
> > 4 GPUs user-selected for this run.
> > Mapping of GPUs to the 8 PP ranks in this node: #0, #0, #1, #1, #2, #2, #3, #3
> >
> > NOTE: You assigned GPUs to multiple MPI processes.
> >
> > -------------------------------------------------------
> > Program gmx_mpi, VERSION 5.0.1
> > Source code file: /RQusagers/rqchpbib/stubbsda/gromacs-5.0.1/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu, line: 61
> >
> > Fatal error:
> > cudaMallocHost of size 4 bytes failed: all CUDA-capable devices are 
> > busy or unavailable
> >
> Could it be that someone else's processes are running on that node 
> while Gromacs tries to use the GPUs? Maybe try to get the whole node, 
> maybe even in interactive mode to play around.
> 
> Carsten
> 
> > For more information and tips for troubleshooting, please check the 
> > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > -------------------------------------------------------
> >
> > Error on rank 1, will try to stop all ranks
> > Halting parallel program gmx_mpi on CPU 1 out of 8
> >
> > -------------------------------------------------------
> > Program gmx_mpi, VERSION 5.0.1
> > Source code file: /RQusagers/rqchpbib/stubbsda/gromacs-5.0.1/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu, line: 61
> >
> > Fatal error:
> > cudaMallocHost of size 4 bytes failed: all CUDA-capable devices are 
> > busy or unavailable
> >
> > For more information and tips for troubleshooting, please check the 
> > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > -------------------------------------------------------
> >
> > Error on rank 3, will try to stop all ranks
> > Halting parallel program gmx_mpi on CPU 3 out of 8
> >
> > -----Original Message-----
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Carsten Kutzner
> > Sent: Thursday, 18 December 2014 17:27
> > To: gmx-users at gromacs.org
> > Subject: Re: [gmx-users] Working on a GPU cluster with GROMACS 5
> >
> > Hi Max,
> >
> > On 18 Dec 2014, at 15:30, Ebert Maximilian <m.ebert at umontreal.ca> wrote:
> >
> >> Dear list,
> >>
> >> I am benchmarking my system on a GPU cluster with 6 GPUs and two
> >> quad-core CPUs per node. First, I am wondering whether there is any
> >> output that confirms how many CPUs and GPUs were used during the run.
> >> I find the output for GPUs in the log file, but only for a single node.
> >> When I use multiple nodes, why don't the other nodes show up in the log file as hosts?
> >> For instance, in this example I used two nodes and claimed 4 GPUs each
> >> but got this in my log file:
> >>
> >> 6 GPUs detected on host ngpu-a4-01:
> >> #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >> #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >> #2: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >> #3: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >> #4: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >> #5: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat: compatible
> >>
> >> 4 GPUs auto-selected for this run.
> >> Mapping of GPUs to the 4 PP ranks in this node: #0, #1, #2, #3
> > This will be the same across all nodes. Gromacs will refuse to run
> > if there are not enough GPUs on any of your other nodes.
> >
> >>
> >>
> >>
> >> ngpu-a4-02 is not shown here. Any idea? The job was submitted in
> >> the following way:
> >>
> >> qsub -q @test -lnodes=2:ppn=4 -lwalltime=1:00:00 gromacs_run_gpu
> >>
> >> and the gromacs_run_gpu file:
> >>
> >> #!/bin/csh
> >> #
> >>
> >> #PBS -o result_run10ns96-8.dat
> >> #PBS -j oe
> >> #PBS -W umask=022
> >> #PBS -r n
> >>
> >> cd 8_gpu
> >>
> >> module add CUDA
> >> module load gromacs/5.0.1-gpu
> >>
> >> mpirun gmx_mpi mdrun -v -x -deffnm 10ns_rep1-8GPU
> >>
> >>
> >> Another question I had: how can I define the number of CPUs and
> >> check whether they were really used?
> > Use -ntomp to control how many OpenMP threads each of your MPI
> > processes will have.
> > This way you can make use of all cores you have on each node.
> >
> >> I can't find any information about the number of CPUs in the log file.
> > Look for
> > "Using . MPI processes"
> > "Using . OpenMP threads per MPI process"
> > in the log file.
> >
> >> I would also like to try combinations like 4 CPUs + 1 GPU
> > You can use the -gpu_id switch to supply a list of eligible GPUs
> > (see mdrun -h).
> > If you just want to use the first GPU on your node with, e.g., 4 MPI
> > processes, use -gpu_id 0000.
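
The -gpu_id strings used in this thread (0000 for four ranks on GPU 0, 00112233 for eight ranks on four GPUs) carry one digit per PP rank on the node. A small sketch of how such a string could be generated is shown below; the function name gpu_id_string is hypothetical, and it assumes single-digit GPU IDs starting at 0 with the same number of ranks on every GPU.

```shell
#!/bin/sh
# Hypothetical helper: build a -gpu_id string that assigns
# ranks_per_gpu consecutive PP ranks to each of the first n_gpus devices.
# Only valid for GPU IDs 0..9 (one digit per rank).
gpu_id_string() {
    n_gpus=$1
    ranks_per_gpu=$2
    ids=""
    gpu=0
    while [ "$gpu" -lt "$n_gpus" ]; do
        rank=0
        while [ "$rank" -lt "$ranks_per_gpu" ]; do
            ids="${ids}${gpu}"        # one digit per rank mapped to this GPU
            rank=$((rank + 1))
        done
        gpu=$((gpu + 1))
    done
    printf '%s\n' "$ids"
}

gpu_id_string 1 4   # prints 0000
gpu_id_string 4 2   # prints 00112233
```

The result could then be passed on the command line, for example mpirun -np 8 gmx_mpi mdrun -gpu_id "$(gpu_id_string 4 2)" followed by the usual options. Note that putting more than one rank on a GPU only works if the GPU's compute mode allows sharing (Default mode) or MPS is in use.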
> >
> > Best,
> >  Carsten
> >
> >
> >
> >> or 2 CPUs + 2 GPU. How do I set this up?
> >>
> >> Thank you very much for your help,
> >>
> >> Max
> >>
> >> --
> >> Gromacs Users mailing list
> >>
> >> * Please search the archive at
> >> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> >>
> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>
> >> * For (un)subscribe requests visit
> >> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> >> send a mail to gmx-users-request at gromacs.org.
> >
> >
> > --
> > Dr. Carsten Kutzner
> > Max Planck Institute for Biophysical Chemistry
> > Theoretical and Computational Biophysics
> > Am Fassberg 11, 37077 Goettingen, Germany
> > Tel. +49-551-2012313, Fax: +49-551-2012302
> > http://www.mpibpc.mpg.de/grubmueller/kutzner
> > http://www.mpibpc.mpg.de/grubmueller/sppexa
> >
> End of gromacs.org_gmx-users Digest, Vol 129, Issue 21
> ******************************************************