[gmx-users] multiple GPU on multiple nodes

Szilárd Páll pall.szilard at gmail.com
Tue Feb 4 12:20:07 CET 2014


On Tue, Feb 4, 2014 at 2:31 AM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> On Tue, Feb 4, 2014 at 1:51 AM, cyberjhon <cyberjhon at hotmail.com> wrote:
>
>> Dear Szilárd
>>
>> Thanks for your answer.
>>
>> To submit the job I do:
>>
>> qsub -l nodes=2:ppn=16,walltime=12:00:00
>>
>> Then, to run gromacs I can do:
>> aprun -n 1 mdrun_mpi -deffnm protein
>>
>> And, I get the message that you mention, which is good
>> "1 GPU detected on host nid00900:" does not mean that only one GPU was
>> detected."
>>
>> But If I do:
>> aprun -n 2 mdrun_mpi -deffnm protein
>>
>
> Here you are apparently starting both MPI ranks on the same node, not one
> rank on each of your two nodes!
>
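For illustration, one way to get one rank on each node with aprun is to set
the ranks-per-node count explicitly. A minimal sketch, assuming Cray XK-style
nodes with 16 cores and one GPU each (the -d/-ntomp values are assumptions,
not taken from this thread):

aprun -n 2 -N 1 -d 16 mdrun_mpi -ntomp 16 -deffnm protein

Here -n 2 requests two ranks in total, -N 1 places one rank on each node, and
-d 16 / -ntomp 16 let each rank run its OpenMP threads on all 16 cores.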
>
>> I get the error mentioned before
>> -------------------------------------------------------
>> Program mdrun_mpi, VERSION 4.6.2
>> Source code file:
>> /N/soft/cle4/gromacs/gromacs-4.6.2/src/gmxlib/gmx_detect_hardware.c, line:
>> 356
>>
>> Fatal error:
>> Incorrect launch configuration: mismatching number of PP MPI processes and
>> GPUs per node.
>> mdrun_mpi was started with 2 PP MPI processes per node,
>>
>
> mdrun can see two ranks on this node, which is apparently not what you are
> trying to do (1 rank on each node, each using 1 GPU)
>
>
>> but only 1 GPU were
>> detected.
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>>
>> Even if I use -gpu_id 0 or -gpu_id 00, they do not work
>>
>
> They don't matter, because 4.6.2 can't share a single GPU on a node between
> two MPI ranks on that node. More recent 4.6.x can do so, but this is not
> your problem.

That is not the case! mdrun has always supported sharing GPUs among
ranks when using real MPI; only with thread-MPI was this impossible
until recently. A note was issued in earlier 4.6.x versions because we
initially thought this would rarely be needed - which turned out to be
false!

Here are the relevant lines of the code in the first 4.6 release:
http://redmine.gromacs.org/projects/gromacs/repository/revisions/v4.6/entry/src/gmxlib/gmx_detect_hardware.c#L398
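
In case sharing the single GPU between two PP ranks per node is actually what
is wanted, the mapping is given by repeating the device id in -gpu_id, one
digit per PP rank on a node. A sketch, again assuming two 16-core, single-GPU
nodes (the thread counts are assumptions):

aprun -n 4 -N 2 -d 8 mdrun_mpi -ntomp 8 -gpu_id 00 -deffnm protein

Here "00" maps both ranks of a node to GPU 0; on Cray this kind of sharing
typically also needs the CUDA proxy/MPS enabled, as mentioned further down in
the quoted mail.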

Cheers,
Sz.

>
> Mark
>
>
>>
>> I also tried what you told me before about CRAY_CUDA_MPS and
>> CRAY_CUDA_PROXY:
>> export CRAY_CUDA_MPS=1
>> export CRAY_CUDA_PROXY=1
>>
>> I executed these commands before the aprun, and they did not work
>>
>> So, can you tell me what command you use on BW to make use of the GPU
>> located in each node?
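
Not knowing the exact BW setup, here is a minimal sketch of what such a job
script could look like, using the same 2-node, 16-cores-per-node request as
above (module loads and any site-specific node-type features are omitted, and
the script is untested):

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR

# Only needed if more than one rank per node is to share the GPU
export CRAY_CUDA_MPS=1        # CRAY_CUDA_PROXY=1 on older CLE installs

# One rank per node, each rank driving the node's single GPU
aprun -n 2 -N 1 -d 16 mdrun_mpi -ntomp 16 -deffnm protein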
>>
>> Thanks
>>
>> John Michael