[gmx-users] multiple GPU on multiple nodes

Szilárd Páll pall.szilard at gmail.com
Tue Feb 4 16:33:48 CET 2014


John,

I strongly suggest that you consult the Blue Waters or other XK7
manual or talk to the support team. Understanding this hardware is
crucial to getting any reasonable performance.

As I said before, your commands are inconsistent. Requesting
nnodes x nppn = 2 x 16 MPI ranks does match the "aprun -n 32"
launch, i.e. 2x16=32. However, requesting 2x16 ranks and then running
"aprun -n 2" is simply incorrect. Instead, you can request 4 ranks per
node and let them all share the node's single GPU (id 0), e.g.:
qsub  -l nodes=2:ppn=4
aprun -n 8 mdrun_mpi -gpu_id 0000
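
To make the arithmetic explicit, here is a minimal Python sketch of the
consistency check. It is not part of GROMACS or the Cray tools; the
function name check_launch and its parameters are made up for
illustration. It assumes that all ranks are PP ranks (no separate PME
ranks), that ranks are spread evenly over the nodes, and that the
-gpu_id string carries one digit per PP rank on a node, as in the 4.6
series.

```python
def check_launch(nodes, ppn, aprun_n, gpu_id=None):
    """Return a list of problems with a qsub/aprun/mdrun combination.

    nodes, ppn : values from "qsub -l nodes=<nodes>:ppn=<ppn>"
    aprun_n    : value passed to "aprun -n"
    gpu_id     : string passed to "mdrun_mpi -gpu_id", or None for CPU-only runs
    An empty list means the three settings agree with each other.
    """
    problems = []

    # The job reserves nodes x ppn ranks; aprun should launch exactly that many.
    total_ranks = nodes * ppn
    if aprun_n != total_ranks:
        problems.append(
            "aprun -n %d does not match nodes x ppn = %d x %d = %d"
            % (aprun_n, nodes, ppn, total_ranks)
        )

    # Assumption: one -gpu_id digit per PP rank on each node.
    if gpu_id is not None and len(gpu_id) != ppn:
        problems.append(
            "-gpu_id %s maps %d ranks per node, but ppn is %d"
            % (gpu_id, len(gpu_id), ppn)
        )

    return problems


if __name__ == "__main__":
    # CPU-only run from the thread: consistent.
    print(check_launch(nodes=2, ppn=16, aprun_n=32))
    # 2x16 ranks requested but only 2 launched: inconsistent.
    print(check_launch(nodes=2, ppn=16, aprun_n=2))
    # Suggested GPU run: 4 ranks per node, all sharing GPU 0.
    print(check_launch(nodes=2, ppn=4, aprun_n=8, gpu_id="0000"))
```

The first and third calls return an empty list; the second reports the
mismatch between "aprun -n 2" and the 32 ranks requested.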

Cheers,
--
Szilárd


On Tue, Feb 4, 2014 at 4:09 PM, cyberjhon <cyberjhon at hotmail.com> wrote:
>
> On Tue, Feb 4, 2014 at 1:51 AM, cyberjhon <cyberjhon@> wrote:
>> Dear Szilárd
>>
>> Thanks for your answer.
>>
>> To submit the job I do;
>>
>> qsub -l nodes=2:ppn=16,walltime=12:00:00
>>
>> Then, to run gromacs I can do:
>> aprun -n 1 mdrun_mpi -deffnm protein
>
>>Those qsub and aprun commands are not in line with each other. Your
>>submission requests 2 nodes and 16 ranks/processes per node, that is
>>32 ranks in total. However, your aprun launch requests only a single
>>rank (and only two below). Please consult the machine's documentation
>>on how to launch jobs correctly.
>
> I know what you are saying, and I agree with you. When I run gromacs using
> only CPUs I do
>  aprun -n 32 mdrun_mpi -deffnm protein
> And this works perfectly !!!
>
> But when I try to use the GPUs on those two nodes and run this
> command, I get the error that I told you about before:
>
>> -------------------------------------------------------
>> Program mdrun_mpi, VERSION 4.6.2
>> Source code file:
>> /N/soft/cle4/gromacs/gromacs-4.6.2/src/gmxlib/gmx_detect_hardware.c, line:
>> 356
>>
>> Fatal error:
>> Incorrect launch configuration: mismatching number of PP MPI processes and
>> GPUs per node.
>> mdrun_mpi was started with 32 PP MPI processes per node, but only 1 GPU
>> were
>> detected.
>> For more information and tips for troubleshooting, please check the
>> GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>
> So according to this, the problem is not in the aprun command; it looks like
> the problem is in gromacs.
> In your presentation
> http://www.gromacs.org/@api/deki/files/213/=gromacs_parallelization_acceleration.pdf
> (page 49) you show that you ran gromacs on Blue Waters on multiple nodes, and
> based on the date you used 4.6.2. How did you do that?
>
> Thanks
>
> John Michael
>

