[gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration

Wed Jun 5 16:35:26 CEST 2013

Just to wrap up this thread: it does work once mpirun is configured to
start two MPI ranks on each node. I knew it had to be my fault :)

Something like this works like a charm:
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
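
For anyone adapting this to other hardware, here is a minimal sketch of how
the numbers in that command follow from the node layout, assuming one PP MPI
rank per GPU and the node's cores split evenly among its ranks. The variable
names are illustrative only (they are not GROMACS or Open MPI settings), and
the values are the ones from this thread: 2 nodes, each with 16 cores and
2 GPUs.

```shell
#!/bin/sh
# Sketch: derive the mdrun_mpi launch line from the per-node hardware.
# All variable names here are hypothetical helpers, not tool settings.
nodes=2
cores_per_node=16
gpus_per_node=2

# Assumption: one PP MPI rank per GPU on each node.
ranks_per_node=$gpus_per_node
total_ranks=$((nodes * ranks_per_node))

# Assumption: split each node's cores evenly among its ranks.
ntomp=$((cores_per_node / ranks_per_node))

# Build the -gpu_id string, one digit per rank on a node (e.g. "01").
# This simple loop only works for fewer than 10 GPUs per node.
gpu_ids=""
i=0
while [ "$i" -lt "$gpus_per_node" ]; do
    gpu_ids="${gpu_ids}${i}"
    i=$((i + 1))
done

launch_cmd="mpirun -npernode $ranks_per_node mdrun_mpi -ntomp $ntomp -gpu_id $gpu_ids -deffnm md -v"
echo "Total MPI ranks across the job: $total_ranks"
echo "$launch_cmd"
```

Before launching mdrun_mpi, running "mpirun -npernode 2 hostname" inside the
same job is a cheap way to check that the ranks really land on both nodes.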

Thank you Mark and Szilárd for your invaluable expertise.

Best regards,
João Henriques

On Wed, Jun 5, 2013 at 4:21 PM, João Henriques <joao.henriques.32353 at gmail.com> wrote:

> Ok, thanks once again. I will do my best to overcome this issue.
>
> Best regards,
> João Henriques
>
>
> On Wed, Jun 5, 2013 at 3:33 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
>> On Wed, Jun 5, 2013 at 2:53 PM, João Henriques <joao.henriques.32353 at gmail.com> wrote:
>>
>> > Sorry to keep bugging you guys, but even after considering all you
>> > suggested and reading the bugzilla thread Mark pointed out, I'm still
>> > unable to make the simulation run over multiple nodes.
>> > *Here is a template of a simple submission over 2 nodes:*
>> >
>> > --- START ---
>> > #!/bin/sh
>> > #
>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> > #
>> > # Job name
>> > #SBATCH -J md
>> > #
>> > # No. of nodes and no. of processors per node
>> > #SBATCH -N 2
>> > #SBATCH --exclusive
>> > #
>> > # Time needed to complete the job
>> > #SBATCH -t 48:00:00
>> > #
>> > # Add modules
>> > module load gcc/4.6.3
>> > module load openmpi/1.6.3/gcc/4.6.3
>> > module load cuda/5.0
>> > module load gromacs/4.6
>> > #
>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> > #
>> > grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
>> > mpirun -np 4 mdrun_mpi -gpu_id 01 -deffnm md -v
>> > #
>> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> > --- END ---
>> >
>> > *Here is an extract of the md.log:*
>> >
>> > --- START ---
>> > Using 4 MPI processes
>> > Using 4 OpenMP threads per MPI process
>> >
>> > Detecting CPU-specific acceleration.
>> > Present hardware specification:
>> > Vendor: GenuineIntel
>> > Brand:  Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>> > Family:  6  Model: 45  Stepping:  7
>> > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
>> > pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
>> > tdt x2apic
>> > Acceleration most likely to fit this hardware: AVX_256
>> > Acceleration selected at GROMACS compile time: AVX_256
>> >
>> >
>> > 2 GPUs detected on host en001:
>> >   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>> >   #1: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>> >
>> >
>> > -------------------------------------------------------
>> > Program mdrun_mpi, VERSION 4.6
>> > Source code file:
>> > /lunarc/sw/erik/src/gromacs/gromacs-4.6/src/gmxlib/gmx_detect_hardware.c,
>> > line: 322
>> >
>> > Fatal error:
>> > Incorrect launch configuration: mismatching number of PP MPI processes and
>> > GPUs per node.
>> >
>>
>> "per node" is critical here.
>>
>>
>> > mdrun_mpi was started with 4 PP MPI processes per node, but you provided 2
>> > GPUs.
>> >
>>
>> ...and here. As far as mdrun_mpi can tell from the MPI system, all the MPI
>> ranks are on this one node.
>>
>> > For more information and tips for troubleshooting, please check the GROMACS
>> > website at http://www.gromacs.org/Documentation/Errors
>> > -------------------------------------------------------
>> > --- END ---
>> >
>> > As you can see, gmx is having trouble understanding that there's a second
>> > node available. Note that since I did not specify -ntomp, it assigned 4
>> > threads to each of the 4 mpi processes (filling the entire avail. 16 CPUs
>> > *on one node*).
>> > For the same exact submission, if I do set "-ntomp 8" (since I have 4 MPI
>> > procs * 8 OpenMP threads = 32 CPUs total on the 2 nodes) I get a warning
>> > telling me that I'm hyperthreading, which can only mean that *gmx is
>> > assigning all processes to the first node once again.*
>> > Am I doing something wrong or is there some problem with gmx-4.6? I guess
>> > it can only be my fault, since I've never seen anyone else complaining
>> > about the same issue here.
>> >
>>
>> Assigning MPI processes to nodes is a matter of configuring your MPI. GROMACS
>> just follows the information it gets from the MPI system, hence the
>> oversubscription. If you assign two MPI processes to each node, then things
>> should work.
>>
>> Mark