[gmx-users] Multi-level parallelization: MPI + OpenMP

Éric Germaneau germaneau at sjtu.edu.cn
Fri Jul 19 08:38:14 CEST 2013

I actually submitted the job using two MPI processes per node, but the log
files do not get updated; it's as if the calculation is stuck.

Here is how I proceed:

    mpirun -np $NM -machinefile nodegpu mdrun_mpi -nb gpu -v \
        -deffnm test184000atoms_verlet.tpr >& mdrun_mpi.log

with the content of /nodegpu/:


and with

    NM=`cat nodegpu | wc -l`
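For comparison, here is a minimal sketch of a hybrid MPI + OpenMP launch with GROMACS 4.6 `mdrun_mpi` on nodes with 16 cores and 2 GPUs. It assumes an LSF job, where `LSB_MCPU_HOSTS` has the form `host1 16 host2 16`, and an MPI launcher that starts one rank per machinefile line. Both points are assumptions to check against the local setup. The helper `make_machinefile` is hypothetical. `-ntomp` (OpenMP threads per MPI rank) and `-gpu_id` (GPU ids used by the ranks on each node) are mdrun 4.6 options.

```shell
#!/bin/bash
# Sketch only. Assumes an LSF job (LSB_MCPU_HOSTS = "host1 ncpus1 host2 ncpus2 ...")
# and an MPI launcher that starts one rank per machinefile line.

# make_machinefile (hypothetical helper): print each host name
# RANKS_PER_NODE times, one per line, ignoring LSF's slot counts.
# Usage: make_machinefile RANKS_PER_NODE host1 ncpus1 host2 ncpus2 ...
make_machinefile() {
    local ranks_per_node=$1
    shift
    # Consume the arguments in (host, ncpus) pairs.
    while [ "$#" -ge 2 ]; do
        local i=0
        while [ "$i" -lt "$ranks_per_node" ]; do
            echo "$1"
            i=$((i + 1))
        done
        shift 2
    done
}

# In the job script ($LSB_MCPU_HOSTS is left unquoted on purpose so that
# it splits into separate host/count arguments):
#   make_machinefile 2 $LSB_MCPU_HOSTS > nodegpu   # 2 ranks per node
#   NM=$(wc -l < nodegpu)                          # total number of MPI ranks
#   mpirun -np "$NM" -machinefile nodegpu mdrun_mpi -ntomp 8 -gpu_id 01 \
#       -nb gpu -v -deffnm test184000atoms_verlet >& mdrun_mpi.log
```

With two ranks per node and `-ntomp 8`, all 16 cores of each node are in use, and `-gpu_id 01` gives GPU 0 to the first rank and GPU 1 to the second rank on every node. The sketch passes `-deffnm` a base name without the `.tpr` extension, which is how the option is normally used.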

/bjobs/ gives

    3983    hpceric RUN   gpu mu05        16*gpu11    gromacs    Jul 19

/mdrun_mpi.log/ contains the description of the options, and
/test184000atoms_verlet.tpr.log/ stops after the "PLEASE READ AND CITE THE ..." banner.

The top of /test184000atoms_verlet.tpr.log/ is:

    Log file opened on Fri Jul 19 13:47:36 2013
    Host: gpu11  pid: 124677  nodeid: 0  nnodes:  4
    Gromacs version:    VERSION 4.6.3
    Precision:          single
    Memory model:       64 bit
    MPI library:        MPI
    OpenMP support:     enabled
    GPU support:        enabled
    invsqrt routine:    gmx_software_invsqrt(x)
    CPU acceleration:   AVX_256
    FFT library:        fftw-3.3.3-sse2-avx
    Large file support: enabled
    RDTSCP usage:       enabled
    Built on:           Mon Jul 15 13:44:42 CST 2013
    Built by:           name at node [CMAKE]
    Build OS/arch:      Linux 2.6.32-279.el6.x86_64 x86_64
    Build CPU vendor:   GenuineIntel
    Build CPU brand:    Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    Build CPU family:   6   Model: 45   Stepping: 7
    Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx
    msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2
    sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    C compiler: /lustre/utility/intel/impi/
    GNU gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
    C compiler flags:   -mavx    -Wextra -Wno-missing-field-initializers
    -Wno-sign-compare -Wall -Wno-unused -Wunused-value  
    -fomit-frame-pointer -funroll-all-loops  -O3 -DNDEBUG
    C++ compiler:
    /lustre/utility/intel/impi/ GNU g++
    (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
    C++ compiler flags: -mavx   -Wextra -Wno-missing-field-initializers
    -Wno-sign-compare -Wall -Wno-unused -Wunused-value  
    -fomit-frame-pointer -funroll-all-loops  -O3 -DNDEBUG
    CUDA compiler:      /lustre/utility/cuda-5.0/bin/nvcc nvcc: NVIDIA
    (R) Cuda compiler driver;Copyright (c) 2005-2012 NVIDIA
    Corporation;Built on Fri_Sep_21_17:28:58_PDT_2012;Cuda compilation
    tools, release 5.0, V0.2.1221
    CUDA compiler
    CUDA driver:        5.0
    CUDA runtime:       5.0

Does anyone have any idea what's going wrong here?


On 07/19/2013 09:35 AM, Éric Germaneau wrote:
> Dear all,
> I'm not a GROMACS user; I've installed GROMACS 4.6.3 on our cluster
> and am running some tests.
> Each node of our machine has 16 cores and 2 GPUs.
> I'm trying to figure out how to submit efficient multi-node LSF jobs
> that make full use of the available resources.
> After reading the documentation
> <http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Locking_threads_to_physical_cores>
> on "Acceleration and parallelization", I got confused and would like
> some help.
> I'm wondering whether someone has experience with this matter.
> I thank you in advance,
>                                                 Éric.
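A minimal sketch of the LSF side of such a job, assuming whole nodes with 16 cores and 2 GPUs each. The queue `gpu` and job name `gromacs` are taken from the `bjobs` output above. The slot count and output file name are illustrative assumptions. `span[ptile=16]` asks LSF to place 16 slots on each host, so `-n 32` should yield two complete nodes. With one MPI rank per GPU, that leaves 8 OpenMP threads per rank.

```shell
#!/bin/bash
# Sketch of an LSF job script header; values are assumptions, adjust per site.
#BSUB -J gromacs
#BSUB -q gpu
# 32 slots in total, packed 16 per host, i.e. two whole nodes.
#BSUB -n 32
#BSUB -R "span[ptile=16]"
# %J expands to the LSF job ID.
#BSUB -o gromacs.%J.out

CORES_PER_NODE=16
GPUS_PER_NODE=2
# One MPI rank per GPU on each node; OpenMP threads fill the remaining cores.
RANKS_PER_NODE=$GPUS_PER_NODE
OMP_PER_RANK=$((CORES_PER_NODE / RANKS_PER_NODE))
echo "MPI ranks per node: $RANKS_PER_NODE, OpenMP threads per rank: $OMP_PER_RANK"
```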


Éric Germaneau <http://hpc.sjtu.edu.cn/index.htm>

Shanghai Jiao Tong University
Network & Information Center
room 205
Minhang Campus
800 Dongchuan Road
Shanghai 200240


