[gmx-users] Multi-level parallelization: MPI + OpenMP
Éric Germaneau
germaneau at sjtu.edu.cn
Fri Jul 19 08:38:14 CEST 2013
I actually submitted using two MPI processes per node, but the log files
do not get updated; it's as if the calculation were stuck.
Here is how I proceed:
mpirun -np $NM -machinefile nodegpu mdrun_mpi -nb gpu -v \
    -deffnm test184000atoms_verlet.tpr >& mdrun_mpi.log
with the content of /nodegpu/:
gpu04
gpu04
gpu11
gpu11
and with
NM=`cat nodegpu | wc -l`
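For reference, here is what I think the complete hybrid MPI + OpenMP launch
should look like (only a sketch on my side: I'm assuming -ntomp, -gpu_id and
-pin behave as described in the 4.6 acceleration documentation, and that
-deffnm expects the file prefix without the .tpr extension, which would also
explain the doubled name /test184000atoms_verlet.tpr.log/ below):

# 4 ranks total, 2 per node via the machinefile; 8 OpenMP threads per rank,
# so 2 ranks x 8 threads fill the 16 cores of each node.
# -gpu_id 01 maps GPU 0 and GPU 1 to the two ranks on each node.
NM=`cat nodegpu | wc -l`    # = 4 with the machinefile above
export OMP_NUM_THREADS=8
mpirun -np $NM -machinefile nodegpu \
    mdrun_mpi -ntomp 8 -gpu_id 01 -pin on -nb gpu -v \
    -deffnm test184000atoms_verlet >& mdrun_mpi.log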
/bjobs/ gives:
3983  hpceric  RUN  gpu  mu05  16*gpu11  gromacs  Jul 19 12:12
                               16*gpu04
/mdrun_mpi.log/ contains the description of the options, and
/test184000atoms_verlet.tpr.log/ stops after "PLEASE READ AND CITE THE
FOLLOWING REFERENCE".
The top of /test184000atoms_verlet.tpr.log/ is:
Log file opened on Fri Jul 19 13:47:36 2013
Host: gpu11 pid: 124677 nodeid: 0 nnodes: 4
Gromacs version: VERSION 4.6.3
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled
GPU support: enabled
invsqrt routine: gmx_software_invsqrt(x)
CPU acceleration: AVX_256
FFT library: fftw-3.3.3-sse2-avx
Large file support: enabled
RDTSCP usage: enabled
Built on: Mon Jul 15 13:44:42 CST 2013
Built by: name at node [CMAKE]
Build OS/arch: Linux 2.6.32-279.el6.x86_64 x86_64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Build CPU family: 6 Model: 45 Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx
msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2
sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /lustre/utility/intel/impi/4.1.1.036/intel64/bin/mpicc
GNU gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
C compiler flags: -mavx -Wextra -Wno-missing-field-initializers
-Wno-sign-compare -Wall -Wno-unused -Wunused-value
-fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG
C++ compiler:
/lustre/utility/intel/impi/4.1.1.036/intel64/bin/mpicxx GNU g++
(GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
-Wno-sign-compare -Wall -Wno-unused -Wunused-value
-fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG
CUDA compiler: /lustre/utility/cuda-5.0/bin/nvcc nvcc: NVIDIA
(R) Cuda compiler driver;Copyright (c) 2005-2012 NVIDIA
Corporation;Built on Fri_Sep_21_17:28:58_PDT_2012;Cuda compilation
tools, release 5.0, V0.2.1221
CUDA compiler
flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30;-use_fast_math;-Xcompiler;-fPIC
;
-mavx;-Wextra;-Wno-missing-field-initializers;-Wno-sign-compare;-Wall;-Wno-unused;-Wunused-value;-fomit-frame-pointer;-funroll-all-loops;-O3;-DNDEBUG
CUDA driver: 5.0
CUDA runtime: 5.0
Does anyone have an idea of what's going wrong here?
Thanks,
Éric.
On 07/19/2013 09:35 AM, Éric Germaneau wrote:
> Dear all,
>
> I'm not a GROMACS user; I've installed GROMACS 4.6.3 on our cluster
> and am running some tests.
> Each node of our machine has 16 cores and 2 GPUs.
> I'm trying to figure out how to submit efficient multi-node LSF jobs
> that use the maximum of the resources.
> After reading the documentation
> <http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Locking_threads_to_physical_cores>
> on "Acceleration and parallelization", I got confused and would
> appreciate some help.
> I'm just wondering whether someone has experience with this matter.
> I thank you in advance,
>
> Éric.
>
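P.S. For completeness, here is roughly the LSF script I have in mind for the
two-node setup described in the quoted message above. This is only a sketch:
the #BSUB values are my assumptions for our 16-core / 2-GPU nodes (the queue
name "gpu" is taken from the bjobs output), and -perhost is Intel MPI syntax,
which is what our build uses.

#!/bin/bash
#BSUB -J gromacs
#BSUB -q gpu                  # queue name, as in the bjobs output
#BSUB -n 32                   # 2 nodes x 16 cores
#BSUB -R "span[ptile=16]"     # keep all 16 slots of a node together
#BSUB -o gromacs.%J.out

# 2 MPI ranks per node, 8 OpenMP threads per rank, one GPU per rank
export OMP_NUM_THREADS=8
mpirun -np 4 -perhost 2 mdrun_mpi -ntomp 8 -gpu_id 01 -pin on \
    -nb gpu -v -deffnm test184000atoms_verlet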
--
/Be the change you wish to see in the world/ --- Mahatma Gandhi ---
Éric Germaneau <http://hpc.sjtu.edu.cn/index.htm>
Shanghai Jiao Tong University
Network & Information Center
room 205
Minhang Campus
800 Dongchuan Road
Shanghai 200240
China
/Please, if possible, don't send me MS Word or PowerPoint attachments
Why? See: http://www.gnu.org/philosophy/no-word-attachments.html/