[gmx-users] Pull code stalling?
Justin Lemkul
jalemkul at vt.edu
Tue Apr 8 04:21:25 CEST 2014
Hi All,
I have noticed a strange problem involving the pull code, and perhaps other
types of restraints, in version 5.0-beta2. It seems that use of the pull code
causes runs to simply stall and produce no output beyond the header of the .log
file. A few notes to break down the situation:
1. I initially suspected a hardware problem, but I have determined that the
nodes in question work correctly. I have 64-CPU nodes that are handling these
jobs. Runs submitted using version 4.6.3 or 5.0-beta2 without the pull code run
correctly.
2. It seems that the runs with the pull code are indeed stalling. Logging in to
the node where the job is running shows that only 8 CPUs are in use instead of
64, and mdrun is stuck in uninterruptible sleep (see the process check sketched
after this list). Runs on the same nodes with version 4.6.3 and the pull code
correctly use all 64 CPUs and produce output at regular intervals.
3. GROMACS has been compiled with the thread-MPI library, which has worked well
for previous versions.
4. The mdrun command is simply mdrun -nt 64 -deffnm pull -px pullx.xvg -pf
pullf.xvg. Invoking mdrun -nt 64 for other runs with 5.0-beta2 without the pull
code works fine, with decent performance. The problem persists with different
numbers of CPUs/threads. An outline of the pull settings is sketched after this
list.
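
For reference, this is the kind of check I run on the stalled node; 'D' in the
STAT column is uninterruptible sleep, and NLWP is the thread count (the pgrep
pattern here is just illustrative):

# show process state, thread count, and kernel wait channel for the mdrun job
ps -o pid,stat,nlwp,wchan:24,cmd -p $(pgrep -f 'mdrun -nt 64')
# per-thread CPU usage, to see which threads are actually running
top -H -p $(pgrep -f 'mdrun -nt 64' | head -n 1)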
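
For context, these are the same kind of pull jobs that run fine under 4.6.3 on
these nodes; in 4.6.x .mdp syntax the setup looks roughly like the following
(group names and values are placeholders, not the exact input):

; illustrative pull section, 4.6.x option names; actual groups/values differ
pull            = umbrella      ; harmonic restraint on the pull coordinate
pull_geometry   = distance      ; COM distance between the two groups
pull_dim        = N N Y         ; only the z component contributes
pull_start      = yes           ; take the initial COM distance as the reference
pull_ngroups    = 1             ; one pulled group, plus the reference group 0
pull_group0     = Reference     ; placeholder reference group name
pull_group1     = Pulled        ; placeholder pulled group name
pull_rate1      = 0.0           ; static restraint (umbrella sampling window)
pull_k1         = 1000          ; kJ mol^-1 nm^-2
pull_nstxout    = 500           ; write pullx.xvg every 500 steps
pull_nstfout    = 500           ; write pullf.xvg every 500 steps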
I have not tried the new release candidate from today, but if that would help
narrow the problem down, I will gladly do so.
Any ideas? Sections of the .log file for a stalled run are posted below. Note,
too, that the compiler version does not affect the outcome; recompiling with
GCC 4.7.2 results in the same behavior.
As an aside, using flat-bottom restraints (a separate set of jobs entirely) also
results in curious behavior. Runs proceed at a normal rate, but then take 20
minutes or more to go from the final step to writing the final output
(coordinates, checkpoint, trajectory, and energy file), and another hour or
more to actually exit. Perhaps this is an unrelated issue, but in case something
is more globally wrong with restraints, I thought I'd mention it. Normal
position restraints work fine.
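
For completeness, those jobs use the standard flat-bottomed position restraints
(funct type 2 in [ position_restraints ]); the atom index, geometry, radius,
and force constant below are placeholders, not the values from these jobs:

[ position_restraints ]
; ai   funct   g     r (nm)   k (kJ mol^-1 nm^-2)
   1     2     1     0.5      1000    ; g = 1: flat-bottom sphere of radius r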
-Justin
==== .log of stall ====
GROMACS: gmx mdrun, VERSION 5.0-beta2
Executable: /home/jalemkul/software/gromacs/5.0-beta2/bin/gmx
Command line:
mdrun -nt 64 -deffnm pull -pf pullf.xvg -px pullx.xvg
Gromacs version: VERSION 5.0-beta2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
CPU acceleration: SSE2
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
C++11 compilation: disabled
TNG support: enabled
Built on: Sun Apr 6 17:12:22 EDT 2014
Built by: jalemkul at ocracoke [CMAKE]
Build OS/arch: Linux 2.6.32-5-amd64 x86_64
Build CPU vendor: AuthenticAMD
Build CPU brand: AMD Opteron(tm) Processor 6172
Build CPU family: 16 Model: 9 Stepping: 1
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr
nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
C compiler: /usr/bin/cc GNU 4.4.5
C compiler flags: -msse2 -Wextra -Wno-missing-field-initializers
-Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wunused-parameter
-fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 4.4.5
C++ compiler flags: -msse2 -Wextra -Wno-missing-field-initializers -Wall
-Wno-unused-function -fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG
Boost version: 1.48.0 (internal)
...
Initializing Domain Decomposition on 64 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.401 nm, LJ-14, atoms 131 138
multi-body bonded interactions: 0.401 nm, Proper Dih., atoms 131 138
Minimum cell size due to bonded interactions: 0.441 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.768 nm
Estimated maximum distance required for P-LINCS: 0.768 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.12
Will use 56 particle-particle and 8 PME only nodes
This is a guess, check the performance at the end of the log file
Using 8 separate PME nodes, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 56 cells with a minimum initial size of 0.960 nm
The maximum allowed number of cells is: X 17 Y 7 Z 7
Domain decomposition grid 8 x 7 x 1, separate PME nodes 8
PME domain decomposition: 8 x 1 x 1
Interleaving PP and PME nodes
This is a particle-particle only node
Domain decomposition nodeid 0, coordinates 0 0 0
Using 64 MPI threads
Using 1 OpenMP thread per tMPI thread
Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: AuthenticAMD
Brand: AMD Opteron(TM) Processor 6276
Family: 21 Model: 1 Stepping: 2
Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr
nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2
ssse3 xop
Acceleration most likely to fit this hardware: AVX_128_FMA
Acceleration selected at GROMACS compile time: SSE2
--
==================================================
Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 601
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201
jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul
==================================================