[gmx-users] Errors on running simulation in Cluster

dammar badu dnbadu069 at gmail.com
Mon Jun 27 21:43:19 CEST 2016


Dear GROMACS users,

I have been simulating a protein in a membrane with a coarse-grained model,
with a total of 13372 atoms and around 14350 water molecules. I ran the
simulation for 100 ns with dt = 0.02 and nsteps = 5000000, and it runs fine
on a single computer with 16 cores and a single GPU. However, when I run the
same thing on a cluster with multiple GPUs, keeping everything else the same,
the run stops about 20 seconds after the start with the following error:



GROMACS:      gmx mdrun, VERSION 5.1.2
Executable:   /sharedapps/ICC/15.2/gromacs/5.1.2_gpu/bin/gmx_mpi
Data prefix:  /sharedapps/ICC/15.2/gromacs/5.1.2_gpu
Command line:
  gmx_mpi mdrun 1600ns_md -dlb auto -v -s 1600ns_md.tpr -npme 1


Back Off! I just backed up md.log to ./#md.log.10#

40 CPUs configured, but only 20 of them are online.
This can happen on embedded platforms (e.g. ARM) where the OS shuts some
cores off to save power, and will turn them back on later when the load increases.
However, this will likely mean GROMACS cannot pin threads to those cores. You
will likely see much better performance by forcing all cores to be online, and
making sure they run at their full clock frequency.

Number of logical cores detected (40) does not match the number reported by
OpenMP (10).
Consider setting the launch configuration manually!

Running on 1 node with total 20 cores, 40 logical cores, 4 compatible GPUs
Hardware detected on host compute-gpu-01 (the node of MPI rank 0):
  CPU info:
    Vendor: GenuineIntel
    Brand:  Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
    Number of GPUs detected: 4
    #0: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
    #1: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
    #2: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
    #3: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible

Reading file 1600ns_md.tpr, VERSION 5.1.1 (single precision)
Changing nstlist from 10 to 25, rlist from 1.308 to 1.408

The number of OpenMP threads was set by environment variable
OMP_NUM_THREADS to 4
Using 4 MPI processes
Using 4 OpenMP threads per MPI process

On host compute-gpu-01 4 compatible GPUs are present, with IDs 0,1,2,3
On host compute-gpu-01 3 GPUs auto-selected for this run.
Mapping of GPU IDs to the 3 PP ranks in this node: 0,1,2


NOTE: potentially sub-optimal launch configuration, gmx_mpi started with less
      PP MPI processes per node than GPUs available.
      Each PP MPI process can use only one GPU, 3 GPUs per node will be used.


NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Tesla K80 GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or
      set application clocks manually.


Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

Back Off! I just backed up traj_comp.xtc to ./#traj_comp.xtc.10#

Back Off! I just backed up ener.edr to ./#ener.edr.10#

NOTE: DLB will not turn on during the first phase of PME tuning

starting mdrun 'Martini system from folded_ligand_75copies.pdb'
5000000 steps, 100000.0 ps.
step 0
[compute-gpu-01:56224] *** Process received signal ***
[compute-gpu-01:56224] Signal: Segmentation fault (11)
[compute-gpu-01:56224] Signal code: Address not mapped (1)
[compute-gpu-01:56224] Failing at address: 0xfcb649f4
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 56224 on node compute-gpu-01
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
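The log also says "Consider setting the launch configuration manually!", so one
thing I am considering is an explicit launch instead of relying on the automatic
rank/GPU assignment. This is only a sketch of what I have in mind (the mpirun
call, the split into 4 PP ranks + 1 PME rank, 4 OpenMP threads per rank, and the
use of -deffnm are my assumptions for this 20-core, 4-GPU node, not something I
have tested):

  # hypothetical explicit launch: 4 PP ranks mapped to GPUs 0-3 plus 1 separate
  # PME rank, 4 OpenMP threads each (5 ranks x 4 threads = 20 physical cores)
  export OMP_NUM_THREADS=4
  mpirun -np 5 gmx_mpi mdrun -deffnm 1600ns_md -s 1600ns_md.tpr \
         -npme 1 -ntomp 4 -gpu_id 0123 -dlb auto -v

Would that be a reasonable way to set the launch configuration by hand?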




I am using the following input parameters for the simulation:

define           =  -DPOSRES_2 -DPOSRES_lipid
dt               =  0.02
nsteps           =  5000000
nstxout          =  0
nstvout          =  0
nstlog           =  10000
nstxtcout        =  5000
xtc-precision    =  10
rlist            =  1.4
cutoff-scheme    =  verlet
verlet-buffer-drift =  0.005
ns-type          =  grid
nstlist          =  10

coulombtype      =  PME
coulomb-modifier =  Potential_shift
rcoulomb         =  1.3
fourierspacing     =  0.1625
pme_order     =  4




epsilon_r        =  15
vdw-type         =  cutoff
vdw-modifier     =  Potential-shift
epsilon_rf       =  0
;rvdw-switch      =  0.9
rvdw             =  1.3
tcoupl           =  v-rescale
tc-grps          =  Protein DPPC_DOPC_POPE_CHOL_PAMS_POPS W_ION
tau-t            =  1.0 1.0 1.0
ref-t            =  323 323 323
Pcoupl           =  parrinello-rahman
Pcoupltype       =  isotropic; semiisotropic ;
tau-p            =  12.0 12.0
compressibility  =  3e-4 3e-4
ref-p            =  1.0 1.0
refcoord_scaling =  all
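
I also noticed in the log that the .tpr file was generated with VERSION 5.1.1
while the cluster build is 5.1.2, so I could regenerate it with the cluster's
grompp before running. A rough sketch, with placeholder file names (my actual
input files are different):

  # hypothetical regeneration of the run input on the cluster
  gmx_mpi grompp -f md.mdp -c system.gro -p topol.top -n index.ndx -o 1600ns_md.tpr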



I am wondering why it is difficult to run the same thing on a cluster.
Can anyone help me understand these errors?

Many Thanks
Dammar

