[gmx-users] Restarting crashed simulation

Ali Ahmed aa5635737 at gmail.com
Fri Nov 17 19:37:06 CET 2017


Hello GROMACS users
My MD simulation crashed, so I restarted it from the point where the last
checkpoint was written, using this command on 64 processors:

mpirun -np 64 mdrun_mpi -s md.tpr -cpi stat.cpt
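
As far as I understand (please correct me if I am wrong), mdrun writes its
checkpoint as state.cpt by default, so a file called stat.cpt would only
exist if the first run was started with -cpo stat.cpt. A continuation would
normally look like this sketch:

    # confirm a checkpoint was actually written before resubmitting
    ls -l *.cpt

    # continue from the default checkpoint, appending to the existing
    # output files (appending is the default; shown here explicitly)
    mpirun -np 64 mdrun_mpi -s md.tpr -cpi state.cpt -append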

After a few days there were no output files in the folder, such as
output.gro, and I got the following:
_______________________________________________
Command line:
  mdrun_mpi -s md.tpr -cpi stat.cpt

Warning: No checkpoint file found with -cpi option. Assuming this is a new
run.


Back Off! I just backed up md.log to ./#md.log.2#

Running on 4 nodes with total 64 cores, 64 logical cores
  Cores per node:           16
  Logical cores per node:   16
Hardware detected on host compute-2-27.local (the node of MPI rank 0):
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256

  Hardware topology: Basic

Reading file md.tpr, VERSION 2016.3 (single precision)
Changing nstlist from 10 to 40, rlist from 1 to 1.003

Will use 48 particle-particle and 16 PME only ranks
This is a guess, check the performance at the end of the log file
Using 64 MPI processes
Using 1 OpenMP thread per MPI process

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity
WARNING: This run will generate roughly 50657 Mb of data

starting mdrun 'Molecular Dynamics'
25000000 steps,  50000.0 ps.

step 888000 Turning on dynamic load balancing, because the performance loss
due to load imbalance is 8.7 %.
step 930400 Turning off dynamic load balancing, because it is degrading
performance.
step 1328000 Turning on dynamic load balancing, because the performance
loss due to load imbalance is 3.4 %.
step 1328800 Turning off dynamic load balancing, because it is degrading
performance.
step 1336000 Turning on dynamic load balancing, because the performance
loss due to load imbalance is 3.4 %.
step 1338400 Turning off dynamic load balancing, because it is degrading
performance.
step 1340000 Will no longer try dynamic load balancing, as it degraded
performance.
Writing final coordinates.
 Average load imbalance: 13.2 %
 Part of the total run time spent waiting due to load imbalance: 7.5 %
 Average PME mesh/force load: 1.077
 Part of the total run time spent waiting due to PP/PME imbalance: 4.1 %

NOTE: 7.5 % of the available CPU time was lost due to load imbalance
      in the domain decomposition.
      You might want to use dynamic load balancing (option -dlb.)


               Core t (s)   Wall t (s)        (%)
       Time: 26331875.601   411435.556     6400.0
                         4d18h17:15
                 (ns/day)    (hour/ns)
Performance:       10.500        2.286
_____________________________________________________________
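
The line "Warning: No checkpoint file found with -cpi option. Assuming this
is a new run." suggests mdrun could not find stat.cpt and simply started
again from step 0. My plan, as a sketch (assuming the GROMACS 2016 tools
gmx check and gmx dump, which both read .cpt files, and using "output" as a
hypothetical base name), is to verify the checkpoint and resubmit with an
explicit output name:

    # see which step/time the checkpoint was written at
    gmx check -f state.cpt
    gmx dump -cp state.cpt | head

    # resume from it; -deffnm output would make mdrun write output.gro
    # instead of the default confout.gro
    mpirun -np 64 mdrun_mpi -deffnm output -s md.tpr -cpi state.cpt

If I am reading it right, without -deffnm or -c the final coordinates go to
confout.gro, which would also explain why there is no output.gro in the
folder.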

Any advice or suggestions would be helpful.
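
One thing I am unsure about is the NOTE suggesting -dlb: dynamic load
balancing should already be on by default (-dlb auto) and, as the log
shows, it was switched off because it degraded performance. It can still be
forced, together with an explicit PP/PME split; this sketch just mirrors
the 48/16 split mdrun guessed above:

    mpirun -np 64 mdrun_mpi -s md.tpr -cpi state.cpt -dlb yes -npme 16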

Thanks in advance
