[gmx-users] FEP calculations on multiple nodes

Vikas Dubey vikasdubey055 at gmail.com
Mon Aug 21 16:41:11 CEST 2017


Hi Michael,





* What does the logfile say that was output?
Ans: Log file output while running on my PC with the command
"gmx mdrun -deffnm md_0 -nt 36":

Using GPU 8x8 non-bonded kernels

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 36872
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration
9303 constraints are involved in constraint triangles,
will apply an additional matrix expansion of order 6 for couplings
between constraints inside triangles

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms
There are 45357 inter charge-group virtual sites,
will an extra communication step for selected coordinates and forces

The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 6.03 nm Y 3.02 nm Z 9.20 nm

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.261 nm
(the following are initial values, they could change due to box deformation)
            two-body bonded interactions  (-rdd)   1.261 nm
          multi-body bonded interactions  (-rdd)   1.261 nm
              virtual site constructions  (-rcon)  3.016 nm
  atoms separated by up to 7 constraints  (-rcon)  3.016 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1 Z 1
The minimum size for domain decomposition cells is 1.261 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.21 Y 0.42 Z 0.14
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.261 nm
            two-body bonded interactions  (-rdd)   1.261 nm
          multi-body bonded interactions  (-rdd)   1.261 nm
              virtual site constructions  (-rcon)  1.261 nm
  atoms separated by up to 7 constraints  (-rcon)  1.261 nm


Making 3D domain decomposition grid 2 x 4 x 2, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  System

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
----------------------------------------------------------------------



* What command are you using to run on multiple nodes?

I use the following script on the cluster; the last line is the run command:


#SBATCH --job-name=2_1_0
#SBATCH --mail-type=ALL
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-core=2
#SBATCH --cpus-per-task=4
#SBATCH --constraint=gpu
#SBATCH --output out.txt
#SBATCH --error  err.txt
#========================================
# load modules and run simulation
module load daint-gpu
module load GROMACS
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1

srun -n $SLURM_NTASKS --ntasks-per-node=$SLURM_NTASKS_PER_NODE -c $SLURM_CPUS_PER_TASK gmx_mpi mdrun -deffnm md_0


----------------------------------------------------------------------


* What is the .mdp file?


My general .mdp file is similar to the one described here, apart from
certain changes for the protein-membrane system:


http://wwwuser.gwdg.de/~ggroenh/exercise_html/exercise1.html
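
For context, the free-energy part of such a window looks roughly like the
sketch below; the option names are standard GROMACS mdp settings, but the
values and the lambda vector are illustrative placeholders, not my exact
settings:

; free-energy block for a single FEP window (illustrative values only)
free-energy              = yes
init-lambda-state        = 0        ; 0-based index of the window this run computes
fep-lambdas              = 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
sc-alpha                 = 0.5      ; soft-core to avoid end-point singularities
sc-power                 = 1
nstdhdl                  = 100      ; how often dH/dlambda is written for analysis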

----------------------------------------------------------------------


* How many nodes are you running on?

The simulation runs fine on one node with 24 cores. I want to run each
window on maybe 2-3 nodes. I have tried running the simulation on my
desktop using the "-nt" flag; it works fine up to -nt 30, but beyond that
the simulation crashes.

----------------------------------------------------------------------

* What version of the program?

GROMACS 5.1.4

----------------------------------------------------------------------



Thanks,

Vikas





On 21 August 2017 at 15:28, Michael Shirts <mrshirts at gmail.com> wrote:

> Significantly more information is needed to understand what happened.
>
> * What does the logfile say that was output?
> * What command are you using to run on multiple nodes?
> * What is the .mdp file?
> * How many nodes are you running on?
> * What version of the program?
>
> And so forth.
>
> On Mon, Aug 21, 2017 at 4:49 AM, Vikas Dubey <vikasdubey055 at gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > I am trying to run a FEP calculation with a system of ~250000 particles.
> > I have 20 windows and I am currently running my simulations on 1 node
> > each. Since my system is big, I only get 2.5 ns per day. So, I thought to
> > run each of my windows on multiple nodes, but for some reason it crashes
> > immediately after starting with an error:
> >
> >
> > Segmentation fault (core dumped)
> >
> > Simulations run smoothly on one node; no error there. I tried to look at
> > the log file, but nothing was written in it. Any help would be very much
> > appreciated.
> >
> >
> > Thanks,
> > Vikas

