[gmx-users] Query regarding Domain decomposition
Mark Abraham
mark.j.abraham at gmail.com
Thu Jun 26 16:46:29 CEST 2014
As I suggested last time, find a setup where mdrun does work, e.g. 1 node.
When you have a complex problem, simplify it and see what you learn ;-) If
mdrun also explodes on 1 node, try your .tpr on a local machine, to see whether
it's the mdrun install or the MPI system that could be at fault, or maybe it's
an unstable .tpr...
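
For example, something along these lines (a sketch only; binary names differ
between installs, e.g. the MPI build is often called mdrun_mpi, so adapt to
whatever you actually have):

  # on a local workstation, to test the .tpr itself
  mdrun -s 400K_SIM2.tpr -deffnm test_local

  # on a single cluster node, to test the MPI setup
  mpirun -np 16 mdrun_mpi -s 400K_SIM2.tpr -deffnm test_1node

If those also crash, then the 320-rank setup is not the real problem.
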
Mark
On Thu, Jun 26, 2014 at 2:18 PM, suhani nagpal <suhani.nagpal at gmail.com>
wrote:
> Thanks for the insights.
>
> So I am now using a .tpr file generated with a grompp whose version matches
> the mdrun.
>
> The output files are generated again, but after step 0 (time 0) nothing gets
> written; the job reaches an error state and stops.
>
> In the error file:
>
> mpirun noticed that process rank 41 with PID 35866 on node cn0286. exited
> on signal 11 (Segmentation fault).
>
>
> I'm not able to scale beyond 160 processors. I have 32600 atoms in the
> system.
>
> Kindly assist
>
> thanks
>
>
> Suhani
>
> On Wed, Jun 25, 2014 at 5:53 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > On Jun 25, 2014 8:15 AM, "suhani nagpal" <suhani.nagpal at gmail.com>
> wrote:
> > >
> > > Greetings
> > >
> > > I have been trying to run a set of simulations using a high number of
> > > processors.
> > >
> > > Using the tutorial -
> > > http://compchemmpi.wikispaces.com/file/view/Domaindecomposition_KKirchner_27Apr2012.pdf
> > >
> > > I have done calculations to estimate the number of nodes that would be
> > > optimal for this protein.
> > >
> > >
> > > So all the files are generated, but an error occurs and the trajectory
> > > files remain empty, with no error mentioned in the log file.
> >
> > Hard to say. The problem could be anywhere, since we don't yet know when
> > mdrun does work...
> >
> > > The number of nodes to be used is a multiple of 16.
> > >
> > > The box is 8 nm in the x and y dimensions.
> > >
> > >
> > >
> > > In the error file,
> > >
> > >
> > > Reading file 400K_SIM2.tpr, VERSION 4.5.5 (single precision)
> >
> > Why use a slow, old version if you want parallel performance?
> >
> > > Note: file tpx version 73, software tpx version 83
> >
> > You should prefer to use a grompp whose version matches your mdrun.
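> >
> > For example, regenerating the run input on the cluster with that grompp
> > (a sketch only; the .mdp, .gro and .top names below are placeholders for
> > your own files):
> >
> >   grompp -f md.mdp -c npt.gro -p topol.top -o 400K_SIM2.tpr
> >
> > Using the grompp from the same installation as the mdrun you submit means
> > mdrun no longer has to convert an old tpx version on the fly.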
> >
> > > The number of OpenMP threads was set by environment variable
> > > OMP_NUM_THREADS to 1
> > > Using 320 MPI processes
> > >
> > > NOTE: The load imbalance in PME FFT and solve is 116%.
> > > For optimal PME load balancing
> > > PME grid_x (54) and grid_y (54) should be divisible by #PME_nodes_x (140)
> > > and PME grid_y (54) and grid_z (54) should be divisible by #PME_nodes_y (1)
> > >
> > >
> > >
> > > mdp file for reference
> > >
> > > ; Bond parameters
> > > continuation         = yes          ; Restarting after NPT
> > > constraint_algorithm = lincs        ; holonomic constraints
> > > constraints          = all-bonds    ; all bonds (even heavy atom-H bonds) constrained
> > > lincs_iter           = 1            ; accuracy of LINCS
> > > lincs_order          = 4            ; also related to accuracy
> > > ; Neighborsearching
> > > ns_type              = grid         ; search neighboring grid cells
> > > nstlist              = 5            ; 10 fs
> > > rlist                = 1.0          ; short-range neighborlist cutoff (in nm)
> > > rcoulomb             = 1.0          ; short-range electrostatic cutoff (in nm)
> > > rvdw                 = 1.0          ; short-range van der Waals cutoff (in nm)
> > > ; Electrostatics
> > > coulombtype          = PME          ; Particle Mesh Ewald for long-range electrostatics
> > > pme_order            = 4            ; cubic interpolation
> > > fourierspacing       = 0.16         ; grid spacing for FFT
> > > ; Temperature coupling is on
> > > tcoupl               = nose-hoover  ; Nose-Hoover coupling
> > > tc-grps              = Protein Non-Protein  ; two coupling groups - more accurate
> > > tau_t                = 0.2  0.2     ; time constant, in ps
> > > ref_t                = 400  400     ; reference temperature, one for each group, in K
> > > ; Pressure coupling is off
> > > pcoupl               = no           ;
> > > ; Periodic boundary conditions
> > > pbc                  = xyz          ; 3-D PBC
> > > ; Dispersion correction
> > > DispCorr             = EnerPres     ; account for cut-off vdW scheme
> > > ; Velocity generation
> > > gen_vel              = yes          ; assign velocities from Maxwell distribution
> > > gen_temp             = 400          ; temperature for Maxwell distribution
> > > gen_seed             = -1           ; generate a random seed
> > >
> > > Kindly help.
> > >
> > > I have to run simulations on 250 to 300 processors.
> >
> > Maybe. You can't efficiently parallelize an algorithm over arbitrary
> > amounts of hardware. You need 100-1000 atoms per core, depending on
> > hardware, simulation settings and GROMACS version.
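> >
> > As a rough sketch of that arithmetic for a ~32600-atom system: 32600 / 1000
> > is about 33 cores at the conservative end, and 32600 / 100 is about 326
> > cores at the optimistic end, so 250-320 MPI ranks (around 100 atoms per
> > rank) is already at the upper limit of what this system can usefully occupy.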
> >
> > Mark
> >