[gmx-users] Query regarding Domain decomposition
Mark Abraham
mark.j.abraham at gmail.com
Thu Jun 26 16:46:29 CEST 2014
As I suggested last time, find a setup where mdrun does work, e.g. 1 node.
When you have a complex problem, simplify it and see what you learn ;-) If
mdrun also explodes on 1 node, try your .tpr on a local machine, to see whether
it's the mdrun install or the MPI system that could be at fault, or maybe it's
an unstable .tpr...
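
For example, something along these lines (a sketch only; binary names differ
between installs, e.g. the MPI build is often called mdrun_mpi, so adapt to
whatever you actually have):

  # on a local workstation, to test the .tpr itself
  mdrun -s 400K_SIM2.tpr -deffnm test_local

  # on a single cluster node, to test the MPI setup
  mpirun -np 16 mdrun_mpi -s 400K_SIM2.tpr -deffnm test_1node

If those also crash, then the 320-rank setup is not the real problem.
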
Mark
On Thu, Jun 26, 2014 at 2:18 PM, suhani nagpal <suhani.nagpal at gmail.com>
wrote:
> Thanks for the insights.
>
> So I am now using a .tpr file generated with a grompp whose version matches
> the mdrun.
>
> The output files are generated again, but after step 0 (time 0) nothing gets
> written; the job reaches an error state and stops.
>
> In the error file:
>
> mpirun noticed that process rank 41 with PID 35866 on node cn0286. exited
> on signal 11 (Segmentation fault).
>
>
> I'm not able to scale beyond 160 processors. I have 32600 atoms in the
> system.
>
> Kindly assist
>
> thanks
>
>
> Suhani
>
> On Wed, Jun 25, 2014 at 5:53 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > On Jun 25, 2014 8:15 AM, "suhani nagpal" <suhani.nagpal at gmail.com>
> wrote:
> > >
> > > Greetings
> > >
> > > I have been trying to run a set of simulations using a high number of
> > > processors.
> > >
> > > Using the tutorial -
> > > http://compchemmpi.wikispaces.com/file/view/Domaindecomposition_KKirchner_27Apr2012.pdf
> > >
> > > I have done calculations to estimate the number of nodes that would be
> > > optimal for this protein.
> > >
> > >
> > > So all the files are generated, but an error occurs and the trajectory
> > > files remain empty, with no error mentioned in the log file.
> >
> > Hard to say. The problem could be anywhere, since we don't yet know when
> > mdrun does work...
> >
> > > The number of nodes to be used is a multiple of 16.
> > >
> > > The box is 8 nm in the x and y dimensions.
> > >
> > >
> > >
> > > In the error file,
> > >
> > >
> > > Reading file 400K_SIM2.tpr, VERSION 4.5.5 (single precision)
> >
> > Why use a slow, old version if you want parallel performance?
> >
> > > Note: file tpx version 73, software tpx version 83
> >
> > You should prefer to use a grompp whose version matches your mdrun.
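> >
> > For example, regenerating the run input on the cluster with that grompp
> > (a sketch only; the .mdp, .gro and .top names below are placeholders for
> > your own files):
> >
> >   grompp -f md.mdp -c npt.gro -p topol.top -o 400K_SIM2.tpr
> >
> > Using the grompp from the same installation as the mdrun you submit means
> > mdrun no longer has to convert an old tpx version on the fly.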
> >
> > > The number of OpenMP threads was set by environment variable
> > > OMP_NUM_THREADS to 1
> > > Using 320 MPI processes
> > >
> > > NOTE: The load imbalance in PME FFT and solve is 116%.
> > > For optimal PME load balancing
> > > PME grid_x (54) and grid_y (54) should be divisible by #PME_nodes_x (140)
> > > and PME grid_y (54) and grid_z (54) should be divisible by #PME_nodes_y (1)
> > >
> > >
> > >
> > > mdp file for reference
> > >
> > > ; Bond parameters
> > > continuation         = yes          ; Restarting after NPT
> > > constraint_algorithm = lincs        ; holonomic constraints
> > > constraints          = all-bonds    ; all bonds (even heavy atom-H bonds) constrained
> > > lincs_iter           = 1            ; accuracy of LINCS
> > > lincs_order          = 4            ; also related to accuracy
> > > ; Neighborsearching
> > > ns_type              = grid         ; search neighboring grid cells
> > > nstlist              = 5            ; 10 fs
> > > rlist                = 1.0          ; short-range neighborlist cutoff (in nm)
> > > rcoulomb             = 1.0          ; short-range electrostatic cutoff (in nm)
> > > rvdw                 = 1.0          ; short-range van der Waals cutoff (in nm)
> > > ; Electrostatics
> > > coulombtype          = PME          ; Particle Mesh Ewald for long-range electrostatics
> > > pme_order            = 4            ; cubic interpolation
> > > fourierspacing       = 0.16         ; grid spacing for FFT
> > > ; Temperature coupling is on
> > > tcoupl               = nose-hoover  ; Nose-Hoover coupling
> > > tc-grps              = Protein Non-Protein  ; two coupling groups - more accurate
> > > tau_t                = 0.2  0.2     ; time constant, in ps
> > > ref_t                = 400  400     ; reference temperature, one for each group, in K
> > > ; Pressure coupling is off
> > > pcoupl               = no           ;
> > > ; Periodic boundary conditions
> > > pbc                  = xyz          ; 3-D PBC
> > > ; Dispersion correction
> > > DispCorr             = EnerPres     ; account for cut-off vdW scheme
> > > ; Velocity generation
> > > gen_vel              = yes          ; assign velocities from Maxwell distribution
> > > gen_temp             = 400          ; temperature for Maxwell distribution
> > > gen_seed             = -1           ; generate a random seed
> > >
> > > Kindly help.
> > >
> > > I have to run simulations on 250 to 300 processors.
> >
> > Maybe. You can't efficiently parallelize an algorithm over arbitrary
> > amounts of hardware. You need 100-1000 atoms per core, depending on
> > hardware, simulation settings and GROMACS version.
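> >
> > As a rough sketch of that arithmetic for a ~32600-atom system: 32600 / 1000
> > is about 33 cores at the conservative end, and 32600 / 100 is about 326
> > cores at the optimistic end, so 250-320 MPI ranks (around 100 atoms per
> > rank) is already at the upper limit of what this system can usefully occupy.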
> >
> > Mark
> >