[gmx-users] Seeking advice on how to build Gromacs on Teragrid resources
    Roland Schulz 
    roland at utk.edu
       
    Fri Dec 10 06:59:24 CET 2010
    
    
  
What MVAPICH version are you using?
Are you using a TPR file that you know runs fine on some other machine?
Does the 4.5.2 version they installed run correctly? If so, what
configure line did they use?
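
A minimal sketch of how one might collect that information from inside the job environment. The helper name report_tool and the MDRUN path (derived from the --prefix and --program-suffix in the quoted configure line, assuming the usual bin/ layout) are assumptions for illustration; mpiname is a utility that MVAPICH2 installs, but it may not be present in every build.

```shell
#!/bin/sh
# report_tool is a hypothetical helper: print where a tool lives, or say it is missing.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found in PATH"
    fi
}

# Which MPI launcher and compiler wrapper does this environment see?
report_tool mpirun
report_tool mpicc

# If the MVAPICH2 install provides mpiname, it reports the version and build options.
if command -v mpiname >/dev/null 2>&1; then
    mpiname -a
fi

# Assumed install location of the binary; override by setting MDRUN.
MDRUN=${MDRUN:-/u/ac/jnscott/gromacs/bin/mdrun_mpi}
if [ -x "$MDRUN" ]; then
    # List the MPI and FFTW shared libraries the binary actually resolves.
    ldd "$MDRUN" | grep -i -E 'mpi|fftw' \
        || echo "no MPI/FFTW shared libraries listed (possibly linked statically)"
fi
```

Running this both on a login node and inside a PBS job would show whether the two environments pick up the same MPI.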
Roland
On Thu, Dec 9, 2010 at 5:14 PM, J. Nathan Scott <
scottjn at chemistry.montana.edu> wrote:
> Hello gmx users! I realize this may be a touch off topic, but I am
> hoping that someone out there can offer some advice on how to build
> Gromacs for parallel use on a Teragrid site. Our group is currently
> using Abe on Teragrid, and unfortunately the latest version of Gromacs
> compiled for public use on Abe is 4.0.2. Apparently installation of
> 4.5.3 is at least on the to-do list for Abe, but we would very much
> like to use 4.5.3 now if we can get this issue figured out.
>
> I have built a parallel version of mdrun using Abe installed versions
> of fftw3 and mvapich2 using the following commands:
>
> setenv CPPFLAGS "-I/usr/apps/math/fftw/fftw-3.1.2/gcc/include/ -I/usr/apps/mpi/marmot_mvapich2_intel/include"
> setenv LDFLAGS "-L/usr/apps/math/fftw/fftw-3.1.2/gcc/lib -L/usr/apps/mpi/marmot_mvapich2_intel/lib"
> ./configure --enable-mpi --enable-float --prefix=/u/ac/jnscott/gromacs --program-suffix=_mpi
> make -j 8 mdrun && make install-mdrun
>
> My PBS script file looks like the following:
>
> -------------------------------
> #!/bin/csh
> #PBS -l nodes=2:ppn=8
> #PBS -V
> #PBS -o pbs_nvt.out
> #PBS -e pbs_nvt.err
> #PBS -l walltime=2:00:00
> #PBS -N gmx
> cd /u/ac/jnscott/1stn/1stn_wt/oplsaa_spce
> mvapich2-start-mpd
> setenv NP `wc -l ${PBS_NODEFILE} | cut -d'/' -f1`
> setenv MV2_SRQ_SIZE 4000
> mpirun -np ${NP} mdrun_mpi -s nvt.tpr -o nvt.trr -x nvt.xtc -cpo nvt.cpt -c nvt.gro -e nvt.edr -g nvt.log -dlb yes
> -------------------------------
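
The rank count in the script above is obtained by cutting the file name off the output of wc at the first "/". A minimal sketch of an alternative that reads the node file from standard input, so that wc prints only the number. It is written in POSIX sh rather than csh, and count_ranks is a hypothetical helper name:

```shell
#!/bin/sh
# count_ranks is a hypothetical helper: one line in the node file is one MPI rank.
count_ranks() {
    # Reading from stdin makes wc print only the count, with no file name;
    # tr strips the padding some wc implementations add.
    wc -l < "$1" | tr -d ' '
}

# Only report when running under PBS, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_ranks "$PBS_NODEFILE")
    HOSTS=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')
    echo "PBS allocated $NP ranks on $HOSTS hosts"
fi

# csh equivalent of the assignment:
#   setenv NP `wc -l < ${PBS_NODEFILE}`
```

With nodes=2:ppn=8 this should report 16 ranks on 2 hosts, which matches the "16 nodes" line in the log.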
>
> Unfortunately my runs always fail in the same manner. The log file
> simply ends, as you can see below. It appears that Gromacs is picking
> up the correct number of nodes specified in the PBS script, but then
> something causes it to quit abruptly with no error message.
>
> -------------------------------
> <snip>
> Initializing Domain Decomposition on 16 nodes
> Dynamic load balancing: yes
> Will sort the charge groups at every domain (re)decomposition
> Initial maximum inter charge-group distances:
>    two-body bonded interactions: 0.526 nm, LJ-14, atoms 1735 1744
>  multi-body bonded interactions: 0.526 nm, Ryckaert-Bell., atoms 1735 1744
> Minimum cell size due to bonded interactions: 0.578 nm
> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
> Estimated maximum distance required for P-LINCS: 0.820 nm
> This distance will limit the DD cell size, you can override this with -rcon
> Guess for relative PME load: 0.27
> Will use 10 particle-particle and 6 PME only nodes
> This is a guess, check the performance at the end of the log file
> Using 6 separate PME nodes
> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> Optimizing the DD grid for 10 cells with a minimum initial size of 1.025 nm
> The maximum allowed number of cells is: X 5 Y 5 Z 4
> Domain decomposition grid 2 x 5 x 1, separate PME nodes 6
> PME domain decomposition: 2 x 3 x 1
> Interleaving PP and PME nodes
> This is a particle-particle only node
>
> Domain decomposition nodeid 0, coordinates 0 0 0
>
> Using two step summing over 2 groups of on average 5.0 processes
>
> Table routines are used for coulomb: TRUE
> Table routines are used for vdw:     FALSE
> Will do PME sum in reciprocal space.
>
> <snip>
>
> Will do ordinary reciprocal space Ewald sum.
> Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
> Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
> Long Range LJ corr.: <C6> 3.3589e-04
> System total charge: 0.000
> Generated table with 1000 data points for Ewald.
> Tabscale = 500 points/nm
> Generated table with 1000 data points for LJ6.
> Tabscale = 500 points/nm
> Generated table with 1000 data points for LJ12.
> Tabscale = 500 points/nm
> Generated table with 1000 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 1000 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 1000 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Enabling SPC-like water optimization for 6952 molecules.
>
> Configuring nonbonded kernels...
> Configuring standard C nonbonded kernels...
> Testing x86_64 SSE2 support... present.
>
> Removing pbc first time
>
> Initializing Parallel LINear Constraint Solver
>
> <snip>
> Linking all bonded interactions to atoms
> There are 9778 inter charge-group exclusions,
> will use an extra communication step for exclusion forces for PME
>
> The maximum number of communication pulses is: X 1 Y 2
> The minimum size for domain decomposition cells is 0.827 nm
> The requested allowed shrink of DD cells (option -dds) is: 0.80
> The allowed shrink of domain decomposition cells is: X 0.35 Y 0.73
> The maximum allowed distance for charge groups involved in interactions is:
>                 non-bonded interactions           1.000 nm
>            two-body bonded interactions  (-rdd)   1.000 nm
>          multi-body bonded interactions  (-rdd)   0.827 nm
>  atoms separated by up to 5 constraints  (-rcon)  0.827 nm
>
>
> Making 2D domain decomposition grid 2 x 5 x 1, home cell index 0 0 0
>
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
>  0:  rest
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> G. Bussi, D. Donadio and M. Parrinello
> Canonical sampling through velocity rescaling
> J. Chem. Phys. 126 (2007) pp. 014101
> -------- -------- --- Thank You --- -------- --------
> -----------------------------------------------------------
>
> My PBS error file is not of much help either, I fear. An example of
> such a file is pasted below:
>
> -----------------------------------
> stty: standard input: Invalid argument
> stty: standard input: Invalid argument
> NNODES=16, MYRANK=0, HOSTNAME=abe0828
> NNODES=16, MYRANK=2, HOSTNAME=abe0828
> NNODES=16, MYRANK=12, HOSTNAME=abe0828
> NNODES=16, MYRANK=4, HOSTNAME=abe0828
> NNODES=16, MYRANK=10, HOSTNAME=abe0828
> NNODES=16, MYRANK=8, HOSTNAME=abe0828
> NNODES=16, MYRANK=6, HOSTNAME=abe0828
> NNODES=16, MYRANK=14, HOSTNAME=abe0828
> NODEID=0 argc=17
> NODEID=2 argc=17
> NODEID=4 argc=17
> NODEID=10 argc=17
> NODEID=12 argc=17
> NODEID=6 argc=17
> NODEID=14 argc=17
> NODEID=8 argc=17
> NNODES=16, MYRANK=5, HOSTNAME=abe0825
> NNODES=16, MYRANK=13, HOSTNAME=abe0825
>                         :-)  G  R  O  M  A  C  S  (-:
>
> NNODES=16, MYRANK=9, HOSTNAME=abe0825
> NNODES=16, MYRANK=11, HOSTNAME=abe0825
>                   Great Red Oystrich Makes All Chemists Sane
>
>                            :-)  VERSION 4.5.3  (-:
>
> <snip>
> Back Off! I just backed up nvt.log to ./#nvt.log.2#
> Reading file nvt.tpr, VERSION 4.5.3 (single precision)
>
> Will use 10 particle-particle and 6 PME only nodes
> This is a guess, check the performance at the end of the log file
> Making 2D domain decomposition 2 x 5 x 1
>
> Back Off! I just backed up nvt.edr to ./#nvt.edr.2#
> ----------------------------------------------
>
> The non-Torque section of the PBS log file is below:
>
> -----------------------------------------------
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> running mpdallexit on abe0828
> LAUNCHED mpd on abe0828  via
> RUNNING: mpd on abe0828
> LAUNCHED mpd on abe0825  via  abe0828
> RUNNING: mpd on abe0825
> abe0828_43972 (10.1.67.66)
> abe0825_37571 (10.1.67.63)
> rank 1 in job 1  abe0828_43972   caused collective abort of all ranks
>  exit status of rank 1: killed by signal 9
> rank 0 in job 1  abe0828_43972   caused collective abort of all ranks
>  exit status of rank 0: killed by signal 9
> -------------------------------------------------
>
> I should also note that both .edr and .trr files are created in
> the working directory, but both files are 0 bytes.
>
> As I said, I realize this question is perhaps a bit off topic for
> Gromacs, but I hope that someone can offer some tips or spot any
> obvious problems with my method that I have not noticed. I would
> sincerely appreciate any help you can offer a novice.
>
> Best Wishes,
> -Nathan
>
>
> ----------
> J. Nathan Scott, Ph.D.
> Postdoctoral Fellow
> Department of Chemistry and Biochemistry
> Montana State University
-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309