[gmx-developers] Shall we ditch gro and g96 files?

Roland Schulz schulzr at ornl.gov
Tue Apr 1 15:25:33 CEST 2008


Hi,

I agree that parallel IO is very important. Not so much because of the
CPU time on node 0 (yet), but because of the memory requirement on node
0: for a few million atoms, we already have this problem with NAMD.

For NetCDF a parallel version exists too:
http://trac.mcs.anl.gov/projects/parallel-netcdf

Should we ask some other MD software groups whether they are also
planning to introduce a new coordinate file format so we may come up
with something compatible?

regards
Roland

On Tue, Apr 1, 2008 at 8:34 AM, Erik Lindahl <lindahl at cbr.su.se> wrote:
> Hi,
>
> On Apr 1, 2008, at 1:26 PM, Mathias PUETZ wrote:
>
> The Espresso file format, like most other ASCII formats, has a serious
> problem if you worry about parallel IO: it is hardly parallelizable.
> Even though IO may not be a problem today, given time, it will be,
> as simulated systems and the number of compute nodes get larger
> and serial CPU power for ASCII formatting on rank 0 no longer scales.
> I would seriously recommend considering John's suggestion and going for HDF5,
> which parallelizes well and offers high flexibility.
> For those who want the simplicity of Fortran array IO, I would rather spend
> a bit of extra effort to develop a comfortable reader tool that can extract
> ASCII-readable data for those who don't want the complexity of having to deal
> with HDF5 directly (although HDF5 comes with its own flexible ASCII readers,
> which might be sufficient for most tasks).
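
One reason fixed-size binary records are easier to write in parallel than free-form text is that every writer can compute its own byte offset without knowing what the others wrote. The following is a minimal Python sketch of that idea only. It runs serially, the "ranks" are simulated by out-of-order calls, and the names RECORD, write_slice, read_all and demo are hypothetical. It does not use HDF5, NetCDF or MPI-IO, which is what a real implementation would build on.

```python
import os
import struct
import tempfile

# Assumption: one record is x, y, z stored as little-endian 32-bit floats.
RECORD = struct.Struct("<3f")


def write_slice(path, first_atom, coords):
    """Write the coordinates of a contiguous block of atoms.

    The byte offset depends only on the index of the first atom, so
    this call needs no information about the other writers.
    """
    with open(path, "r+b") as fh:
        fh.seek(first_atom * RECORD.size)
        for xyz in coords:
            fh.write(RECORD.pack(*xyz))


def read_all(path, n_atoms):
    """Read every record back as a list of (x, y, z) tuples."""
    with open(path, "rb") as fh:
        data = fh.read(n_atoms * RECORD.size)
    return [RECORD.unpack_from(data, i * RECORD.size) for i in range(n_atoms)]


def demo():
    """Simulate three writers filling one file in arbitrary order."""
    n_atoms = 6
    # Values chosen to be exactly representable as 32-bit floats.
    coords = [(float(i), i + 0.5, i + 0.25) for i in range(n_atoms)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Pre-size the file so every slice has a place to land.
        with open(path, "wb") as fh:
            fh.truncate(n_atoms * RECORD.size)
        # Each call stands in for one rank; the order does not matter.
        write_slice(path, 4, coords[4:6])
        write_slice(path, 0, coords[0:2])
        write_slice(path, 2, coords[2:4])
        return read_all(path, n_atoms) == coords
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("round trip ok:", demo())
```

With a variable-width text format the position of atom N in the file depends on how every earlier atom was formatted, which is why the formatting tends to end up serialized on rank 0.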
>
> We actually considered NetCDF a long time ago, but at that time we decided
> against it since HDF5 was coming, but was too new/unstable then :-)
>
> I think a lot of people (including me...) like to be able to do "simple"
> coordinate manipulation through scripts that just grep/awk for atom names,
> but I like Mathias' suggestion of having a separate tool to translate
> back/forth instead, and keep the "core" format HDF5.
>
> The only thing that worries me (just a little bit :-) is that it would make
> us entirely dependent on a big external library. I know that HDF5 is _very_
> portable, but at least in theory we could end up in a situation where
> Gromacs doesn't work on some obscure platform e.g. because there's a
> compiler bug affecting HDF5.
>
> Of course, that might be a reasonable compromise, but since I ended up doing
> my own implementation of the Unix external data representation (XDR) when we
> first ported Gromacs to Windows, I've toyed around with the idea of having
> some minimal built-in HDF5-generating code as a backup...
>
> Mathias/John, do you or anybody else have any experience from using HDF5 for
> development? Have there been different library versions that you need to
> install, or do packages usually include their own copy of the library?
>
> Cheers,
>
> Erik
>
> > Message: 5
>  > Date: Mon, 31 Mar 2008 13:42:41 -0700
>  > From: "John Chodera" <jchodera at gmail.com>
>  > Subject: Re: [gmx-developers] Shall we ditch gro and g96 files?
>  > To: "Discussion list for GROMACS development"
>  >    <gmx-developers at gromacs.org>
>  > Message-ID:
>  >    <14cc10610803311342i7f9ed758r8ed8fe95569573da at mail.gmail.com>
>  > Content-Type: text/plain; charset=ISO-8859-1
>  >
>  > Gentlemen,
>  >
>  > I know I don't chime in very often here, but I wanted to take this
>  > opportunity to say that I very much support the idea of replacing the
>  > limited-precision text-based formats like .gro, .pdb, and .g96 with
>  > more flexible, portable, full-precision file formats.
>  >
>  > Berk's suggestion of Espresso sounds very reasonable, but I would
>  > encourage you to instead look at netCDF and HDF5:
>  >
>  > netCDF:
>  > http://www.unidata.ucar.edu/software/netcdf/
>  >
>  > HDF5:
>  > http://hdf.ncsa.uiuc.edu/HDF5/
>  >
>  > Both of these formats provide easy-to-use libraries with APIs that
>  > support nearly every language you could want to use (including C,
>  > Fortran, and Python).  They provide platform-independent, extensible
>  > formats for storing numerical information.  Both provide attribute
>  > support, and HDF5 even allows hierarchical organization of objects,
>  > making it very much like XML but with support for multidimensional
>  > arrays of the same precision as used internally in gromacs.  The
>  > libraries are robust, efficient, and well-supported.
>  >
>  > AMBER, for example, has already moved to netCDF for their trajectory
>  > format, though (unfortunately) not yet for their coordinate/restart
>  > files.
>  >
>  > http://amber.scripps.edu/netcdf/nctraj.html
>  >
>  > Cheers,
>  >
>  > John
>  >
>  > --
>  > Dr. John D. Chodera <jchodera at gmail.com>      | Mobile    : 415.867.7384
>  > Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
>  > Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
>  > http://www.dillgroup.ucsf.edu/~jchodera
>  >
>  > On 29/03/2008, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>  > > Hi,
>  > >
>  > >  as you are aware all coordinate files have their drawbacks.
>  > >  - gro & pdb have limited space for coordinates, which is problematic
>  > >  for simulating large systems
>  > >  - pdb has no velocities
>  > >  - gro & g96 cannot store information on the element (i.e. cannot
>  > >  distinguish between Calpha and Calcium or Hgamma and Mercury; pdb can
>  > >  do this)
>  > >  - gro stores non-rectangular boxes in an awkward manner
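
To make the first drawback in the list above concrete: .gro and .pdb both write each coordinate into a fixed 8-character column with three decimals (nanometres in .gro, angstroms in .pdb). The following is a minimal Python sketch of what happens at the edge of that column. It is not GROMACS code, and the helper names format_coord and fits are made up for illustration.

```python
# Assumption: the coordinate column is 8 characters wide with 3 decimals,
# i.e. the C-style "%8.3f" layout.
FIELD_WIDTH = 8


def format_coord(value: float) -> str:
    """Format one coordinate the way a fixed %8.3f column would."""
    return "%8.3f" % value


def fits(value: float) -> bool:
    """Return True if the formatted value stays inside the column."""
    return len(format_coord(value)) <= FIELD_WIDTH


if __name__ == "__main__":
    for x in (1.23456, 9999.999, 10000.0, -999.999, -1000.0):
        text = format_coord(x)
        print(f"{x!r:>10} -> '{text}' ({len(text)} chars, fits={fits(x)})")
```

Under this layout the usable range is -999.999 to 9999.999; a larger value makes the field wider than 8 characters and shifts every column after it. The same column also rounds everything to three decimals, which is the limited precision mentioned elsewhere in this thread.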
>  > >
>  > >  I would therefore propose to make a better coordinate file format that
>  > >  has
>  > >  - coordinates
>  > >  - velocities
>  > >  - box as three edges and three angles (as in pdb file)
>  > >  - atom name (and number)
>  > >  - residue name and number
>  > >  - element type (we could also introduce special elements for united
>  > >  atoms or coarse-grained particles, but they should not overlap with
>  > >  real elements)
>  > >  - variable format (no fixed column widths)
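
The field list above can be sketched as a small record type with a whitespace-separated line format. This is only an illustration in Python: the class names, field order and line layout are hypothetical and are not a proposed or existing GROMACS format. It assumes atom and residue names contain no whitespace.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    edges: Vec3   # three edge lengths
    angles: Vec3  # three angles, as in a pdb CRYST1 record


@dataclass
class Atom:
    atom_number: int
    atom_name: str
    residue_number: int
    residue_name: str
    element: str  # kept separate from the name, so "CA" is not ambiguous
    position: Vec3
    velocity: Vec3


def write_atom(atom: Atom) -> str:
    """Serialize one atom as whitespace-separated, variable-width fields."""
    fields = [
        str(atom.atom_number), atom.atom_name,
        str(atom.residue_number), atom.residue_name, atom.element,
    ]
    # repr() of a Python float round-trips exactly, so no precision is lost.
    fields += [repr(v) for v in atom.position + atom.velocity]
    return " ".join(fields)


def read_atom(line: str) -> Atom:
    """Parse a line produced by write_atom."""
    f = line.split()
    return Atom(
        atom_number=int(f[0]), atom_name=f[1],
        residue_number=int(f[2]), residue_name=f[3], element=f[4],
        position=(float(f[5]), float(f[6]), float(f[7])),
        velocity=(float(f[8]), float(f[9]), float(f[10])),
    )


if __name__ == "__main__":
    box = Box(edges=(20000.0, 20000.0, 20000.0), angles=(90.0, 90.0, 90.0))
    calcium = Atom(1, "CA", 1, "CA", "Ca",
                   (12345.678901, 0.1, -2.5), (0.001, 0.0, -0.002))
    line = write_atom(calcium)
    print(box)
    print(line)
    print(read_atom(line) == calcium)
```

Because the fields are split on whitespace rather than on column positions, a coordinate such as 12345.678901 needs no special handling, and the element column separates a calcium ion named "CA" from an alpha carbon with the same atom name.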
>  > >
>  > >  In order to encourage the use of such a more flexible file format I
>  > >  would then propose that we remove the facility for writing gro and g96
>  > >  files.
>  > >
>  > >  Please let me know what you think.
>  > >
>  > >  --
>  > >  David van der Spoel, Ph.D.
>  > >  Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
>  > >  Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205. Fax: +4618511755.
>  > >  spoel at xray.bmc.uu.se    spoel at gromacs.org   http://folding.bmc.uu.se
>  > >  _______________________________________________
>  > >  gmx-developers mailing list
>  > >  gmx-developers at gromacs.org
>  > >  http://www.gromacs.org/mailman/listinfo/gmx-developers
>  > >  Please don't post (un)subscribe requests to the list. Use the
>  > >  www interface or send it to gmx-developers-request at gromacs.org.
>  > >
>  >
>  >
>
>
>  Viele Grüsse / Best regards,
>  Dr. Mathias Pütz
>
>  IT Specialist for Application Performance
>
>  Deep Computing - Strategic Growth Business
>  IBM Systems & Technology Group
>
>  e-mail:  mpuetz at de.ibm.com
>  mobile: + 49-(0)160-7120602
>  fax:         + 49-(0)6131-84-6660
>
>  Address:
>   IBM Deutschland GmbH
>   Department B513
>   Hechtsheimer Str. 2 / Building 12
>   55131 Mainz
>   Germany
>
>  IBM Deutschland GmbH
>  Chairman of the Supervisory Board: Hans Ulrich Maerki
>  Management: Martin Jetter (Chairman), Christian Diedrich,
>  Christoph Grandpierre, Matthias Hartmann, Thomas Fell, Michael Diemer
>  Registered office: Stuttgart
>  Court of registration: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940
>
>
> ------------
> Erik Lindahl   <lindahl at cbr.su.se>  Backup: <erik.lindahl at gmail.com>
> Assistant Professor, Computational Structural Biology
> Center for Biomembrane Research, Dept. Biochemistry & Biophysics
> Stockholm University, SE-106 91 Stockholm, Sweden
> Tel: +46(0)8164675  Mobile: +46(0)704218767  Fax: mail a PDF instead
>


