[gmx-developers] Shall we ditch gro and g96 files?

Mathias PUETZ mpuetz at de.ibm.com
Tue Apr 1 13:26:02 CEST 2008


Hi,

The Espresso file format, like most other ASCII formats, has a serious problem
if you worry about parallel I/O: it is hard to parallelize.
Even though I/O may not be a problem today, it will become one in time,
as simulated systems and the number of compute nodes get larger
and serial CPU power for ASCII formatting on rank 0 no longer scales.
I would seriously recommend considering John's suggestion and going for HDF5,
which parallelizes well and offers high flexibility.
For those who want the simplicity of Fortran array I/O, I would rather spend
a bit of extra effort to develop a comfortable reader tool that can extract
ASCII-readable data for those who don't want the complexity of having to deal
with HDF5 directly (although HDF5 comes with its own flexible ASCII readers,
which might be sufficient for most tasks).
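
To illustrate why fixed-size binary records are easier to write in parallel
than formatted text, here is a minimal Python sketch. It is neither HDF5 nor
GROMACS code: the 24-byte record layout, the write_slice and read_all helpers,
and the three simulated "ranks" are assumptions made up for this example. The
only point is that each writer can compute its file offset from the atom index
alone, which is the property that binary array formats such as HDF5 build on.

```python
import os
import struct
import tempfile

# Hypothetical record layout: x, y, z as little-endian float64 (24 bytes per atom).
RECORD = struct.Struct("<3d")


def write_slice(path, first_atom, coords):
    """Write one writer's atoms at an offset derived from the atom index alone."""
    with open(path, "r+b") as f:
        f.seek(first_atom * RECORD.size)
        for xyz in coords:
            f.write(RECORD.pack(*xyz))


def read_all(path, n_atoms):
    """Read every record back as a list of (x, y, z) tuples."""
    with open(path, "rb") as f:
        data = f.read(n_atoms * RECORD.size)
    return [RECORD.unpack_from(data, i * RECORD.size) for i in range(n_atoms)]


def demo():
    """Simulate three 'ranks' writing their slices independently, out of order."""
    coords = [(0.1 * i, 1.0 / 3.0 + i, -2.5 * i) for i in range(6)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Pre-size the file so every writer can seek to its own region.
        with open(path, "wb") as f:
            f.truncate(len(coords) * RECORD.size)
        for rank in (2, 0, 1):  # arbitrary order: no writer waits for another
            start = rank * 2
            write_slice(path, start, coords[start:start + 2])
        return coords, read_all(path, len(coords))
    finally:
        os.remove(path)


if __name__ == "__main__":
    original, restored = demo()
    print(restored == original)
```

With variable-width ASCII lines, a writer would first need to know the byte
length of everything that precedes its slice, which is why the formatting tends
to be funneled through a single rank. The sketch also stores full double
precision, so the values read back compare equal to the originals.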

> Message: 5
> Date: Mon, 31 Mar 2008 13:42:41 -0700
> From: "John Chodera" <jchodera at gmail.com>
> Subject: Re: [gmx-developers] Shall we ditch gro and g96 files?
> To: "Discussion list for GROMACS development"
>    <gmx-developers at gromacs.org>
> 
> Gentlemen,
> 
> I know I don't chime in very often here, but I wanted to take this
> opportunity to say that I very much support the idea of replacing the
> limited-precision text-based formats like .gro, .pdb, and .g96 with
> more flexible, portable, full-precision file formats.
> 
> Berk's suggestion of Espresso sounds very reasonable, but I would
> encourage you to instead look at netCDF and HDF5:
> 
> netCDF:
> http://www.unidata.ucar.edu/software/netcdf/
> 
> HDF5:
> http://hdf.ncsa.uiuc.edu/HDF5/
> 
> Both of these formats provide easy-to-use libraries with APIs that
> support nearly every language you could want to use (including C,
> Fortran, and Python).  They provide platform-independent, extensible
> formats for storing numerical information.  Both provide attribute
> support, and HDF5 even allows hierarchical organization of objects,
> making it very much like XML but with support for multidimensional
> arrays of the same precision as used internally in gromacs.  The
> libraries are robust, efficient, and well-supported.
> 
> AMBER, for example, has already moved to netCDF for their trajectory
> format, though (unfortunately) not yet for their coordinate/restart
> files.
> 
> http://amber.scripps.edu/netcdf/nctraj.html
> 
> Cheers,
> 
> John
> 
> --
> Dr. John D. Chodera <jchodera at gmail.com>      | Mobile    : 415.867.7384
> Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
> Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
> http://www.dillgroup.ucsf.edu/~jchodera
> 
> On 29/03/2008, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
> > Hi,
> >
> >  as you are aware all coordinate files have their drawbacks.
> >  - gro & pdb have limited space for coordinates which is problematic for
> >  simulating large systems
> >  - pdb has no velocities
> >  - gro & g96 can not store information on the element (i.e. can not
> >  distinguish between Calpha and Calcium or Hgamma and Mercury, pdb can do
> >  this)
> >  - gro stores non-rectangular boxes in an awkward manner
> >
> >  I would therefore propose to make a better coordinate file format that has
> >  - coordinates
> >  - velocities
> >  - box as three edges and three angles (as in pdb file)
> >  - atom name (and number)
> >  - residue name and number
> >  - element type (we could also introduce special elements for united
> >  atoms or coarse grained particles, but they should not overlap with real
> >  elements)
> >  - variable format (no fixed column widths)
> >
> >  In order to encourage the use of such a more flexible file format I
> >  would then propose that we remove the facility for writing gro and g96
> >  files.
> >
> >  Please let me know what you think.
> >
> >  --
> >  David van der Spoel, Ph.D.
> >  Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
> >  Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205. Fax: +4618511755.
> >  spoel at xray.bmc.uu.se    spoel at gromacs.org   http://folding.bmc.uu.se
> >  _______________________________________________
> >  gmx-developers mailing list
> >  gmx-developers at gromacs.org
> >  http://www.gromacs.org/mailman/listinfo/gmx-developers
> >  Please don't post (un)subscribe requests to the list. Use the
> >  www interface or send it to gmx-developers-request at gromacs.org.
> >
> 
> End of gmx-developers Digest, Vol 48, Issue 1
> *********************************************


Best regards,
Dr. Mathias Pütz

IT Specialist for Application Performance

Deep Computing - Strategic Growth Business
IBM Systems & Technology Group

e-mail:  mpuetz at de.ibm.com
mobile: + 49-(0)160-7120602
fax:         + 49-(0)6131-84-6660

Address:
  IBM Deutschland GmbH
  Department B513
  Hechtsheimer Str. 2 / Building 12
  55131 Mainz
  Germany
