[gmx-developers] Shall we ditch gro and g96 files?
Roland Schulz
schulzr at ornl.gov
Tue Apr 1 15:25:33 CEST 2008
Hi,
I agree that parallel IO is very important. Not so much because of the
CPU time on node 0 (yet), but because of the memory requirement on node
0 for a few million atoms; we already have this problem with NAMD.
For NetCDF a parallel version exists too:
http://trac.mcs.anl.gov/projects/parallel-netcdf
Should we ask some other MD software groups whether they are also
planning to introduce a new coordinate file format so we may come up
with something compatible?
regards
Roland
On Tue, Apr 1, 2008 at 8:34 AM, Erik Lindahl <lindahl at cbr.su.se> wrote:
> Hi,
>
> On Apr 1, 2008, at 1:26 PM, Mathias PUETZ wrote:
>
> The Espresso file format, like most other ASCII formats, has a serious problem
> if you worry about parallel IO: it is hard to parallelize.
> Even though IO may not be a problem today, given time it will be,
> as simulated systems and the number of compute nodes get larger
> and serial CPU power for ASCII formatting on rank 0 no longer scales.
> I would seriously recommend considering John's suggestion and going for HDF5,
> which parallelizes well and offers high flexibility.
> For those who want the simplicity of Fortran array IO, I would rather spend
> a bit of extra effort to develop a comfortable reader tool that can extract
> ASCII-readable data for those who don't want the complexity of having to deal
> with HDF5 directly (although HDF5 comes with its own flexible ASCII readers,
> which might be sufficient for most tasks).
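
A minimal sketch of the point about parallel IO, using only the Python standard library rather than HDF5 itself. It assumes a hypothetical layout of one fixed-size binary record per atom, so each writer can compute its own byte offset from its first atom index; the names RECORD, write_slice and read_atom are made up for illustration. With ASCII output the width of each value depends on the value, so no such offset can be computed in advance.

```python
import io
import struct

# Hypothetical record: x, y, z as little-endian float64, always 24 bytes.
RECORD = struct.Struct("<3d")


def write_slice(buf, first_atom, coords):
    """Write one rank's coordinates at the offset implied by its first atom."""
    buf.seek(first_atom * RECORD.size)
    for xyz in coords:
        buf.write(RECORD.pack(*xyz))


def read_atom(buf, index):
    """Fetch a single atom by index without scanning the preceding data."""
    buf.seek(index * RECORD.size)
    return RECORD.unpack(buf.read(RECORD.size))


# Two pretend "ranks" own two atoms each and write in arbitrary order.
shared = io.BytesIO()
write_slice(shared, 2, [(3.0, 3.5, 3.25), (4.0, 4.5, 4.25)])  # rank 1 writes first
write_slice(shared, 0, [(1.0, 1.5, 1.25), (2.0, 2.5, 2.25)])  # rank 0 writes second

# Text widths vary with the value, so ASCII offsets are not predictable.
ascii_widths = [len(repr(v)) for v in (1.0, 12.5, 1234.5678)]
```

Here an in-memory buffer stands in for a shared file; a real parallel writer would use MPI-IO or a library built on it, which this sketch does not attempt to show.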
>
> We actually considered NetCDF a long time ago, but at that time we decided
> against it since HDF5 was coming, but was too new/unstable then :-)
>
> I think a lot of people (including me...) like to be able to do "simple"
> coordinate manipulation through scripts that just grep/awk for atom names,
> but I like Mathias's suggestion of having a separate tool to translate
> back/forth instead, and keeping the "core" format HDF5.
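
As an illustration of the kind of "simple" script-level manipulation mentioned above, here is a rough Python equivalent of grepping a .gro file for an atom name. It assumes the fixed-column .gro atom line layout (%5d%-5s%5s%5d followed by three %8.3f position fields); the function names and the sample coordinates are made up for this sketch.

```python
# Assumed fixed-column layout of a .gro atom line (positions only, no velocities).
GRO_ATOM = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f"


def parse_gro_atom(line):
    """Split one atom line into fields by column position."""
    return {
        "resnr": int(line[0:5]),
        "resname": line[5:10].strip(),
        "name": line[10:15].strip(),
        "atomnr": int(line[15:20]),
        "xyz": tuple(float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)),
    }


def select_by_name(gro_text, atom_name):
    """Return all atoms whose name matches, like `grep` on the name column."""
    lines = gro_text.splitlines()
    natoms = int(lines[1])  # second line holds the atom count
    atoms = (parse_gro_atom(line) for line in lines[2:2 + natoms])
    return [atom for atom in atoms if atom["name"] == atom_name]


# Two water molecules with made-up coordinates.
sample = "\n".join([
    "Two waters (illustrative data)",
    "%5d" % 6,
    GRO_ATOM % (1, "SOL", "OW", 1, 0.126, 1.624, 1.679),
    GRO_ATOM % (1, "SOL", "HW1", 2, 0.190, 1.661, 1.747),
    GRO_ATOM % (1, "SOL", "HW2", 3, 0.177, 1.568, 1.613),
    GRO_ATOM % (2, "SOL", "OW", 4, 1.275, 0.053, 0.622),
    GRO_ATOM % (2, "SOL", "HW1", 5, 1.337, 0.002, 0.680),
    GRO_ATOM % (2, "SOL", "HW2", 6, 1.326, 0.120, 0.568),
    "%10.5f%10.5f%10.5f" % (1.86206, 1.86206, 1.86206),
])

oxygens = select_by_name(sample, "OW")
```

A translation tool in front of a binary core format would need to emit text of roughly this shape for such scripts to keep working.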
>
> The only thing that worries me (just a little bit :-) is that it would make
> us entirely dependent on a big external library. I know that HDF5 is _very_
> portable, but at least in theory we could end up in a situation where
> Gromacs doesn't work on some obscure platform e.g. because there's a
> compiler bug affecting HDF5.
>
> Of course, that might be a reasonable compromise, but since I ended up doing
> my own implementation of the Unix external data representation (XDR) when we
> first ported Gromacs to Windows, I've toyed around with the idea of having
> some minimal built-in HDF5-generating code as a backup...
>
> Mathias/John, do you or anybody else have any experience from using HDF5 for
> development? Have there been different library versions that you need to
> install, or do packages usually include their own copy of the library?
>
> Cheers,
>
> Erik
>
>
>
>
>
> > Message: 5
> > Date: Mon, 31 Mar 2008 13:42:41 -0700
> > From: "John Chodera" <jchodera at gmail.com>
> > Subject: Re: [gmx-developers] Shall we ditch gro and g96 files?
> > To: "Discussion list for GROMACS development"
> > <gmx-developers at gromacs.org>
> > Message-ID:
> > <14cc10610803311342i7f9ed758r8ed8fe95569573da at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Gentlemen,
> >
> > I know I don't chime in very often here, but I wanted to take this
> > opportunity to say that I very much support the idea of replacing the
> > limited-precision text-based formats like .gro, .pdb, and .g96 with
> > more flexible, portable, full-precision file formats.
> >
> > Berk's suggestion of Espresso sounds very reasonable, but I would
> > encourage you to instead look at netCDF and HDF5:
> >
> > netCDF:
> > http://www.unidata.ucar.edu/software/netcdf/
> >
> > HDF5:
> > http://hdf.ncsa.uiuc.edu/HDF5/
> >
> > Both of these formats provide easy-to-use libraries with APIs that
> > support nearly every language you could want to use (including C,
> > Fortran, and Python). They provide platform-independent, extensible
> > formats for storing numerical information. Both provide attribute
> > support, and HDF5 even allows hierarchical organization of objects,
> > making it very much like XML but with support for multidimensional
> > arrays of the same precision as used internally in gromacs. The
> > libraries are robust, efficient, and well-supported.
> >
> > AMBER, for example, has already moved to netCDF for their trajectory
> > format, though (unfortunately) not yet for their coordinate/restart
> > files.
> >
> > http://amber.scripps.edu/netcdf/nctraj.html
> >
> > Cheers,
> >
> > John
> >
> > --
> > Dr. John D. Chodera <jchodera at gmail.com> | Mobile : 415.867.7384
> > Postdoctoral researcher, Pande lab | Lab phone : 650.723.1097
> > Department of Chemistry, Stanford University | Lab fax : 650.724.4021
> > http://www.dillgroup.ucsf.edu/~jchodera
> >
> > On 29/03/2008, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
> > > Hi,
> > >
> > > as you are aware all coordinate files have their drawbacks.
> > > - gro & pdb have limited space for coordinates, which is problematic for
> > > simulating large systems
> > > - pdb has no velocities
> > > - gro & g96 cannot store information on the element (i.e. cannot
> > > distinguish between Calpha and Calcium or Hgamma and Mercury; pdb can do
> > > this)
> > > - gro stores non-rectangular boxes in an awkward manner
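
The "limited space" drawback can be made concrete with a short sketch. It assumes the .gro field widths of %8.3f for a coordinate and %5d for an atom number; the helper names are hypothetical. Once a value needs more characters than its column provides, the formatted field spills into the next column, and values are rounded to three decimals in any case.

```python
def gro_coord_field(value):
    """Format one coordinate the way a fixed 8-character column would."""
    return "%8.3f" % value


def gro_index_field(number):
    """Format an atom or residue number for a fixed 5-character column."""
    return "%5d" % number


fits = gro_coord_field(123.456)              # exactly fills the 8 columns
too_wide = gro_coord_field(12345.678)        # needs 9 characters
rounded = float(gro_coord_field(1.23456789)) # precision beyond 3 decimals is lost
index_too_wide = gro_index_field(100000)     # needs 6 characters
```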
> > >
> > > I would therefore propose to make a better coordinate file format that has
> > > - coordinates
> > > - velocities
> > > - box as three edges and three angles (as in pdb file)
> > > - atom name (and number)
> > > - residue name and number
> > > - element type (we could also introduce special elements for united
> > > atoms or coarse-grained particles, but they should not overlap with real
> > > elements)
> > > - variable format (no fixed column widths)
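
For the proposed box description, here is a sketch of how three edge lengths and three angles could be turned into box vectors. It assumes the common crystallographic convention (alpha between b and c, beta between a and c, gamma between a and b, with a along x and b in the xy plane); the function name and the sample values are made up and nothing here is prescribed by the proposal itself.

```python
import math


def box_vectors(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Convert edge lengths and angles (degrees) to three box vectors."""
    alpha, beta, gamma = (math.radians(x) for x in (alpha_deg, beta_deg, gamma_deg))
    bx = b * math.cos(gamma)
    by = b * math.sin(gamma)
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(c * c - cx * cx - cy * cy)  # chosen so that |c| is preserved
    return (a, 0.0, 0.0), (bx, by, 0.0), (cx, cy, cz)


cubic = box_vectors(2.0, 3.0, 4.0, 90.0, 90.0, 90.0)
triclinic = box_vectors(3.0, 4.0, 5.0, 70.0, 80.0, 100.0)
```

For a rectangular box the result is, up to rounding, just the three edge lengths on the diagonal; only non-rectangular boxes produce off-diagonal components.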
> > >
> > > In order to encourage the use of such a more flexible file format I
> > > would then propose that we remove the facility for writing gro and g96
> > > files.
> > >
> > > Please let me know what you think.
> > >
> > > --
> > > David van der Spoel, Ph.D.
> > > Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
> > > Box 596, 75124 Uppsala, Sweden. Phone: +46184714205. Fax: +4618511755.
> > > spoel at xray.bmc.uu.se spoel at gromacs.org http://folding.bmc.uu.se
> > > _______________________________________________
> > > gmx-developers mailing list
> > > gmx-developers at gromacs.org
> > > http://www.gromacs.org/mailman/listinfo/gmx-developers
> > > Please don't post (un)subscribe requests to the list. Use the
> > > www interface or send it to gmx-developers-request at gromacs.org.
> > >
> >
> >
> > ------------------------------
> >
> >
> >
> > End of gmx-developers Digest, Vol 48, Issue 1
> > *********************************************
>
>
> Viele Grüsse / Best regards,
> Dr. Mathias Pütz
>
> IT Specialist for Application Performance
>
> Deep Computing - Strategic Growth Business
> IBM Systems & Technology Group
>
> e-mail: mpuetz at de.ibm.com
> mobile: + 49-(0)160-7120602
> fax: + 49-(0)6131-84-6660
>
> Anschrift:
> IBM Deutschland GmbH
> Department B513
> Hechtsheimer Str. 2 / Building 12
> 55131 Mainz
> Germany
>
> IBM Deutschland GmbH
> Vorsitzender des Aufsichtsrats: Hans Ulrich Maerki
> Geschäftsführung: Martin Jetter (Vorsitzender), Christian Diedrich,
> Christoph Grandpierre, Matthias Hartmann, Thomas Fell, Michael Diemer
> Sitz der Gesellschaft: Stuttgart
> Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940
>
>
> ------------
> Erik Lindahl <lindahl at cbr.su.se> Backup: <erik.lindahl at gmail.com>
> Assistant Professor, Computational Structural Biology
> Center for Biomembrane Research, Dept. Biochemistry & Biophysics
> Stockholm University, SE-106 91 Stockholm, Sweden
> Tel: +46(0)8164675 Mobile: +46(0)704218767 Fax: mail a PDF instead