[gmx-developers] regex.h and boost in gromacs-master

Roland Schulz roland at utk.edu
Thu Mar 15 00:16:13 CET 2012


Hi,


On Wed, Mar 14, 2012 at 5:39 PM, Mirco Wahab <
mirco.wahab at chemie.tu-freiberg.de> wrote:

> There was a short discussion on Gerrit (gromacs-master) about how to
> handle regular expressions in selections in future releases,
> e.g. here:
>
> https://gerrit.gromacs.org/#/c/551/7/src/gromacs/selection/tests/selectioncollection.cpp
>
> I'm inclined to start a new thread for this ;-) The question
> here is, in my opinion, which would be the *best package*
> to rely on with the fewest possible surprises
> in the future.
>
> The (my) [-] candidates:
>
> - PCRE (http://www.pcre.org/) would be just another
>   dependency, so better not ...
>
> - <regex> with GCC (tr1, C++0x) won't work at all (not even
>   in 4.6.3), see
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#id476343
>
> - <regex> with VS 2010 (tr1) will work with only the
>   simplest expressions; anything moderately complicated
>   crashes its engine during the initial regex compilation (*)
>
> - <regex.h> is the POSIX C interface. It provides basic matching
>   and searching (through regcomp()/regexec()), which might be ok.
>   On GNU systems it is part of glibc (under posix/), but it might
>   not be easily available for Win64 (if at all).
>
> The (my) [+] candidates:
>
> - Boost
>
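
For comparison with the <regex.h> option in the list above, here is a minimal sketch of the POSIX regcomp()/regexec() interface called from C++. The helper name `posix_match` is hypothetical and not Gromacs code; since the header is POSIX, this should build on Linux/glibc but, as noted in the list, should not be expected to work on Windows.

```cpp
#include <regex.h>  // POSIX regcomp()/regexec()/regfree()
#include <string>

// Hypothetical helper: true if `text` matches the POSIX extended regular
// expression `pattern` anywhere (use ^...$ for a whole-string match).
// Invalid patterns also yield false; real code would report regerror() text.
bool posix_match(const std::string& pattern, const std::string& text)
{
    regex_t re;
    // REG_NOSUB: only a yes/no answer is needed, no capture offsets.
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
    {
        return false;
    }
    const bool matched = (regexec(&re, text.c_str(), 0, nullptr, 0) == 0);
    regfree(&re);  // release the compiled pattern
    return matched;
}
```

With this sketch, `posix_match("^C[1-3]$", "C2")` is true and `posix_match("^C[1-3]$", "C4")` is false. The interface covers plain matching, but nothing comparable ships with the Windows toolchains discussed here.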

I agree with your conclusion that boost would be the best option. But...

>
> Boost is /already/ included somehow in master (for smart_ptr,
> scoped_ptr?), despite the [Allowed C++ Features] ruling:
>     /Don't use Boost, except parts that all developers have
>     agreed to be essential. These parts will be copied to
>     the Gromacs source tree./
>

This is a problem. regex is not a header-only library. Unlike the
libraries we currently include (exception and smart_ptr (shared_ptr,
scoped_ptr)), which consist only of headers, regex has to be compiled.
Someone would need to look into how best to compile it. There are at least
two options for compiling a Boost subset included in Gromacs:
- use the standard bjam build, shipping bjam and building it automatically.
Copying bjam is supported by bcp.
- use CMake: either write the CMake files ourselves or use one of the
existing CMake build scripts for Boost: http://gitorious.org/boost/cmake
 or http://ryppl.github.com/gettingstarted.html

Currently we use bcp to generate the subset of Boost we include
(see src/external/boost/README). With the cmake/boost on gitorious I'm not
sure how to create such a subset. The ryppl-based one is supposed to
support this, but I'm not sure how.
http://boost.2283326.n4.nabble.com/How-to-use-BCP-td3629743.html has a bit
more detail on the different options and problems. If you could look into
how to build boost-regex within Gromacs, that would be great.

BTW: Being able to include compiled (linked) Boost libraries in our bundled
Boost would help us with more than just regex. I think we could benefit greatly
from using Boost.MPI in non-performance-critical parts of the code
(e.g. bcast_ir_mtop and global_stat) to improve performance, scalability
AND maintainability.

Of course, we could also leave boost regex out of the Boost subset we
include in the Gromacs code. The regex feature would then require an
external Boost installation.
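
A minimal sketch of how the regex-using code could be written so that it builds either against an installed Boost.Regex or, as a fallback, against the C++11 `<regex>` header (the fallback only helps with a standard library that actually implements `<regex>`; libstdc++ did not until GCC 4.9). The macro `GMX_EXTERNAL_BOOST_REGEX` and the function `name_matches` are hypothetical. A CMake option in the spirit of the '-with-external-boost' suggestion in the quoted message would have to define the macro and, for the Boost versions discussed here, link the compiled boost_regex library.

```cpp
#include <string>

// GMX_EXTERNAL_BOOST_REGEX is a hypothetical macro; a build option would
// define it when an installed Boost.Regex is found and linked.
#ifdef GMX_EXTERNAL_BOOST_REGEX
#include <boost/regex.hpp>
namespace rx = boost;  // boost::regex, boost::regex_match
#else
#include <regex>
namespace rx = std;    // std::regex, std::regex_match (C++11)
#endif

// Hypothetical selection helper: true if the whole of `name` matches `pattern`.
// Both back ends default to Perl/ECMAScript-style syntax and throw
// rx::regex_error for an invalid pattern.
bool name_matches(const std::string& pattern, const std::string& name)
{
    const rx::regex re(pattern);
    return rx::regex_match(name, re);
}
```

Keeping the engine choice in one place means the rest of the selection code would not need to know which implementation is in use.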

Roland


> Boost is, imho, the only ubiquitous package that works
> almost perfectly for complicated regexes in both Unix and
> Windows environments. If we can agree on copying
> the regex part into the 'minimal boost tree' of Gromacs,
> this problem would be solved.
>
> There could be, for exotic environments with their own boost
> already in place, some kind of '-with-external-boost' or
> its CMake equivalent.
>
> my 0,02€
>
> Thanks & Regards
>
> M.
>
> (*) E.g., this pattern matches the contents of a Gromacs .gro
> file but crashes the VS2010 <regex> engine (though not the Boost one):
>
>    const char * MDATA::reg_gro =
>    /*
>    SOME_NAME
>    1234
>       1  ABC A100    1  44.455  32.113  39.983
>    */
>        "\\A(\\w+)[^\\n\\r]*[\\r\\n]+"
>        "[ ]*(\\d+)[^\\n\\r]*[\\r\\n]+"
>        "[ ]*\\d+"  "[ ]*[-_\\w]+"  "[ ]*[-_\\w]+"  "[ ]*\\d+"
>        "[ ]*[\\d\\.]+"  "[ ]*[\\d\\.]+"  "[ ]*[\\d\\.]+"
>     ;
>
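
The pattern in the (*) footnote starts with `\A`, a Boost/Perl anchor that the ECMAScript grammar of `std::regex` does not provide. The sketch below, written for a standard library with a working `<regex>` (for example libstdc++ from GCC 4.9 on), drops `\A` and instead passes `std::regex_constants::match_continuous`, which pins the match to the start of the input; the rest of the quoted pattern is kept. The wrapper `match_gro_header` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the .gro pattern quoted above. Boost's "\A"
// anchor is dropped; match_continuous pins the match to the start of `text`.
// On success, `title` receives the first word of the title line and `natoms`
// the atom count from the second line.
bool match_gro_header(const std::string& text, std::string& title, std::string& natoms)
{
    static const std::regex reg_gro(
        "(\\w+)[^\\n\\r]*[\\r\\n]+"        // title line; first word captured
        "[ ]*(\\d+)[^\\n\\r]*[\\r\\n]+"    // number of atoms
        "[ ]*\\d+"  "[ ]*[-_\\w]+"  "[ ]*[-_\\w]+"  "[ ]*\\d+"  // resnr, resname, atom name, atom nr
        "[ ]*[\\d\\.]+"  "[ ]*[\\d\\.]+"  "[ ]*[\\d\\.]+");    // x, y, z
    std::smatch m;
    if (!std::regex_search(text, m, reg_gro, std::regex_constants::match_continuous))
    {
        return false;
    }
    title = m[1].str();
    natoms = m[2].str();
    return true;
}
```

For the three-line sample in the footnote's comment ("SOME_NAME", "1234", and the atom line), this returns true with `title` set to "SOME_NAME" and `natoms` to "1234".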


-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


More information about the gromacs.org_gmx-developers mailing list