[gmx-developers] parallel make problems

Szilárd Páll szilard.pall at cbr.su.se
Tue Jun 18 23:40:42 CEST 2013


On Tue, Jun 18, 2013 at 5:05 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
>
>
> On Mon, Jun 17, 2013 at 7:59 PM, Roland Schulz <roland at utk.edu> wrote:
>>
>> On Mon, Jun 17, 2013 at 1:10 PM, Mark Abraham <mark.j.abraham at gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Jun 17, 2013 at 6:16 PM, Manuel Nuno Melo <m.n.melo at rug.nl>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have also had linking problems when making in parallel. In my case
>> >> they
>> >> could be traced back to the option to let GMX download/build its own
>> >> fftw
>> >> (-DGMX_BUILD_OWN_FFTW=ON).
>> >>
>> >> It seems that only one of make's threads starts building fftw, while
>> >> the
>> >> others go ahead building/linking GMX. Since fftw compilation is not
>> >> ready by
>> >> the time it is needed, GMX linking is botched.
>> >
>> >
>> > Yes, Rossen first showed this to me. I don't know if the underlying
>> > issue is
>> > that the dependency cannot be described properly, or that we're not
>> > doing it
>> > properly. If it's a problem, people are welcome to contribute a fix! :-)
>> It was working in https://gerrit.gromacs.org/#/c/1675/12. You then
>> changed how the dependency works in patch set 13. You never replied to
>> Christoph's comment why this was changed (at least I can't find a
>> reply). Do you remember?
>
>
> I couldn't remember, but gerrit can - I never published a series of
> responses I made back then, sorry. Now published at
> https://gerrit.gromacs.org/1675
>
>> Otherwise I can change it back as 12 did it
>> and it should work again.
>
>
> It might do, but as I said in those secret drafts the form of patch 12
> doesn't work on cmake 2.8.7 because of a bug there in add_library(...GLOBAL)
> (and I suspect is probably too global, anyway, but this probably does no
> harm?).
>
> So I'm still not sure there's a convenient solution that works in all cases.
> Compromising the smooth running of a parallel make for someone downloading
> FFTW seems like the most low-impact problem of the set we could choose to
> have.

If no proper fix is possible, it may still be worth adding a note that
warns users about the potential linking issue and suggests a solution
(which should be as simple as re-running make, right?).
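
A minimal sketch of that workaround, assuming a plain second run of make
is indeed enough to finish the build once the bundled FFTW is in place
(I have not verified this against the actual GROMACS build). The function
name build_with_retry and the MAKE and JOBS variables are hypothetical,
chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical helper, not part of GROMACS: try a parallel build first and,
# if it fails (for example because linking started before the bundled FFTW
# had finished building), run make once more serially.
# MAKE and JOBS are illustrative variables, not GROMACS or CMake settings.
build_with_retry() {
    make_cmd=${MAKE:-make}
    jobs=${JOBS:-4}

    # First attempt: parallel build with the requested number of jobs.
    if "$make_cmd" -j "$jobs"; then
        return 0
    fi

    # Second attempt: plain serial make; its exit status is the result.
    echo "Parallel build failed; re-running make serially" >&2
    "$make_cmd"
}
```

Calling build_with_retry from the build directory would then report
failure only if the serial re-run fails as well.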

--
Szilárd

>
> Mark
>
>>
>> Roland
>>
>> >
>> > Mark
>> >
>> >>
>> >>
>> >> Cheers,
>> >> Manel
>> >>
>> >> > Hi,
>> >> >
>> >> > I too suspect filesystem issues or clock skews. I think I tested make
>> >> > -j
>> >> > and make -j 12. The cluster is currently down for maintenance, so I
>> >> > can't
>> >> > inspect the details at the moment.
>> >> >
>> >> > On 5 Apr 2013, at 13:14, Alexey Shvetsov <alexxy at omrb.pnpi.spb.ru>
>> >> > wrote:
>> >> >
>> >> > > Hi Erik
>> >> > >
>> >> > > What is the underlying filesystem on this cluster? If it is slow or
>> >> > > overloaded somehow, it may lead to parallel make issues. It may also
>> >> > > be related to the make version (some old versions may exhibit such
>> >> > > behavior). How many make threads did you use? I tried with make -j64
>> >> > > and it builds fine with recent cmake (2.8.10) and make (3.82).
>> >> > >
>> >> > >
>> >> > > On 5 April 2013 11:55:27, Erik Marklund wrote:
>> >> > >> Hi,
>> >> > >>
>> >> > >> Building gromacs 4.6.1 failed whenever I issued parallel make,
>> >> > >> i.e.
>> >> > >> make -j.
>> >> > >> I reported this to the cluster admins since I had never seen such
>> >> > >> behaviour
>> >> > >> before from gromacs' side, and here's their reply. I can't tell
>> >> > >> whether
>> >> > >> gromacs is at fault or the cluster.
>> >> > >>
>> >> > >> Erik
>> >> > >>
>> >> > >> Begin forwarded message:
>> >> > >>> Hi,
>> >> > >>>
>> >> > >>>> I was compiling gromacs on tintin's login node the other day and
>> >> > >>>> it seems that parallel make, i.e. make -j, doesn't work on tintin.
>> >> > >>>> I got linker errors that never showed up when make was run
>> >> > >>>> serially. I've never encountered such behaviour before.
>> >> > >>>
>> >> > >>> Without any more information (or being able to look for actual
>> >> > >>> files
>> >> > >>> right
>> >> > >>> now), I'd guess this is a problem with the makefiles rather than
>> >> > >>> the
>> >> > >>> actual make. It seems somewhat unexpected that CMake would create
>> >> > >>> makefiles that aren't safe for parallel building, but it does
>> >> > >>> seem
>> >> > >>> the
>> >> > >>> most likely culprit (assuming it doesn't let developers add rules
>> >> > >>> directly to the makefile to work around problems, I don't
>> >> > >>> remember
>> >> > >>> if
>> >> > >>> that's possible).
>> >> > >>>
>> >> > >>> That you only see the problem on tintin can likely be explained
>> >> > >>> by timing or other non-deterministic factors.
>> >> > > --
>> >> > > Best Regards,
>> >> > > Alexey 'Alexxy' Shvetsov
>> >>
>> >> --
>> >> gmx-developers mailing list
>> >> gmx-developers at gromacs.org
>> >> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>> >> Please don't post (un)subscribe requests to the list. Use the
>> >> www interface or send it to gmx-developers-request at gromacs.org.
>> >
>> >
>>
>>
>>
>> --
>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>> 865-241-1537, ORNL PO BOX 2008 MS6309
>
>
>


