[gmx-developers] Gromacs trying to use OpenMP instead of thread-mpi despite trying to convince it?

Szilárd Páll szilard.pall at cbr.su.se
Mon Dec 3 20:41:55 CET 2012


On Mon, Dec 3, 2012 at 7:18 PM, Roland Schulz <roland at utk.edu> wrote:

>
>
>
> On Mon, Dec 3, 2012 at 1:02 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>
>>
>>  On Mon, Dec 3, 2012 at 5:12 AM, Roland Schulz <roland at utk.edu> wrote:
>>
>>>
>>>
>>>
>>>  On Sun, Dec 2, 2012 at 12:58 PM, Shirts, Michael (mrs5pt) <
>>> mrs5pt at eservices.virginia.edu> wrote:
>>>
>>>> So, more progress, but no simulations running yet.
>>>>
>>>> mdrun -nt 8 -ntmpi 8 gives the same error as before (I actually tried that
>>>> before, and forgot to include it in my error report)
>>>>
>>>> mdrun -ntmpi 8 -ntomp 1 gives the error
>>>> Fatal error:
>>>> OMP_NUM_THREADS (8) and the number of threads requested on the command line
>>>> (1) have different values
>>>> For more information and tips for troubleshooting, please check the GROMACS
>>>> website at http://www.gromacs.org/Documentation/Errors
>>>>
>>>
>>>  We probably should print a notice that OMP_NUM_THREADS is set.
>>> Otherwise this is really confusing if OMP_NUM_THREADS isn't set by the user
>>> but by the system.
>>>
>>
>>  There is a note printed whenever the number of OpenMP threads is set by
>> OMP_NUM_THREADS instead of -ntomp.
>>
> I think if even core developers don't understand/find a note, then that's
> a pretty clear sign that it will be confusing to the average user ;-)
>

Unfortunately, that is probably not the case here; rather, even a core
developer overlooked an obvious message in the output, something like:
"Getting the number of OpenMP threads from OMP_NUM_THREADS: 6"

This message could be improved, but it is there precisely because
OMP_NUM_THREADS might be set by default by the job scheduler.
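
For a job script, one way to avoid the mismatch is to make the environment
agree with the command line before launching. A minimal sketch, assuming a
POSIX shell; pick_ntomp is a hypothetical helper, not part of GROMACS, and the
mdrun line is left commented out so the snippet runs on any machine:

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): prints the OpenMP thread count
# to use, and notes on stderr when the environment already carries a
# different OMP_NUM_THREADS (e.g. one set by the job scheduler).
pick_ntomp() {
    wanted=$1
    if [ -n "${OMP_NUM_THREADS:-}" ] && [ "$OMP_NUM_THREADS" != "$wanted" ]; then
        echo "note: OMP_NUM_THREADS=$OMP_NUM_THREADS overridden with $wanted" >&2
    fi
    echo "$wanted"
}

# Make the environment match what will be passed to -ntomp.
OMP_NUM_THREADS=$(pick_ntomp 1)
export OMP_NUM_THREADS
# mdrun -ntmpi 8 -ntomp "$OMP_NUM_THREADS" -deffnm ethrun
```

The note goes to stderr so that it ends up in the job's error log, next to
whatever mdrun itself prints.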


>
>>
>>
>>>
>>>
>>>> Fatal error:
>>>> OMP_NUM_THREADS is invalid: '0'
>>>>
>>>
>>>  This applies to -ntomp as well. There, too, you want to use 1, not 0, to
>>> disable OpenMP (1 because it is the total number of threads, and thus 1
>>> means serial).
>>>
>>>>
>>>> /var/spool/PBS/mom_priv/jobs/2053253.lc5.itc.virginia.edu.SC: line 22: 12201
>>>> Illegal instruction     /h3/n1/shirtsgroup/gromacs_46/install/bin/mdrun_d -ntomp 1 -ntmpi -8 -deffnm /bigtmp/mrs5pt/eth.vrescale.50
>>>>
>>>
>>>  I suppose the CPU on the compute node is different from the build
>>> host. You need to change GMX_CPU_ACCELERATION to the one correct for the
>>> compute node. It could also help to set GMX_DISTRIBUTABLE_BUILD (both are
>>> cmake options).
>>>
>>
>>  GMX_DISTRIBUTABLE_BUILD does only one thing: it turns off rdtscp. Is
>> the intention for this option to provide more features? If not, I don't see
>> the point in not calling it GMX_DISABLE_RDTSCP.
>>
> I called it that for two reasons:
> - Users don't know whether they want to disable RDTSCP, but they might
> know whether they want to have a distributable build. In other words, in
> this case the ultimate goal ("make it work on a different CPU") makes more
> sense to the user than how this is achieved.
> - It can be extended in the future.
>

Makes sense.


>
> But I think we should consider disabling rdtscp by default, unless we add
> runtime detection. The advantage is too small to be worth the problems it
> causes. And as far as I know, the only other option that can cause an illegal
> instruction is GMX_CPU_ACCELERATION. But that one is pretty obvious to the
> user (it is a non-advanced cmake option, and I think people are much more
> likely to have heard of SSE than of rdtscp).
>

I disagree. Many users run on heterogeneous systems or on machines with a
shared file system, which is the easiest way to get a binary executed on the
wrong cluster. These users will encounter the "illegal instruction" error,
which to most of them *will not* indicate that they should change
GMX_CPU_ACCELERATION and recompile. Therefore, I support either trying to
implement a proper solution or simply documenting it in a FAQ.
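
As a stopgap, a job script could check the node's CPU flags before launching.
A rough sketch, assuming Linux on x86, where /proc/cpuinfo has a "flags" line;
cpu_has_flag is a hypothetical helper, not part of GROMACS, and the flags
tested are only examples (which ones matter depends on how the binary was
built):

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): succeeds if the given CPU
# feature flag appears on the first "flags" line of a cpuinfo-style file.
# The second argument is optional and defaults to /proc/cpuinfo.
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -m1 '^flags' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Warn before starting a binary that may rely on these instructions.
for f in rdtscp sse4_1; do
    if ! cpu_has_flag "$f"; then
        echo "warning: this node's CPU does not report '$f'" >&2
    fi
done
```

A warning like this at least points the user at the CPU mismatch, which the
bare "Illegal instruction" line from the shell does not.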

Cheers,
--
Szilárd


>
> Roland
>
>
>
>>
>>  Cheers,
>> --
>> Szilárd
>>
>>
>>>
>>>  Roland
>>>
>>>
>>>>
>>>> Best,
>>>>
>>>> ~~~~~~~~~~~~
>>>> Michael Shirts
>>>> Assistant Professor
>>>> Department of Chemical Engineering
>>>> University of Virginia
>>>> michael.shirts at virginia.edu
>>>> (434)-243-1821
>>>>
>>>>
>>>>  > From: Berk Hess <hess at kth.se>
>>>> > Date: Sun, 2 Dec 2012 09:34:11 +0100
>>>> > To: "michael.shirts at virginia.edu" <michael.shirts at virginia.edu>, Discussion
>>>> > list for GROMACS development <gmx-developers at gromacs.org>
>>>> > Subject: Re: [gmx-developers] Gromacs trying to use OpenMP instead of
>>>> > thread-mpi despite trying to convince it?
>>>>  >
>>>>
>>>> > Hi,
>>>> >
>>>> > Your queuing system probably doesn't set OMP_NUM_THREADS then
>>>> > and I assume this machine has at least 16 (HT) cores.
>>>> > mdrun -ntmpi 8 -ntomp 1
>>>> > will do what you want, or:
>>>> > mdrun -nt 8 -ntmpi 8
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Berk
>>>> >
>>>> > On 12/02/2012 08:28 AM, Shirts, Michael (mrs5pt) wrote:
>>>> >> Quick question:
>>>> >>
>>>> >> Compiling the most recent code in release-4-6, I compile without OpenMP
>>>> >> (because using group rather than verlet cutoffs), and using any of the
>>>> >> below:
>>>> >>
>>>> >> mdrun_d -ntmpi 8 -deffnm ethrun
>>>> >> or
>>>> >> mdrun_d -nt 8 -deffnm ethrun
>>>> >> or
>>>> >> mdrun_d -deffnm ethrun
>>>> >> or
>>>> >> mdrun_d -ntomp 0 -deffnm ethrun
>>>> >> or
>>>> >> mdrun_d -ntomp 0 -ntmpi8 -deffnm ethrun
>>>> >>
>>>> >> I get:
>>>> >> Fatal error:
>>>> >> OpenMP threads are requested, but Gromacs was compiled without OpenMP
>>>> >> support
>>>> >> For more information and tips for troubleshooting, please check the GROMACS
>>>> >> website at http://www.gromacs.org/Documentation/Errors
>>>> >>
>>>> >> Even though I'm presumably requesting thread-mpi.  Worked fine with -nt
>>>> >> previously (before the new -nt options introduced a few months back).
>>>> >>
>>>> >> Any suggestions or something I'm doing wrong?  Perhaps gromacs is
>>>> >> interpreting the cluster environment as requesting OpenMP somehow?
>>>> >> FWIW, the PBS script request line is
>>>> >> "#PBS -l select=1:mpiprocs=8:ncpus=8".
>>>> >>
>>>> >> Apologies if I missed this answer somewhere out there already.
>>>> >>
>>>> >> Thanks,
>>>> >> ~~~~~~~~~~~~
>>>> >> Michael Shirts
>>>> >> Assistant Professor
>>>> >> Department of Chemical Engineering
>>>> >> University of Virginia
>>>> >> michael.shirts at virginia.edu
>>>> >> (434)-243-1821
>>>> >>
>>>> >
>>>>
>>>> --
>>>> gmx-developers mailing list
>>>> gmx-developers at gromacs.org
>>>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>>> Please don't post (un)subscribe requests to the list. Use the
>>>> www interface or send it to gmx-developers-request at gromacs.org.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>  --
>>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>>> 865-241-1537, ORNL PO BOX 2008 MS6309
>>>
>>>
>>
>>
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
>
>