[gmx-developers] thread problem on AMD 12 core chips?

Sander Pronk pronk at cbr.su.se
Fri Jan 13 11:23:31 CET 2012


This error is generated when the thread library can't create threads for some reason (for example, running out of memory or hitting a ulimit); I've never seen it before. It is probably due to the OS.
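
One way to check the ulimit side of this is to print the limits that most often stop thread creation on Linux. This is only a rough sketch: the choice of limits is my assumption, not something taken from thread_mpi, and the helper names are made up for illustration.

```python
import resource

# Limits that commonly prevent a process from creating threads on Linux.
# This selection is an assumption, not taken from thread_mpi itself.
LIMITS = {
    "max user processes (ulimit -u)": resource.RLIMIT_NPROC,
    "stack size in bytes (ulimit -s shows KiB)": resource.RLIMIT_STACK,
    "virtual memory in bytes (ulimit -v shows KiB)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_related_limits():
    """Return {description: (soft, hard)} for the limits listed above."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in thread_related_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

A low soft limit on user processes, or a very large per-thread stack combined with a virtual memory cap, would be the first things to look at.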

There is a chance that this is caused by an incompatibility in the thread affinity API: if the number of threads is equal to the number of hardware threads (cores, etc.), thread_mpi will enforce thread affinity.

Could you try with:

mdrun -nt 30

(or any number other than 32 that is compatible with domain decomposition)

and report whether that works?
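
The reasoning behind trying a count other than 32 can be sketched as follows. This only restates the rule described above (affinity is enforced when the thread count equals the number of hardware threads); it is not thread_mpi's actual code, the function name is hypothetical, and it does not check whether a count is compatible with domain decomposition.

```python
import os


def would_pin_threads(nthreads, hw_threads):
    """Rule as described in this message: affinity is enforced only when
    the thread count equals the number of hardware threads.
    Hypothetical helper, not part of thread_mpi."""
    return nthreads == hw_threads


if __name__ == "__main__":
    hw = os.cpu_count() or 1  # hardware threads visible to the OS
    for n in (hw, hw - 2):
        if n >= 1:
            state = "enforced" if would_pin_threads(n, hw) else "not enforced"
            print(f"{n} threads on {hw} hardware threads: affinity {state}")
```

On the 32-core node, `would_pin_threads(32, 32)` is `True` and `would_pin_threads(30, 32)` is `False`, so a run with 30 threads should skip the affinity code path if this is indeed the cause.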

Sander


On 12 Jan 2012, at 14:39 , David van der Spoel wrote:

> On 2012-01-12 13:31, Ake Sandgren wrote:
>> On Thu, 2012-01-12 at 13:18 +0100, David van der Spoel wrote:
>>> On 2012-01-12 11:36, Berk Hess wrote:
>>>> On 01/12/2012 11:24 AM, David van der Spoel wrote:
>>>>> On 2012-01-12 11:17, Berk Hess wrote:
>>>>>> Which compiler is this?
>>>>>> We get lots of warnings with gcc4.6, but we run regularly on 32 and 64
>>>>>> core nodes.
>>>>> 
>>>>> Thread model: posix
>>>>> gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
>>>>> 
>>>>> Is intel more reliable?
>>>> This is not a matter of platform I would think.
>>>> We have only used AMD platform with 32 or more MPI threads.
>>>> I would guess this is a thread-mpi bug or a compiler issue.
>>> Compiling with Intel C 12.0.3.174 gives the same error, but the same
>>> pthread library is linked in.
>>> 
>>> Other tips for debugging this?
>>> 
>>>> 
>>>> Berk
>>>>>> 
>>>>>> Berk
>>>>>> 
>>>>>> On 01/12/2012 11:04 AM, David van der Spoel wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm trying to compile and install gromacs release-4-5-patches on a new
>>>>>>> cluster with four 12-core AMD chips (abisko in Umea, Sweden). However
>>>>>>> the threaded code bails out with the following message:
>>>>>>> 
>>>>>>> Reading file topol.tpr, VERSION 4.5.5-dev-20120111-9181e (double
>>>>>>> precision)
>>>>>>> Starting 32 threads
>>>>>>> tMPI error: tMPI Initialization error (in valid comm)
>>>>>>> 
>>>>>>> First, I'm a bit confused why the code detects only 32 cores; second,
>>>>>>> it shows the above error and quits.
>>>>>>> 
>>>>>>> Any clues?
>> 
>> 
>> Abisko's current nodes are 4-socket 8-core (the 12-cores are still under
>> test)
>> 
>> If you are using openmpi, it does not have support for MPI threads
>> compiled in (the openib part of openmpi doesn't support this yet); that
>> probably explains your problem.
>> 
> I see, that explains the 32. But gromacs uses its own mpi-over-threads implementation that does not use any MPI whatsoever.
> 
> 
> -- 
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
> -- 
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-developers-request at gromacs.org.



