[gmx-users] running g_tune_pme on stampede

Kevin Chen fch6699 at gmail.com
Sat Dec 6 00:16:09 CET 2014


Hi,

Has anybody tried g_tune_pme on Stampede before? It appears Stampede only supports ibrun, not the usual mpirun -np style of launching. So I assumed one could launch g_tune_pme with MPI using a command like this (without the -np option):

ibrun g_tune_pme -s cutoff.tpr -launch

Unfortunately, it failed. Any suggestions are welcome!
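
For reference, the batch script looked roughly like this (the queue, walltime and module name below are placeholders rather than the exact values I used):

    #!/bin/bash
    #SBATCH -J tune_pme        # job name
    #SBATCH -p normal          # queue (placeholder)
    #SBATCH -N 2               # 2 nodes
    #SBATCH -n 32              # 16 cores per node on Stampede
    #SBATCH -t 01:00:00        # walltime (placeholder)

    module load gromacs        # module name is a guess

    # g_tune_pme started under ibrun, with no -np given:
    ibrun g_tune_pme -s cutoff.tpr -launch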

Thanks in advance

Kevin Chen

-----Original Message-----
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Szilárd Páll
Sent: Friday, December 5, 2014 12:54 PM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] multinode issue

On second thought (and a quick googling), it _seems_ that this is an issue caused by the following:
- the OpenMP runtime gets initialized outside mdrun and its threads (or just the master thread) get their affinity set;
- mdrun then executes the sanity check, at which point omp_get_num_procs() reports 1 CPU, most probably because the master thread is bound to a single core.

This alone should not be a big deal as long as the affinity settings get correctly overridden in mdrun. However, this can have the ugly side effect that, if mdrun's affinity setting gets disabled (mdrun backs off if it detects externally set affinities, or if not all cores/hardware threads are used), all compute threads will inherit the previously set affinity and multiple threads will end up running on the same core.

Note that this warning should typically not cause a crash, but it is telling you that something is not quite right, so it may be best to start by eliminating this warning (hints: I_MPI_PIN for Intel MPI, -cc for Cray's aprun, --cpu-bind for SLURM).
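
For example, relaxing the launcher-side pinning might look something like this (untested sketches; the mdrun binary name, rank count and file names are placeholders to adapt to your setup):

    # Intel MPI: disable the launcher's pinning, let mdrun set affinities
    export I_MPI_PIN=off
    mpirun -np 32 mdrun_mpi -v -deffnm topol

    # Cray aprun: no CPU binding from the launcher
    aprun -n 32 -cc none mdrun_mpi -v -deffnm topol

    # SLURM: launch without CPU binding
    srun -n 32 --cpu-bind=none mdrun_mpi -v -deffnm topol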

Cheers,
--
Szilárd


On Fri, Dec 5, 2014 at 7:35 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> I don't think this is a sysconf issue. As you seem to have 16-core (hw
> thread?) nodes, it looks like sysconf returned the correct value 
> (16), but the OpenMP runtime actually returned 1. This typically means 
> that the OpenMP runtime was initialized outside mdrun and for some 
> reason (which I'm not sure about) it returns 1.
>
> My guess is that your job scheduler is multi-threading aware and by 
> default assumes 1 core/hardware thread per rank so you may want to set 
> some rank depth/width option.
>
> --
> Szilárd
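
To illustrate the rank depth/width idea above, the knobs would be something like the following (generic, untested examples; the exact options depend on your scheduler and MPI, and the binary name is a placeholder):

    # SLURM: 16 ranks per node, one core/hardware thread per rank
    #SBATCH --ntasks-per-node=16
    #SBATCH --cpus-per-task=1

    # Cray ALPS: 16 ranks per node, depth 1
    aprun -n 32 -N 16 -d 1 mdrun_mpi -v -deffnm topol

    # Intel MPI: pin each rank to its own core
    export I_MPI_PIN_DOMAIN=core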
>
>
> On Fri, Dec 5, 2014 at 1:37 PM, Éric Germaneau <germaneau at sjtu.edu.cn> wrote:
>> Thank you Mark,
>>
>> Yes this was the end of the log.
>> I tried another input and got the same issue:
>>
>>    Number of CPUs detected (16) does not match the number reported by
>>    OpenMP (1).
>>    Consider setting the launch configuration manually!
>>    Reading file yukuntest-70K.tpr, VERSION 4.6.3 (single precision)
>>    [16:node328] unexpected disconnect completion event from [0:node299]
>>    Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
>>    internal ABORT - process 16
>>
>> Actually, I'm running some tests for our users; I'll talk with the 
>> admin about how to return information to the standard sysconf() 
>> routine in the usual way.
>> Thank you,
>>
>>            Éric.
>>
>>
>> On 12/05/2014 07:38 PM, Mark Abraham wrote:
>>>
>>> On Fri, Dec 5, 2014 at 9:15 AM, Éric Germaneau 
>>> <germaneau at sjtu.edu.cn>
>>> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I use impi and when I submit a job (via LSF) to more than one node 
>>>> I get the following message:
>>>>
>>>>     Number of CPUs detected (16) does not match the number reported by
>>>>     OpenMP (1).
>>>>
>>> That suggests this machine has not been set up to return information 
>>> to the standard sysconf() routine in the usual way. What kind of machine is this?
>>>
>>>>     Consider setting the launch configuration manually!
>>>>
>>>>     Reading file test184000atoms_verlet.tpr, VERSION 4.6.2 (single
>>>>     precision)
>>>>
>>> I hope that's just a 4.6.2-era .tpr, but nobody should be using 
>>> 4.6.2 mdrun because there was a bug in only that version affecting 
>>> precisely these kinds of issues...
>>>
>>>>     [16:node319] unexpected disconnect completion event from [11:node328]
>>>>
>>>>     Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
>>>>     internal ABORT - process 16
>>>>
>>>> I submit doing
>>>>
>>>>     mpirun -np 32 -machinefile nodelist $EXE -v -deffnm $INPUT
>>>>
>>>> The machinefile looks like this
>>>>
>>>>     node328:16
>>>>     node319:16
>>>>
>>>> I'm running release 4.6.7.
>>>> I do not set anything about OpenMP for this job; I'd like to have 
>>>> 32 MPI processes.
>>>>
>>>> Using one node it works fine.
>>>> Any hints here?
>>>>
>>> Everything seems fine. What was the end of the .log file? Can you 
>>> run another MPI test program this way?
>>>
>>> Mark
>>>
>>>
>>>>                                                               Éric.
>>>>
>>>> --
>>>> Éric Germaneau (???), Specialist
>>>> Center for High Performance Computing Shanghai Jiao Tong University 
>>>> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China 
>>>> M:germaneau at sjtu.edu.cn P:+86-136-4161-6480 
>>>> W:http://hpc.sjtu.edu.cn
>>
>> --
>> Éric Germaneau (???), Specialist
>> Center for High Performance Computing Shanghai Jiao Tong University 
>> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China 
>> Email:germaneau at sjtu.edu.cn Mobi:+86-136-4161-6480 
>> http://hpc.sjtu.edu.cn
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.
