[gmx-users] running g_tune_pme on stampede

Carsten Kutzner ckutzne at gwdg.de
Sat Dec 6 10:18:34 CET 2014


On 06 Dec 2014, at 00:16, Kevin Chen <fch6699 at gmail.com> wrote:

> Hi,
> 
> Has anybody tried g_tune_pme on Stampede before? It appears Stampede only supports ibrun, not mpirun -np style launching. So I assumed one could launch g_tune_pme with MPI using a command like this (without the -np option):
> 
> ibrun g_tune_pme -s cutoff.tpr -launch
Try 

export MPIRUN=ibrun
export MDRUN=$(which mdrun)
g_tune_pme -s …
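
Wrapped in a batch script, something like the sketch below should work (queue name, node and task counts, and wall time are assumptions to adapt to your allocation; mdrun must be the MPI-enabled binary that ibrun can launch):

   #!/bin/bash
   #SBATCH -J tune_pme        # job name
   #SBATCH -N 2               # number of nodes (assumed)
   #SBATCH -n 32              # total MPI tasks (assuming 16 cores per node)
   #SBATCH -p normal          # queue name (assumed)
   #SBATCH -t 02:00:00        # wall-clock limit (assumed)

   # g_tune_pme uses $MPIRUN to launch $MDRUN for the benchmark runs;
   # ibrun takes the task count from the Slurm allocation.
   export MPIRUN=ibrun
   export MDRUN=$(which mdrun)

   g_tune_pme -s cutoff.tpr -launch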

Carsten

> 
> Unfortunately, it failed. Any suggestion is welcome!
> 
> Thanks in advance
> 
> Kevin Chen
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of Szilárd Páll
> Sent: Friday, December 5, 2014 12:54 PM
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] multinode issue
> 
> On second thought (and after some quick googling), it _seems_ that this issue is caused by the following:
> - the OpenMP runtime gets initialized outside mdrun and its threads (or just the master thread) get their affinity set;
> - mdrun then executes the sanity check, at which point omp_get_num_procs() reports 1 CPU, most probably because the master thread is bound to a single core.
> 
> This alone should not be a big deal as long as the affinity settings get correctly overridden in mdrun. However, this can have the ugly side-effect that, if mdrun's affinity setting gets disabled (mdrun backs off if it detects externally set affinities, or if not all cores/hardware threads are used), all compute threads will inherit the previously set affinity and multiple threads will end up running on the same core.
> 
> Note that this warning should typically not cause a crash, but it is telling you that something is not quite right, so it may be best to start by eliminating this warning (hints: I_MPI_PIN for Intel MPI, -cc for Cray's aprun, --cpu-bind for Slurm).
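> 
> One way to act on those hints is to hand affinity control back to mdrun by turning off the launcher's own binding. A rough sketch (mdrun_mpi, the rank counts and the file names are placeholders; exact option spellings depend on your MPI/launcher version):
> 
>    # Intel MPI: disable Intel MPI's process pinning
>    export I_MPI_PIN=off
>    mpirun -np 32 mdrun_mpi -deffnm topol
> 
>    # Cray aprun: disable aprun's CPU binding
>    aprun -n 32 -cc none mdrun_mpi -deffnm topol
> 
>    # Slurm: disable srun's CPU binding
>    srun -n 32 --cpu-bind=none mdrun_mpi -deffnm topol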
> 
> Cheers,
> --
> Szilárd
> 
> 
> On Fri, Dec 5, 2014 at 7:35 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>> I don't think this is a sysconf issue. As you seem to have 16-core (hw
>> thread?) nodes, it looks like sysconf returned the correct value 
>> (16), but the OpenMP runtime actually returned 1. This typically means 
>> that the OpenMP runtime was initialized outside mdrun and for some 
>> reason (which I'm not sure about) it returns 1.
>> 
>> My guess is that your job scheduler is multi-threading aware and by 
>> default assumes 1 core/hardware thread per rank, so you may want to set 
>> some rank depth/width option.
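>> 
>> For LSF with Intel MPI, a minimal sketch (the resource string, rank counts, and binary/file names are assumptions to adapt to your site):
>> 
>>    #!/bin/bash
>>    #BSUB -n 32                   # 32 MPI ranks in total
>>    #BSUB -R "span[ptile=16]"     # place 16 ranks on each node -> 2 nodes
>> 
>>    # one OpenMP thread per rank, since every core runs an MPI rank
>>    export OMP_NUM_THREADS=1
>> 
>>    mpirun -np 32 -machinefile nodelist mdrun_mpi -v -deffnm topol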
>> 
>> --
>> Szilárd
>> 
>> 
>> On Fri, Dec 5, 2014 at 1:37 PM, Éric Germaneau <germaneau at sjtu.edu.cn> wrote:
>>> Thank you Mark,
>>> 
>>> Yes this was the end of the log.
>>> I tried another input and got the same issue:
>>> 
>>>   Number of CPUs detected (16) does not match the number reported by
>>>   OpenMP (1).
>>>   Consider setting the launch configuration manually!
>>>   Reading file yukuntest-70K.tpr, VERSION 4.6.3 (single precision)
>>>   [16:node328] unexpected disconnect completion event from [0:node299]
>>>   Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
>>>   internal ABORT - process 16
>>> 
>>> Actually, I'm running some tests for our users; I'll talk with the 
>>> admin about how to return information to the standard sysconf() 
>>> routine in the usual way.
>>> Thank you,
>>> 
>>>           Éric.
>>> 
>>> 
>>> On 12/05/2014 07:38 PM, Mark Abraham wrote:
>>>> 
>>>> On Fri, Dec 5, 2014 at 9:15 AM, Éric Germaneau 
>>>> <germaneau at sjtu.edu.cn>
>>>> wrote:
>>>> 
>>>>> Dear all,
>>>>> 
>>>>> I use Intel MPI (impi), and when I submit a job (via LSF) to more than one node 
>>>>> I get the following message:
>>>>> 
>>>>>    Number of CPUs detected (16) does not match the number reported by
>>>>>    OpenMP (1).
>>>>> 
>>>> That suggests this machine has not been set up to return information 
>>>> to the standard sysconf() routine in the usual way. What kind of machine is this?
>>>> 
>>>>>    Consider setting the launch configuration manually!
>>>>> 
>>>>>    Reading file test184000atoms_verlet.tpr, VERSION 4.6.2 (single
>>>>>    precision)
>>>>> 
>>>> I hope that's just a 4.6.2-era .tpr, but nobody should be using 
>>>> 4.6.2 mdrun because there was a bug in only that version affecting 
>>>> precisely these kinds of issues...
>>>> 
>>>>>    [16:node319] unexpected disconnect completion event from [11:node328]
>>>>> 
>>>>>    Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
>>>>>    internal ABORT - process 16
>>>>> 
>>>>> I submit doing
>>>>> 
>>>>>    mpirun -np 32 -machinefile nodelist $EXE -v -deffnm $INPUT
>>>>> 
>>>>> The machinefile looks like this
>>>>> 
>>>>>    node328:16
>>>>>    node319:16
>>>>> 
>>>>> I'm running release 4.6.7.
>>>>> I do not set anything about OpenMP for this job; I'd like to have 
>>>>> 32 MPI processes.
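>>>>> 
>>>>> To make that pure-MPI setup explicit, a sketch (with $EXE being the MPI-enabled mdrun used above):
>>>>> 
>>>>>    export OMP_NUM_THREADS=1
>>>>>    mpirun -np 32 -machinefile nodelist $EXE -ntomp 1 -v -deffnm $INPUT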
>>>>> 
>>>>> Using one node, it works fine.
>>>>> Any hints here?
>>>>> 
>>>> Everything seems fine. What was the end of the .log file? Can you 
>>>> run another MPI test program in the same way?
>>>> 
>>>> Mark
>>>> 
>>>> 
>>>>>                                                              Éric.
>>>>> 
>>>>> --
>>>>> Éric Germaneau (???), Specialist
>>>>> Center for High Performance Computing Shanghai Jiao Tong University 
>>>>> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China 
>>>>> M:germaneau at sjtu.edu.cn P:+86-136-4161-6480 
>>>>> W:http://hpc.sjtu.edu.cn
>>> 
>>> --
>>> Éric Germaneau (???), Specialist
>>> Center for High Performance Computing Shanghai Jiao Tong University 
>>> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China 
>>> Email:germaneau at sjtu.edu.cn Mobi:+86-136-4161-6480 
>>> http://hpc.sjtu.edu.cn


--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa


