[gmx-users] MPI oversubscription

Christian H. hypolit at googlemail.com
Wed Feb 6 09:38:09 CET 2013


And if the zip file did not get through: http://de.pastebin.ca/2311103

2013/2/6 Christian H. <hypolit at googlemail.com>

> It seems my last message got lost because the CMakeError.log was too big, so I have attached it as a zip file.
>
>
>
> If I call sysconf() without any arguments, it returns 1.
> Calling sysconf(_SC_NPROCESSOR_ONLN) fails because the compiler complains that _SC_NPROCESSOR_ONLN is not defined.
>
> This also comes up during make, which looks like trouble:
> src/gromacs/gmxlib/gmx_detect_hardware.c: In function ‘get_nthreads_hw_avail’:
> src/gromacs/gmxlib/gmx_detect_hardware.c:414:1: warning: implicit declaration of function ‘sysconf’ [-Wimplicit-function-declaration]
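The implicit-declaration warning suggests that no prototype for sysconf() was in scope when gmx_detect_hardware.c was compiled. On Linux with glibc, both sysconf() and the _SC_NPROCESSORS_ONLN constant (note the S after NPROCESSOR) are provided by <unistd.h>. If that header is not included, every "#if defined(_SC_...)" test in the function is false and none of the cases runs. Why the header would be missing in this build (for example, a failed configure-time check) is an assumption, not something established in this thread. A minimal standalone probe, assuming a POSIX system; the helper name probe_online_cpus is hypothetical:

```c
/* Sketch (assumes a POSIX system such as Linux/glibc): with <unistd.h>
 * included, sysconf() has a prototype and _SC_NPROCESSORS_ONLN is defined. */
#include <unistd.h>

/* Hypothetical helper: number of online logical CPUs, or -1 if the
 * _SC_NPROCESSORS_ONLN constant is not provided by this platform. */
long probe_online_cpus(void)
{
#if defined(_SC_NPROCESSORS_ONLN)
    return sysconf(_SC_NPROCESSORS_ONLN); /* sysconf() returns -1 on error */
#else
    return -1; /* constant not available from this platform's headers */
#endif
}
```

Compiled on its own with gcc -Wall, this should not produce the implicit-declaration warning; called from a small main() on the i7-2600K in this thread, it would be expected to return 8, matching the eight processor entries in /proc/cpuinfo.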
>
> I built gromacs as described in:
>
> http://www.gromacs.org/Documentation/Installation_Instructions/Cmake#MPI_build
> with
>
> cmake -DGMX_MPI=ON ../gromacs-src
> make -j 8
>
> I did not set anything else.
>
>
> 2013/2/5 Roland Schulz <roland at utk.edu>
>>
>>> On Tue, Feb 5, 2013 at 8:58 AM, Berk Hess <gmx3 at hotmail.com> wrote:
>>>
>>> >
>>> > One last thing:
>>> > Maybe a macro is not set, but we can actually query the number of
>>> > processors.
>>> > Could you replace the conditional that gets triggered on my machine:
>>> > #if defined(_SC_NPROCESSORS_ONLN)
>>> > to
>>> > #if 1
>>> >
>>> > So we can check if the actual sysconf call works or not?
>>> >
>>> > My workaround won't work without OpenMP.
>>> > Did you disable that manually?
>>> >
>>> > Also large file support is not turned on.
>>> > It seems like your build setup is somehow messed up and a lot of features
>>> > are not found.
>>> >
>>>
>>> Could you post your CMakeFiles/CMakeError.log? That should show why those
>>> features are disabled.
>>>
>>> Roland
>>>
>>>
>>> >
>>> > Cheers,
>>> >
>>> > Berk
>>> >
>>> >
>>> > ----------------------------------------
>>> > > Date: Tue, 5 Feb 2013 14:52:17 +0100
>>> > > Subject: Re: [gmx-users] MPI oversubscription
>>> > > From: hypolit at googlemail.com
>>> > > To: gmx-users at gromacs.org
>>> > >
>>> > > Head of .log:
>>> > >
>>> > > Gromacs version: VERSION 5.0-dev-20121213-e1fcb0a-dirty
>>> > > GIT SHA1 hash: e1fcb0a3d2768a8bb28c2e4e8012123ce773e18c (dirty)
>>> > > Precision: single
>>> > > MPI library: MPI
>>> > > OpenMP support: disabled
>>> > > GPU support: disabled
>>> > > invsqrt routine: gmx_software_invsqrt(x)
>>> > > CPU acceleration: AVX_256
>>> > > FFT library: fftw-3.3.2-sse2
>>> > > Large file support: disabled
>>> > > RDTSCP usage: enabled
>>> > > Built on: Tue Feb 5 10:58:32 CET 2013
>>> > > Built by: christian at k [CMAKE]
>>> > > Build OS/arch: Linux 3.4.11-2.16-desktop x86_64
>>> > > Build CPU vendor: GenuineIntel
>>> > > Build CPU brand: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>>> > > Build CPU family: 6 Model: 42 Stepping: 7
>>> > > Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt
>>> > > C compiler: /home/christian/opt/bin/mpicc GNU gcc (GCC) 4.8.0 20120618 (experimental)
>>> > > C compiler flags: -mavx -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unknown-pragmas -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
>>> > > C++ compiler: /home/christian/opt/bin/mpiCC GNU g++ (GCC) 4.8.0 20120618 (experimental)
>>> > > C++ compiler flags: -mavx -std=c++0x -Wextra -Wno-missing-field-initializers -Wnon-virtual-dtor -Wall -Wno-unused -Wunused-value -Wno-unknown-pragmas -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
>>> > >
>>> > > I will try your workaround, thanks!
>>> > >
>>> > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>>> > >
>>> > > >
>>> > > > OK, then this is an unhandled case.
>>> > > > Strange, because I am also running openSUSE 12.2 with the same CPU, but I use gcc 4.7.1.
>>> > > >
>>> > > > I will file a bug report on redmine.
>>> > > > Could you also post the header of md.log, which gives all configuration information?
>>> > > >
>>> > > > To make it work for now, you can insert the following immediately after #ifdef GMX_OPENMP:
>>> > > >     if (ret <= 0)
>>> > > >     {
>>> > > >         ret = gmx_omp_get_num_procs();
>>> > > >     }
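Berk's workaround falls back on the OpenMP runtime when the sysconf()-based detection gives no positive count. The sketch below shows the same idea as a standalone function, assuming a compiler with OpenMP support (e.g. gcc -fopenmp). It calls the standard omp_get_num_procs() directly, whereas the snippet above uses the GROMACS wrapper gmx_omp_get_num_procs(). The function name nthreads_hw_with_fallback is hypothetical. Without OpenMP the fallback is compiled out, consistent with Berk's remark elsewhere in the thread that the workaround does not help in a build without OpenMP.

```c
/* Sketch of the fallback: if the sysconf()-based count is not positive,
 * ask the OpenMP runtime instead. Only active when built with OpenMP. */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper; 'detected' is the value from the sysconf() path. */
int nthreads_hw_with_fallback(int detected)
{
    int ret = detected;
#ifdef _OPENMP
    if (ret <= 0)
    {
        ret = omp_get_num_procs(); /* processors available to the program */
    }
#endif
    return ret; /* without OpenMP, a non-positive value is passed through */
}
```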
>>> > > >
>>> > > >
>>> > > > Cheers,
>>> > > >
>>> > > > Berk
>>> > > >
>>> > > > ----------------------------------------
>>> > > > > Date: Tue, 5 Feb 2013 14:27:44 +0100
>>> > > > > Subject: Re: [gmx-users] MPI oversubscription
>>> > > > > From: hypolit at googlemail.com
>>> > > > > To: gmx-users at gromacs.org
>>> > > > >
>>> > > > > None of the macros referenced here are defined on my system; the print statements are never executed.
>>> > > > >
>>> > > > > What I did:
>>> > > > >
>>> > > > > printf("Checking which processor variable is set\n");
>>> > > > > #if defined(_SC_NPROCESSORS_ONLN)
>>> > > > > ret = sysconf(_SC_NPROCESSORS_ONLN);
>>> > > > > printf("case 1 ret = %d\n",ret);
>>> > > > > #elif defined(_SC_NPROC_ONLN)
>>> > > > > ret = sysconf(_SC_NPROC_ONLN);
>>> > > > > printf("case 2 ret = %d\n",ret);
>>> > > > > #elif defined(_SC_NPROCESSORS_CONF)
>>> > > > > ret = sysconf(_SC_NPROCESSORS_CONF);
>>> > > > > printf("case 3 ret = %d\n",ret);
>>> > > > > #elif defined(_SC_NPROC_CONF)
>>> > > > > ret = sysconf(_SC_NPROC_CONF);
>>> > > > > printf("case 4 ret = %d\n",ret);
>>> > > > > #endif /* End of check for sysconf argument values */
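The same four-way check can be run outside the GROMACS build. This sketch, assuming a POSIX system, reports which of the four sysconf() arguments the preprocessor sees and what the call returns. Because it includes <unistd.h> explicitly, getting case 1 here while none of the cases fires inside gmx_detect_hardware.c would point at that file's includes or build configuration rather than at the operating system. The function name which_nproc_case is hypothetical.

```c
/* Sketch: mirror the four-way #if chain from get_nthreads_hw_avail()
 * in a standalone file. Assumes a POSIX system. */
#include <unistd.h>

/* Hypothetical helper. Stores the sysconf() result in *ret and returns
 * the case number (1-4) that was compiled in, or 0 if none was. */
int which_nproc_case(long *ret)
{
    *ret = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    *ret = sysconf(_SC_NPROCESSORS_ONLN);
    return 1;
#elif defined(_SC_NPROC_ONLN)
    *ret = sysconf(_SC_NPROC_ONLN);
    return 2;
#elif defined(_SC_NPROCESSORS_CONF)
    *ret = sysconf(_SC_NPROCESSORS_CONF);
    return 3;
#elif defined(_SC_NPROC_CONF)
    *ret = sysconf(_SC_NPROC_CONF);
    return 4;
#else
    return 0; /* none of the macros is defined: no sysconf() call is made */
#endif
}
```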
>>> > > > >
>>> > > > > From /etc/issue:
>>> > > > > Welcome to openSUSE 12.2 "Mantis" - Kernel \r (\l)
>>> > > > > From uname -a:
>>> > > > > Linux kafka 3.4.11-2.16-desktop #1 SMP PREEMPT Wed Sep 26 17:05:00 UTC 2012 (259fc87) x86_64 x86_64 x86_64 GNU/Linux
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>>> > > > >
>>> > > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > This is the same CPU I have in my workstation, and this case should not cause any problems.
>>> > > > > >
>>> > > > > > Which operating system and version are you using?
>>> > > > > >
>>> > > > > > If you know a bit about programming, could you check what goes wrong in get_nthreads_hw_avail in src/gmxlib/gmx_detect_hardware.c?
>>> > > > > > After each of the four "ret =" lines (lines 434, 436, 438 and 440), add:
>>> > > > > > printf("case 1 ret = %d\n",ret);
>>> > > > > > and replace 1 by a different number in each case.
>>> > > > > > That way you can check whether one of the four cases returns 0 or none of the cases is called.
>>> > > > > >
>>> > > > > > Cheers,
>>> > > > > >
>>> > > > > > Berk
>>> > > > > >
>>> > > > > >
>>> > > > > > ----------------------------------------
>>> > > > > > > Date: Tue, 5 Feb 2013 13:45:02 +0100
>>> > > > > > > Subject: Re: [gmx-users] MPI oversubscription
>>> > > > > > > From: hypolit at googlemail.com
>>> > > > > > > To: gmx-users at gromacs.org
>>> > > > > > >
>>> > > > > > > From the .log file:
>>> > > > > > >
>>> > > > > > > Present hardware specification:
>>> > > > > > > Vendor: GenuineIntel
>>> > > > > > > Brand: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>>> > > > > > > Family: 6 Model: 42 Stepping: 7
>>> > > > > > > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt
>>> > > > > > > Acceleration most likely to fit this hardware: AVX_256
>>> > > > > > > Acceleration selected at GROMACS compile time: AVX_256
>>> > > > > > >
>>> > > > > > > Table routines are used for coulomb: FALSE
>>> > > > > > > Table routines are used for vdw: FALSE
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > From /proc/cpuinfo (8 entries like this in total):
>>> > > > > > >
>>> > > > > > > processor : 0
>>> > > > > > > vendor_id : GenuineIntel
>>> > > > > > > cpu family : 6
>>> > > > > > > model : 42
>>> > > > > > > model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>>> > > > > > > stepping : 7
>>> > > > > > > microcode : 0x28
>>> > > > > > > cpu MHz : 1600.000
>>> > > > > > > cache size : 8192 KB
>>> > > > > > > physical id : 0
>>> > > > > > > siblings : 8
>>> > > > > > > core id : 0
>>> > > > > > > cpu cores : 4
>>> > > > > > > apicid : 0
>>> > > > > > > initial apicid : 0
>>> > > > > > > fpu : yes
>>> > > > > > > fpu_exception : yes
>>> > > > > > > cpuid level : 13
>>> > > > > > > wp : yes
>>> > > > > > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>>> > > > > > > bogomips : 6784.04
>>> > > > > > > clflush size : 64
>>> > > > > > > cache_alignment : 64
>>> > > > > > > address sizes : 36 bits physical, 48 bits virtual
>>> > > > > > > power management:
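The /proc/cpuinfo excerpt describes 8 logical CPUs: 4 physical cores (cpu cores : 4) with two hardware threads each (siblings : 8). A cross-check that does not rely on sysconf() at all is to count the processor entries. This is a sketch assuming the Linux x86 /proc/cpuinfo layout, where each logical CPU's block starts with a line beginning "processor"; the function name count_cpuinfo_processors is hypothetical.

```c
/* Sketch (Linux, x86 /proc/cpuinfo layout assumed): count logical CPUs by
 * counting lines that start with "processor". */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; returns -1 if the file cannot be opened. */
int count_cpuinfo_processors(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
    {
        return -1;
    }
    char line[512];
    int  count         = 0;
    int  at_line_start = 1;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        /* only test real line starts, not the tail of a long "flags" line */
        if (at_line_start && strncmp(line, "processor", 9) == 0)
        {
            count++;
        }
        /* a chunk without '\n' means the line was longer than the buffer */
        at_line_start = (strchr(line, '\n') != NULL);
    }
    fclose(f);
    return count;
}
```

On the machine above, count_cpuinfo_processors("/proc/cpuinfo") would be expected to return 8.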
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > It also does not work on the local cluster; the output in the .log file is:
>>> > > > > > >
>>> > > > > > > Detecting CPU-specific acceleration.
>>> > > > > > > Present hardware specification:
>>> > > > > > > Vendor: AuthenticAMD
>>> > > > > > > Brand: AMD Opteron(TM) Processor 6220
>>> > > > > > > Family: 21 Model: 1 Stepping: 2
>>> > > > > > > Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
>>> > > > > > > Acceleration most likely to fit this hardware: AVX_128_FMA
>>> > > > > > > Acceleration selected at GROMACS compile time: AVX_128_FMA
>>> > > > > > > Table routines are used for coulomb: FALSE
>>> > > > > > > Table routines are used for vdw: FALSE
>>> > > > > > >
>>> > > > > > > I am not too sure about the details of that setup, but the brand looks about right.
>>> > > > > > > Do you need any other information?
>>> > > > > > > Thanks for looking into it!
>>> > > > > > >
>>> > > > > > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>>> > > > > > >
>>> > > > > > > >
>>> > > > > > > > Hi,
>>> > > > > > > >
>>> > > > > > > > This looks like our CPU detection code failed and the result is not handled properly.
>>> > > > > > > >
>>> > > > > > > > What hardware are you running on?
>>> > > > > > > > Could you mail the 10 lines from the md.log file following "Detecting CPU-specific acceleration."?
>>> > > > > > > >
>>> > > > > > > > Cheers,
>>> > > > > > > >
>>> > > > > > > > Berk
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > ----------------------------------------
>>> > > > > > > > > Date: Tue, 5 Feb 2013 11:38:53 +0100
>>> > > > > > > > > From: hypolit at googlemail.com
>>> > > > > > > > > To: gmx-users at gromacs.org
>>> > > > > > > > > Subject: [gmx-users] MPI oversubscription
>>> > > > > > > > >
>>> > > > > > > > > Hi,
>>> > > > > > > > >
>>> > > > > > > > > I am using the latest git version of gromacs, compiled with gcc 4.6.2 and openmpi 1.6.3.
>>> > > > > > > > > I start the program using the usual mpirun -np 8 mdrun_mpi ...
>>> > > > > > > > > This always leads to a warning:
>>> > > > > > > > >
>>> > > > > > > > > Using 1 MPI process
>>> > > > > > > > > WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 1 MPI processes.
>>> > > > > > > > >
>>> > > > > > > > > Checking the processes confirms that only one of the 8 available cores is used.
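The warning follows directly from the detected count: if the number of hardware threads is taken to be 0, any positive number of processes per node exceeds it. A hypothetical check of that shape (an illustration with invented names, not the actual GROMACS code) makes the arithmetic explicit:

```c
/* Hypothetical sketch of an oversubscription test: warn when more
 * processes x threads are placed on a node than hardware threads exist. */
int is_oversubscribed(int nthreads_hw, int nprocs_per_node, int nthreads_per_proc)
{
    /* With nthreads_hw == 0 (failed detection), every positive request
     * counts as oversubscription, matching the warning quoted above. */
    return nprocs_per_node * nthreads_per_proc > nthreads_hw;
}
```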
>>> > > > > > > > > Running mdrun_mpi with the additional option -debug 1:
>>> > > > > > > > >
>>> > > > > > > > > Detected 0 processors, will use this as the number of supported hardware threads.
>>> > > > > > > > > hw_opt: nt 0 ntmpi 0 ntomp 1 ntomp_pme 1 gpu_id ''
>>> > > > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNTIn gmx_setup_nodecomm: hostname 'myComputerName', hostnum 0
>>> > > > > > > > > ...
>>> > > > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNTOn rank 0, thread 0, core 0 the affinity setting returned 0
>>> > > > > > > > >
>>> > > > > > > > > I also tried compiling gromacs with an experimental version of gcc 4.8, which did not help in this case.
>>> > > > > > > > > Is this a known problem? Gromacs obviously detects the right value with CPU_COUNT, so why is it not just taking that value?
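The debug output shows that the process affinity mask does report 8 CPUs. On Linux, that count can be read with sched_getaffinity() and the glibc CPU_COUNT() macro, which requires _GNU_SOURCE. The sketch below uses it as a fallback when sysconf() gives no positive value; it is an illustration under those assumptions, not how GROMACS handles this case, and the helper name count_cpus_with_affinity_fallback is hypothetical. One caveat that may explain reluctance to trust CPU_COUNT alone: the affinity mask lists only the CPUs this process may run on, which can be fewer than the hardware provides (for example under taskset or a batch scheduler).

```c
/* Sketch for Linux/glibc: fall back to the CPU affinity mask when
 * sysconf() does not give a positive processor count. */
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper. Note: CPU_COUNT() counts the CPUs this process may
 * run on, which can be fewer than the hardware has. */
int count_cpus_with_affinity_fallback(void)
{
    long n = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    if (n > 0)
    {
        return (int)n;
    }

    cpu_set_t mask;
    CPU_ZERO(&mask);
    /* pid 0 means the calling process */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
    {
        return CPU_COUNT(&mask);
    }
    return -1; /* neither method worked */
}
```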
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > Best regards,
>>> > > > > > > > > Christian
>>> > > > > > > > > --
>>> > > > > > > > > gmx-users mailing list gmx-users at gromacs.org
>>> > > > > > > > > http://lists.gromacs.org/mailman/listinfo/gmx-users
>>> > > > > > > > > * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>>> > > > > > > > > * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-request at gromacs.org.
>>> > > > > > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>>
>>> --
>>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>>> 865-241-1537, ORNL PO BOX 2008 MS6309


