[gmx-users] MPI oversubscription

Berk Hess gmx3 at hotmail.com
Tue Feb 5 14:58:07 CET 2013


One last thing:
Maybe a macro is not set, but we can actually query the number of processors.
Could you replace the conditional that gets triggered on my machine:
#if defined(_SC_NPROCESSORS_ONLN)
with
#if 1

So we can check if the actual sysconf call works or not?
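
A standalone check outside GROMACS may help to separate the two cases (macro not visible versus the sysconf call itself failing). The following is only a sketch: the helper names are hypothetical and are not GROMACS functions, and it assumes a POSIX system where <unistd.h> provides sysconf. If _SC_NPROCESSORS_ONLN is not declared at all, the file will not compile, which is informative in itself.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of GROMACS: reports whether the
 * _SC_NPROCESSORS_ONLN macro is visible to the preprocessor. */
int have_nprocessors_onln_macro(void)
{
#if defined(_SC_NPROCESSORS_ONLN)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: calls sysconf unconditionally, mirroring the
 * "#if 1" experiment. sysconf returns -1 if it reports an error. */
long query_online_processors(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Prints both results so that the two failure modes can be told apart. */
void report_processor_query(void)
{
    printf("macro visible: %d\n", have_nprocessors_onln_macro());
    printf("sysconf returned: %ld\n", query_online_processors());
}
```

Calling report_processor_query() from a small main() prints both values. If the macro is reported as visible here but the branch in GROMACS is still skipped, the build configuration is the more likely suspect.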

My workaround won't work without OpenMP.
Did you disable that manually?
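
Since that workaround depends on OpenMP, a rough cross-check on Linux that needs neither OpenMP nor sysconf is to count the processor entries in /proc/cpuinfo. This is a sketch under that assumption only: count_processor_entries is a hypothetical helper, not something in the GROMACS source, and it is not meant as a replacement for the detection code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GROMACS: counts the lines in a
 * cpuinfo-style file that start with "processor".
 * Returns -1 if the file cannot be opened. */
int count_processor_entries(const char *path)
{
    char  line[1024];
    int   count         = 0;
    int   at_line_start = 1;
    FILE *fp            = fopen(path, "r");

    if (fp == NULL)
    {
        return -1;
    }
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* Only test the prefix when this chunk begins a new line;
         * very long lines (e.g. "flags") arrive in several chunks. */
        if (at_line_start && strncmp(line, "processor", 9) == 0)
        {
            count++;
        }
        at_line_start = (strchr(line, '\n') != NULL);
    }
    fclose(fp);
    return count;
}
```

With the /proc/cpuinfo listing quoted further down in this thread (8 entries in total), count_processor_entries("/proc/cpuinfo") should come to 8.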

Also large file support is not turned on.
It seems like your build setup is somehow messed up and a lot of features are not being detected.

Cheers,

Berk


----------------------------------------
> Date: Tue, 5 Feb 2013 14:52:17 +0100
> Subject: Re: [gmx-users] MPI oversubscription
> From: hypolit at googlemail.com
> To: gmx-users at gromacs.org
>
> Head of .log:
>
> Gromacs version: VERSION 5.0-dev-20121213-e1fcb0a-dirty
> GIT SHA1 hash: e1fcb0a3d2768a8bb28c2e4e8012123ce773e18c (dirty)
> Precision: single
> MPI library: MPI
> OpenMP support: disabled
> GPU support: disabled
> invsqrt routine: gmx_software_invsqrt(x)
> CPU acceleration: AVX_256
> FFT library: fftw-3.3.2-sse2
> Large file support: disabled
> RDTSCP usage: enabled
> Built on: Tue Feb 5 10:58:32 CET 2013
> Built by: christian at k [CMAKE]
> Build OS/arch: Linux 3.4.11-2.16-desktop x86_64
> Build CPU vendor: GenuineIntel
> Build CPU brand: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
> Build CPU family: 6 Model: 42 Stepping: 7
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> nonstop_tsc pcid pclmuldq pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2
> ssse3 tdt
> C compiler: /home/christian/opt/bin/mpicc GNU gcc (GCC) 4.8.0
> 20120618 (experimental)
> C compiler flags: -mavx -Wextra -Wno-missing-field-initializers
> -Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unknown-pragmas
> -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3
> -DNDEBUG
> C++ compiler: /home/christian/opt/bin/mpiCC GNU g++ (GCC) 4.8.0
> 20120618 (experimental)
> C++ compiler flags: -mavx -std=c++0x -Wextra
> -Wno-missing-field-initializers -Wnon-virtual-dtor -Wall -Wno-unused
> -Wunused-value -Wno-unknown-pragmas -fomit-frame-pointer
> -funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
>
> I will try your workaround, thanks!
>
> 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>
> >
> > OK, then this is an unhandled case.
> > Strange, because I am also running OpenSUSE 12.2 with the same CPU, but
> > use gcc 4.7.1.
> >
> > I will file a bug report on redmine.
> > Could you also post the header of md.log which gives all configuration
> > information?
> >
> > To make it work for now, you can insert immediately after #ifdef
> > GMX_OPENMP:
> > if (ret <= 0)
> > {
> >     ret = gmx_omp_get_num_procs();
> > }
> >
> >
> > Cheers,
> >
> > Berk
> >
> > ----------------------------------------
> > > Date: Tue, 5 Feb 2013 14:27:44 +0100
> > > Subject: Re: [gmx-users] MPI oversubscription
> > > From: hypolit at googlemail.com
> > > To: gmx-users at gromacs.org
> > >
> > > None of the variables referenced here are set on my system, the print
> > > statements are never executed.
> > >
> > > What I did:
> > >
> > > printf("Checking which processor variable is set");
> > > #if defined(_SC_NPROCESSORS_ONLN)
> > > ret = sysconf(_SC_NPROCESSORS_ONLN);
> > > printf("case 1 ret = %d\n",ret);
> > > #elif defined(_SC_NPROC_ONLN)
> > > ret = sysconf(_SC_NPROC_ONLN);
> > > printf("case 2 ret = %d\n",ret);
> > > #elif defined(_SC_NPROCESSORS_CONF)
> > > ret = sysconf(_SC_NPROCESSORS_CONF);
> > > printf("case 3 ret = %d\n",ret);
> > > #elif defined(_SC_NPROC_CONF)
> > > ret = sysconf(_SC_NPROC_CONF);
> > > printf("case 4 ret = %d\n",ret);
> > > #endif /* End of check for sysconf argument values */
> > >
> > > From /etc/issue:
> > > Welcome to openSUSE 12.2 "Mantis" - Kernel \r (\l)
> > > From uname -a:
> > > Linux kafka 3.4.11-2.16-desktop #1 SMP PREEMPT Wed Sep 26 17:05:00 UTC
> > > 2012 (259fc87) x86_64 x86_64 x86_64 GNU/Linux
> > >
> > >
> > >
> > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
> > >
> > > >
> > > > Hi,
> > > >
> > > > This is the same cpu I have in my workstation and this case should not
> > > > cause any problems.
> > > >
> > > > Which operating system and version are you using?
> > > >
> > > > If you know a bit about programming, could you check what goes wrong in
> > > > get_nthreads_hw_avail
> > > > in src/gmxlib/gmx_detect_hardware.c ?
> > > > Add after the four "ret =" at line 434, 436, 438 and 440:
> > > > printf("case 1 ret = %d\n",ret);
> > > > and replace 1 by different numbers.
> > > > Thus you can check if one of the 4 cases returns 0 or none of the cases
> > > > is called.
> > > >
> > > > Cheers,
> > > >
> > > > Berk
> > > >
> > > >
> > > > ----------------------------------------
> > > > > Date: Tue, 5 Feb 2013 13:45:02 +0100
> > > > > Subject: Re: [gmx-users] MPI oversubscription
> > > > > From: hypolit at googlemail.com
> > > > > To: gmx-users at gromacs.org
> > > > >
> > > > > From the .log file:
> > > > >
> > > > > Present hardware specification:
> > > > > Vendor: GenuineIntel
> > > > > Brand: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
> > > > > Family: 6 Model: 42 Stepping: 7
> > > > > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> > > > > nonstop_tsc pcid pclmuldq pdcm popcnt pse rdtscp sse2 sse3 sse4.1
> > > > > sse4.2 ssse3 tdt
> > > > > Acceleration most likely to fit this hardware: AVX_256
> > > > > Acceleration selected at GROMACS compile time: AVX_256
> > > > >
> > > > > Table routines are used for coulomb: FALSE
> > > > > Table routines are used for vdw: FALSE
> > > > >
> > > > >
> > > > > From /proc/cpuinfo (8 entries like this in total):
> > > > >
> > > > > processor : 0
> > > > > vendor_id : GenuineIntel
> > > > > cpu family : 6
> > > > > model : 42
> > > > > model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
> > > > > stepping : 7
> > > > > microcode : 0x28
> > > > > cpu MHz : 1600.000
> > > > > cache size : 8192 KB
> > > > > physical id : 0
> > > > > siblings : 8
> > > > > core id : 0
> > > > > cpu cores : 4
> > > > > apicid : 0
> > > > > initial apicid : 0
> > > > > fpu : yes
> > > > > fpu_exception : yes
> > > > > cpuid level : 13
> > > > > wp : yes
> > > > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > > > > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> > > > > syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> > > > > nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor
> > > > > ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt
> > > > > tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln
> > > > > pts dtherm tpr_shadow vnmi flexpriority ept vpid
> > > > > bogomips : 6784.04
> > > > > clflush size : 64
> > > > > cache_alignment : 64
> > > > > address sizes : 36 bits physical, 48 bits virtual
> > > > > power management:
> > > > >
> > > > >
> > > > > It also does not work on the local cluster, the output in the .log
> > > > > file is:
> > > > >
> > > > > Detecting CPU-specific acceleration.
> > > > > Present hardware specification:
> > > > > Vendor: AuthenticAMD
> > > > > Brand: AMD Opteron(TM) Processor 6220
> > > > > Family: 21 Model: 1 Stepping: 2
> > > > > Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
> > > > > misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp
> > > > > sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
> > > > > Acceleration most likely to fit this hardware: AVX_128_FMA
> > > > > Acceleration selected at GROMACS compile time: AVX_128_FMA
> > > > > Table routines are used for coulomb: FALSE
> > > > > Table routines are used for vdw: FALSE
> > > > >
> > > > > I am not too sure about the details for that setup, but the brand
> > > > > looks about right.
> > > > > Do you need any other information?
> > > > > Thanks for looking into it!
> > > > >
> > > > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
> > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This looks like our CPU detection code failed and the result is not
> > > > > > handled properly.
> > > > > >
> > > > > > What hardware are you running on?
> > > > > > Could you mail the 10 lines from the md.log file following:
> > > > > > "Detecting CPU-specific acceleration."?
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Berk
> > > > > >
> > > > > >
> > > > > > ----------------------------------------
> > > > > > > Date: Tue, 5 Feb 2013 11:38:53 +0100
> > > > > > > From: hypolit at googlemail.com
> > > > > > > To: gmx-users at gromacs.org
> > > > > > > Subject: [gmx-users] MPI oversubscription
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am using the latest git version of gromacs, compiled with gcc
> > > > > > > 4.6.2 and openmpi 1.6.3.
> > > > > > > I start the program using the usual mpirun -np 8 mdrun_mpi ...
> > > > > > > This always leads to a warning:
> > > > > > >
> > > > > > > Using 1 MPI process
> > > > > > > WARNING: On node 0: oversubscribing the available 0 logical CPU
> > > > > > > cores per node with 1 MPI processes.
> > > > > > >
> > > > > > > Checking the processes confirms that there is only one of the 8
> > > > > > > available cores used.
> > > > > > > Running mdrun_mpi with an additional debug -1:
> > > > > > >
> > > > > > > Detected 0 processors, will use this as the number of supported
> > > > > > > hardware threads.
> > > > > > > hw_opt: nt 0 ntmpi 0 ntomp 1 ntomp_pme 1 gpu_id ''
> > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNTIn
> > > > > > > gmx_setup_nodecomm: hostname 'myComputerName', hostnum 0
> > > > > > > ...
> > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNTOn rank 0,
> > > > > > > thread 0, core 0 the affinity setting returned 0
> > > > > > >
> > > > > > > I also made another try by compiling gromacs using some
> > > > > > > experimental version of gcc 4.8, which did not help in this case.
> > > > > > > Is this a known problem? Obviously gromacs detects the right
> > > > > > > value with CPU_COUNT, why is it not just taking that value?
> > > > > > >
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Christian
> > > > > > > --
> > > > > > > gmx-users mailing list gmx-users at gromacs.org
> > > > > > > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > > > > > > * Please search the archive at
> > > > > > > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > > > > > > * Please don't post (un)subscribe requests to the list. Use the
> > > > > > > www interface or send it to gmx-users-request at gromacs.org.
> > > > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists