[gmx-users] MPI oversubscription

Roland Schulz roland at utk.edu
Wed Feb 6 22:28:07 CET 2013


On Wed, Feb 6, 2013 at 2:35 PM, Roland Schulz <roland at utk.edu> wrote:

>
> On Tue, Feb 5, 2013 at 8:52 AM, Christian H. <hypolit at googlemail.com> wrote:
>
>> Head of .log:
>>
>> Gromacs version:    VERSION 5.0-dev-20121213-e1fcb0a-dirty
>>
>
> Is it on purpose that you are using version 5.0 and not 4.6? Unless you
> plan to do development, I suggest using 4.6 (git checkout release-4-6).
> I can reproduce your problem with 5.0. We haven't tested 5.0 much lately
> because we have been so busy with 4.6.
>

If you want to use 5.0, you can take the version from here:
https://gerrit.gromacs.org/#/c/2132/. It fixes the problems.

Roland

 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>>
>> >
>> > OK, then this is an unhandled case.
>> > Strange, because I am also running OpenSUSE 12.2 with the same CPU, but
>> > with gcc 4.7.1.
>> >
>> > I will file a bug report on redmine.
>> > Could you also post the header of md.log which gives all configuration
>> > information?
>> >
>> > To make it work for now, you can insert immediately after #ifdef
>> > GMX_OPENMP:
>> >     if (ret <= 0)
>> >     {
>> >         ret = gmx_omp_get_num_procs();
>> >     }
>> >
>> >
>> > Cheers,
>> >
>> > Berk
>> >
>> > ----------------------------------------
>> > > Date: Tue, 5 Feb 2013 14:27:44 +0100
>> > > Subject: Re: [gmx-users] MPI oversubscription
>> > > From: hypolit at googlemail.com
>> > > To: gmx-users at gromacs.org
>> > >
>> > > None of the variables referenced here is defined on my system; the
>> > > print statements are never executed.
>> > >
>> > > What I did:
>> > >
>> > > printf("Checking which processor variable is set\n");
>> > > #if defined(_SC_NPROCESSORS_ONLN)
>> > > ret = sysconf(_SC_NPROCESSORS_ONLN);
>> > > printf("case 1 ret = %d\n",ret);
>> > > #elif defined(_SC_NPROC_ONLN)
>> > > ret = sysconf(_SC_NPROC_ONLN);
>> > > printf("case 2 ret = %d\n",ret);
>> > > #elif defined(_SC_NPROCESSORS_CONF)
>> > > ret = sysconf(_SC_NPROCESSORS_CONF);
>> > > printf("case 3 ret = %d\n",ret);
>> > > #elif defined(_SC_NPROC_CONF)
>> > > ret = sysconf(_SC_NPROC_CONF);
>> > > printf("case 4 ret = %d\n",ret);
>> > > #endif /* End of check for sysconf argument values */
>> > >
>> > > From /etc/issue:
>> > > Welcome to openSUSE 12.2 "Mantis" - Kernel \r (\l)
>> > > From uname -a:
>> > > Linux kafka 3.4.11-2.16-desktop #1 SMP PREEMPT Wed Sep 26 17:05:00
>> > > UTC 2012 (259fc87) x86_64 x86_64 x86_64 GNU/Linux
>> > >
>> > >
>> > >
>> > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>> > >
>> > > >
>> > > > Hi,
>> > > >
>> > > > This is the same CPU I have in my workstation, and this case should
>> > > > not cause any problems.
>> > > >
>> > > > Which operating system and version are you using?
>> > > >
>> > > > If you know a bit about programming, could you check what goes wrong
>> > > > in get_nthreads_hw_avail in src/gmxlib/gmx_detect_hardware.c?
>> > > > After each of the four "ret =" assignments at lines 434, 436, 438
>> > > > and 440, add:
>> > > > printf("case 1 ret = %d\n",ret);
>> > > > replacing 1 with a different number for each case.
>> > > > That way you can check whether one of the four cases returns 0, or
>> > > > whether none of them is called.
>> > > >
>> > > > Cheers,
>> > > >
>> > > > Berk
>> > > >
>> > > >
>> > > > ----------------------------------------
>> > > > > Date: Tue, 5 Feb 2013 13:45:02 +0100
>> > > > > Subject: Re: [gmx-users] MPI oversubscription
>> > > > > From: hypolit at googlemail.com
>> > > > > To: gmx-users at gromacs.org
>> > > > >
>> > > > > From the .log file:
>> > > > >
>> > > > > Present hardware specification:
>> > > > > Vendor: GenuineIntel
>> > > > > Brand: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>> > > > > Family: 6 Model: 42 Stepping: 7
>> > > > > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
>> > > > > nonstop_tsc pcid pclmuldq pdcm popcnt pse rdtscp sse2 sse3 sse4.1
>> > > > > sse4.2 ssse3 tdt
>> > > > > Acceleration most likely to fit this hardware: AVX_256
>> > > > > Acceleration selected at GROMACS compile time: AVX_256
>> > > > >
>> > > > > Table routines are used for coulomb: FALSE
>> > > > > Table routines are used for vdw: FALSE
>> > > > >
>> > > > >
>> > > > > From /proc/cpuinfo (8 entries like this in total):
>> > > > >
>> > > > > processor : 0
>> > > > > vendor_id : GenuineIntel
>> > > > > cpu family : 6
>> > > > > model : 42
>> > > > > model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>> > > > > stepping : 7
>> > > > > microcode : 0x28
>> > > > > cpu MHz : 1600.000
>> > > > > cache size : 8192 KB
>> > > > > physical id : 0
>> > > > > siblings : 8
>> > > > > core id : 0
>> > > > > cpu cores : 4
>> > > > > apicid : 0
>> > > > > initial apicid : 0
>> > > > > fpu : yes
>> > > > > fpu_exception : yes
>> > > > > cpuid level : 13
>> > > > > wp : yes
>> > > > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> > > > > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>> > syscall nx
>> > > > > rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
>> > > > > xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor
>> > > > > ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt
>> > > > > tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt
>> > > > > pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>> > > > > bogomips : 6784.04
>> > > > > clflush size : 64
>> > > > > cache_alignment : 64
>> > > > > address sizes : 36 bits physical, 48 bits virtual
>> > > > > power management:
>> > > > >
>> > > > >
>> > > > > It also does not work on the local cluster; the output in the
>> > > > > .log file is:
>> > > > >
>> > > > > Detecting CPU-specific acceleration.
>> > > > > Present hardware specification:
>> > > > > Vendor: AuthenticAMD
>> > > > > Brand: AMD Opteron(TM) Processor 6220
>> > > > > Family: 21 Model: 1 Stepping: 2
>> > > > > Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
>> > > > > misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp
>> > > > > sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
>> > > > > Acceleration most likely to fit this hardware: AVX_128_FMA
>> > > > > Acceleration selected at GROMACS compile time: AVX_128_FMA
>> > > > > Table routines are used for coulomb: FALSE
>> > > > > Table routines are used for vdw: FALSE
>> > > > >
>> > > > > I am not too sure about the details of that setup, but the brand
>> > > > > looks about right.
>> > > > > Do you need any other information?
>> > > > > Thanks for looking into it!
>> > > > >
>> > > > > 2013/2/5 Berk Hess <gmx3 at hotmail.com>
>> > > > >
>> > > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > This looks like our CPU detection code failed and the failure
>> > > > > > is not handled properly.
>> > > > > >
>> > > > > > What hardware are you running on?
>> > > > > > Could you mail the 10 lines from the md.log file following
>> > > > > > "Detecting CPU-specific acceleration."?
>> > > > > >
>> > > > > > Cheers,
>> > > > > >
>> > > > > > Berk
>> > > > > >
>> > > > > >
>> > > > > > ----------------------------------------
>> > > > > > > Date: Tue, 5 Feb 2013 11:38:53 +0100
>> > > > > > > From: hypolit at googlemail.com
>> > > > > > > To: gmx-users at gromacs.org
>> > > > > > > Subject: [gmx-users] MPI oversubscription
>> > > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I am using the latest git version of GROMACS, compiled with
>> > > > > > > gcc 4.6.2 and OpenMPI 1.6.3.
>> > > > > > > I start the program using the usual mpirun -np 8 mdrun_mpi ...
>> > > > > > > This always leads to a warning:
>> > > > > > >
>> > > > > > > Using 1 MPI process
>> > > > > > > WARNING: On node 0: oversubscribing the available 0 logical
>> > > > > > > CPU cores per node with 1 MPI processes.
>> > > > > > >
>> > > > > > > Checking the processes confirms that only one of the 8
>> > > > > > > available cores is used.
>> > > > > > > Running mdrun_mpi with the additional flag -debug 1 gives:
>> > > > > > >
>> > > > > > > Detected 0 processors, will use this as the number of
>> > > > > > > supported hardware threads.
>> > > > > > > hw_opt: nt 0 ntmpi 0 ntomp 1 ntomp_pme 1 gpu_id ''
>> > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNT
>> > > > > > > In gmx_setup_nodecomm: hostname 'myComputerName', hostnum 0
>> > > > > > > ...
>> > > > > > > 0 CPUs detected, but 8 was returned by CPU_COUNT
>> > > > > > > On rank 0, thread 0, core 0 the affinity setting returned 0
>> > > > > > >
>> > > > > > > I also tried compiling GROMACS with an experimental version of
>> > > > > > > gcc 4.8, which did not help either.
>> > > > > > > Is this a known problem? GROMACS obviously detects the right
>> > > > > > > value with CPU_COUNT, so why does it not just use that value?
>> > > > > > >
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Christian
>> > > > > > > --
>> > > > > > > gmx-users mailing list    gmx-users at gromacs.org
>> > > > > > > http://lists.gromacs.org/mailman/listinfo/gmx-users
>> > > > > > > * Please search the archive at
>> > > > > > > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>> > > > > > > * Please don't post (un)subscribe requests to the list. Use the
>> > > > > > > www interface or send it to gmx-users-request at gromacs.org.
>> > > > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists



-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


