[gmx-developers] Oversubscribing on 4.6.2 with MPI / OpenMP

Jeff Hammond jhammond at alcf.anl.gov
Wed Apr 24 22:01:50 CEST 2013


If you do not have a resource manager setting things up properly for
you already, you need to use a hostfile, as described at e.g.
http://www.open-mpi.org/faq/?category=running.
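
For example, with a plain-text hostfile listing one node name per line
(the node names below are placeholders, and the options are a sketch of
Open MPI's syntax - MVAPICH2's Hydra mpiexec uses -f and -ppn instead):

$ cat hosts
node01
node02
$ mpiexec --hostfile hosts -npernode 1 -np 2 mdrun

That way exactly one MPI rank starts on each node, which is what you
want when every rank runs 8 OpenMP threads.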

It is often useful to have an MPI application print out the result of
MPI_GET_PROCESSOR_NAME from every MPI process at job start in order to
have unambiguous information about where MPI processes are executing.
On many platforms, MPI_GET_PROCESSOR_NAME is just a wrapper around
gethostname(), in which case you can see from the duplication of
host names whether or not you have properly launched your MPI job on
multiple nodes.
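
A minimal standalone sketch of such a check (not GROMACS code, just an
illustration of the idea):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("rank %d is running on %s\n", rank, name);
    MPI_Finalize();
    return 0;
}

Launch it with the same mpiexec command you use for mdrun; if both
ranks report the same host, the launcher is not spreading them across
the nodes.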

Best,

Jeff

On Wed, Apr 24, 2013 at 2:53 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> I suspect -np 2 is not starting a process on each node the way you think
> it should, because all the symptoms are consistent with that. Possibly
> the Host field in the .log file output is diagnostic here. Check how your
> MPI configuration works.
>
> Mark
>
> On Apr 24, 2013 7:47 PM, "Jochen Hub" <jhub at gwdg.de> wrote:
>>
>> Hi,
>>
>> I have a problem related to the oversubscribing issue reported on Feb 5 in
>> the user list - yet it seems different.
>>
>> I use the latest git 4.6.2 with icc13 and MVAPICH2/1.8.
>>
>> I run on 2 nodes, each with 2 Xeon Harpertowns (E5472). With
>>
>> export OMP_NUM_THREADS=1
>> mpiexec -np 16 mdrun
>>
>> everything is fine - reasonable performance. With
>>
>> export OMP_NUM_THREADS=8
>> mpiexec -np 2 mdrun
>>
>> I get the warning:
>>
>> WARNING: Oversubscribing the available 8 logical CPU cores with 16
>> threads. This will cause considerable performance loss!
>>
>> And the simulation is indeed very slow.
>>
>> According to Berk's suggestion in the thread "MPI oversubscription" in the
>> user list, I have added print statements in
>> src/gmxlib/gmx_detect_hardware.c to check the return values of the
>> sysconf(...) calls. Each MPI process reports 8:
>>
>> ret at _SC_NPROCESSORS_ONLN = 8
>> ret at _SC_NPROCESSORS_ONLN = 8
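>>
>> For reference, the added print is roughly of this form (just a sketch,
>> not the exact code in gmx_detect_hardware.c; it needs <stdio.h> and
>> <unistd.h>):
>>
>>   fprintf(stderr, "ret of sysconf(_SC_NPROCESSORS_ONLN) = %ld\n",
>>           (long) sysconf(_SC_NPROCESSORS_ONLN));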
>>
>> Here is some more information from the log file (the two nodes are
>> apparently detected), so I am a bit lost.
>>
>> Can someone give a hint on how to solve this?
>>
>> Many thanks,
>> Jochen
>>
>> Host: r104i1n8  pid: 12912  nodeid: 0  nnodes:  2
>> Gromacs version:    VERSION 4.6.2-dev
>> Precision:          single
>> Memory model:       64 bit
>> MPI library:        MPI
>> OpenMP support:     enabled
>> GPU support:        disabled
>> invsqrt routine:    gmx_software_invsqrt(x)
>> CPU acceleration:   SSE4.1
>> FFT library:        fftw-3.3.1-sse2
>> Large file support: enabled
>> RDTSCP usage:       disabled
>> Built on:           Wed Apr 24 17:07:56 CEST 2013
>> Built by:           nicjohub at r104i1n0 [CMAKE]
>> Build OS/arch:      Linux 2.6.16.60-0.97.1-smp x86_64
>> Build CPU vendor:   GenuineIntel
>> Build CPU brand:    Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz
>> Build CPU family:   6   Model: 23   Stepping: 6
>> Build CPU features: apic clfsh cmov cx8 cx16 lahf_lm mmx msr pdcm pse sse2
>> sse3 sse4.1 ssse3
>> C compiler:         /sw/comm/mvapich2/1.8-intel/bin/mpicc Intel icc (ICC)
>> 13.0.1 20121010
>> C compiler flags:   -msse4.1   -std=gnu99 -Wall   -ip -funroll-all-loops
>> -O3 -DNDEBUG
>>
>> Detecting CPU-specific acceleration.
>> Present hardware specification:
>> Vendor: GenuineIntel
>> Brand:  Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz
>> Family:  6  Model: 23  Stepping:  6
>> Features: apic clfsh cmov cx8 cx16 lahf_lm mmx msr pdcm pse sse2 sse3
>> sse4.1 ssse3
>> Acceleration most likely to fit this hardware: SSE4.1
>> Acceleration selected at GROMACS compile time: SSE4.1
>>
>>
>>
>>
>> --
>> ---------------------------------------------------
>> Dr. Jochen Hub
>> Computational Molecular Biophysics Group
>> Institute for Microbiology and Genetics
>> Georg-August-University of Göttingen
>> Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
>> Phone: +49-551-39-14189
>> http://cmb.bio.uni-goettingen.de/
>> ---------------------------------------------------
>> --
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>> Please don't post (un)subscribe requests to the list. Use the www
>> interface or send it to gmx-developers-request at gromacs.org.
>
>
> --
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides


