[gmx-developers] getting gromacs to run on ORNL summitdev

Sedova, Ada A. sedovaaa at ornl.gov
Fri May 19 19:47:56 CEST 2017


Thanks, this is the last response I will send, as I have been told to keep these details off the development site.


I believe the problem is that I need to change the system settings to allow multiple ranks to access the same GPU. This was also a problem on Titan (a Cray), where it required a Cray proxy setting and several aprun options. I just talked to the NVIDIA folks and others at OLCF, and there is in fact a process for turning this functionality on; it is off by default. It will need to be enabled for each user who wants to run GROMACS on the system.
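
For reference, a minimal sketch of what enabling this usually involves, assuming the mechanism in question is the CUDA Multi-Process Service (MPS; on the Cray/Titan side the analogous switch is the CRAY_CUDA_PROXY / CRAY_CUDA_MPS environment variable). The exact procedure on summitdev may well differ:

  # start a per-user MPS control daemon (the directories below are only examples)
  export CUDA_MPS_PIPE_DIRECTORY=/tmp/$USER/mps-pipe
  export CUDA_MPS_LOG_DIRECTORY=/tmp/$USER/mps-log
  mkdir -p $CUDA_MPS_PIPE_DIRECTORY $CUDA_MPS_LOG_DIRECTORY
  nvidia-cuda-mps-control -d

  # ... launch the multi-rank GPU job (e.g. mpirun ... gmx_mpi mdrun ...) ...

  # shut the daemon down afterwards
  echo quit | nvidia-cuda-mps-control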


It seemed like developers would want to know about this, but I guess I was wrong. I am not thrilled with the discourteous way this was handled. Thanks, Mark, for being civil.




________________________________
From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se> on behalf of Mark Abraham <mark.j.abraham at gmail.com>
Sent: Friday, May 19, 2017 11:13 AM
To: gmx-developers at gromacs.org
Subject: Re: [gmx-developers] getting gromacs to run on ORNL summitdev

Hi,

OK, we should probably document that better as we migrate our web presence. We moved away from major.minor versioning because it made people wonder what each number change signified, and they didn't realize they were running code that was ten years old...

On this platform, you should only consider using the gcc compiler - xlc does not work well in any of our hands.
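
(Purely as an illustration, not a recipe from this thread: a gcc-based MPI + CUDA configuration for a POWER8/P100 node could look roughly like the lines below; the install prefix is a placeholder, and GMX_SIMD=IBM_VSX matches the VSX recommendation that appears later in the thread.)

  cmake .. \
    -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
    -DGMX_MPI=ON -DGMX_GPU=ON \
    -DGMX_SIMD=IBM_VSX \
    -DCMAKE_INSTALL_PREFIX=$HOME/gromacs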

Mark

On Fri, May 19, 2017 at 5:05 PM Sedova, Ada A. <sedovaaa at ornl.gov> wrote:
Because of the naming, it seemed that the 5.1.x series was the latest long-term stable release, and that the "2016" series was not a long-term stable release, since it was not prefixed by a number like 5.


________________________________________
From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
Sent: Friday, May 19, 2017 10:58 AM
To: Discussion list for GROMACS development
Subject: Re: [gmx-developers] getting gromacs to run on ORNL summitdev

On Fri, May 19, 2017 at 4:54 PM, Sedova, Ada A. <sedovaaa at ornl.gov> wrote:
> Will do. Wasn't sure if it was too beta. Will try today.

Not sure what you are referring to. Have you looked at the release
dates? The 2016 release series has been out for 9+ months.

http://manual.gromacs.org/documentation/

>
>
> Ada Sedova
> Postdoctoral Research Associate
> Scientific Computing Group, NCCS
> Oak Ridge National Laboratory, Oak Ridge, TN
>
> ________________________________________
> From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
> Sent: Friday, May 19, 2017 10:53 AM
> To: Discussion list for GROMACS development
> Subject: Re: [gmx-developers] getting gromacs to run on ORNL summitdev
>
> Have you tried the *latest* release?
> --
> Szilárd
>
>
> On Fri, May 19, 2017 at 4:51 PM, Sedova, Ada A. <sedovaaa at ornl.gov> wrote:
>> Szilárd,
>>
>> OK, thanks. I was talking about the SIMD=VMX vs. VSX directives, and yes, I know it is OK not to specify one, but we would like the maximum performance. This is not the important issue, though; it is superfluous to the actual question asked. I was just trying to show you that the build seemed to be successful.
>>
>> *The main problem I am writing about is the cudaMallocHost error I get when trying to run more than 4 MPI ranks. Please see the later sections of the original email.*
>>
>> Thanks so much,
>> Ada
>>
>>
>>
>> Ada Sedova
>> Postdoctoral Research Associate
>> Scientific Computing Group, NCCS
>> Oak Ridge National Laboratory, Oak Ridge, TN
>>
>> ________________________________________
>> From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
>> Sent: Friday, May 19, 2017 10:33 AM
>> To: Discussion list for GROMACS development
>> Subject: Re: [gmx-developers] getting gromacs to run on ORNL summitdev
>>
>> Hi Ada,
>>
>> First of all, please use the latest release; the 2016 series has been
>> tested quite well on IBM Minsky nodes.
>>
>> Not sure which altivec option you are referring to, but none should be
>> necessary: the build system should generate correct and sufficient
>> compiler options for decent performance (unless, of course, you want to
>> tune them further).
>>
>> NVML is only needed if you want reporting of the GPU clocks or if you want
>> to allow mdrun to change them -- which won't work anyway, as AFAIK NVIDIA
>> does not allow user-space processes to change clocks on the P100 (and
>> later?).
>>
>> make check should also work, but it is easier if you build without the
>> regression tests enabled in CMake and run them separately. That way you
>> can specify the MPI launcher, a custom binary suffix, etc., e.g.
>> perl gmxtest.pl -mpirun mpirun -np N -ntomp M all
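>> (As a hypothetical concrete example, filling in the placeholders with
>> 4 ranks and 8 OpenMP threads per rank, that would be:
>> perl gmxtest.pl -mpirun mpirun -np 4 -ntomp 8 all
>> with the counts adjusted to the node being tested.)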
>>
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Fri, May 19, 2017 at 3:45 PM, Sedova, Ada A. <sedovaaa at ornl.gov> wrote:
>>>
>>> Hi folks,
>>>
>>>
>>> I am a member of the Scientific Computing group at Oak Ridge National Lab's
>>> National Center for Computational Sciences (NCCS). We are preparing high
>>> performance codes for public use on our new Summit system by testing on the
>>> prototype SummitDev:
>>>
>>>
>>> Summitdev is an early-access system one generation removed from OLCF’s
>>> next big supercomputer, Summit. The system contains three racks, each with
>>> 18 IBM POWER8 S822LC nodes, for a total of 54 nodes. Each S822LC node has
>>> 2 IBM POWER8 CPUs, 4 NVIDIA Tesla P100 GPUs, and 32 x 8 GB of DDR4 memory
>>> (256 GB per node). Each POWER8 CPU has 10 cores with 8 hardware threads
>>> each. The GPUs are connected by NVLink 1.0 at 80 GB/s, and each GPU has
>>> 16 GB of HBM2 memory. The nodes are connected in a full fat tree via EDR
>>> InfiniBand, and the racks are liquid cooled with a heat exchanger.
>>> Summitdev has access to Spider 2, the OLCF’s center-wide Lustre parallel
>>> file system.
>>>
>>>
>>> I have built GROMACS 5.1.4, seemingly successfully, using the GPU, MPI, and
>>> SIMD options with CMake directives (I didn't get the altivec option
>>> completely right, but that should be OK). The only things that were not
>>> found during the CMake run were the NVML library (we do not have the
>>> deployment package installed), LAPACK (although it did find ESSL after I
>>> set the BLAS directive), and some includes like io.h.
>>>
>>>
>>> I cannot use make check, because on summitdev I have to launch gmx_mpi
>>> with mpirun, and I think make check does not call gmx that way. I have
>>> manually tested the system-preparation tools (pdb2gmx, editconf, grompp,
>>> solvate), and they work fine.
>>>
>>>
>>> However, I cannot get mdrun to work, even on one node, on a small box of
>>> water. After adjusting the number of processes so that the OpenMP thread
>>> count warning stopped appearing, I get the following:
>>>
>>>
>>>>>cudaMallocHost of size 1024128 bytes failed: all CUDA-capable devices are
>>>>> busy or unavailable
>>>
>>> Here is the complete context for this error:
>>>
>>>
>>> ****************************************************************************************************
>>>
>>> bash-4.2$ mpirun -np 80 /ccs/home/adaa/gromacs/bin/gmx_mpi mdrun -v -deffnm
>>> water_em
>>>
>>>                    :-) GROMACS - gmx mdrun, VERSION 5.1.4 (-:
>>>                             GROMACS is written by:
>>>      Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
>>>  Aldert van Buuren   Rudi van Drunen     Anton Feenstra   Sebastian Fritsch
>>>   Gerrit Groenhof   Christoph Junghans   Anca Hamuraru    Vincent Hindriksen
>>>  Dimitrios Karkoulis    Peter Kasson        Jiri Kraus      Carsten Kutzner
>>>     Per Larsson      Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff
>>>    Erik Marklund      Teemu Murtola       Szilard Pall       Sander Pronk
>>>    Roland Schulz     Alexey Shvetsov     Michael Shirts     Alfons Sijbers
>>>    Peter Tieleman    Teemu Virolainen  Christian Wennberg    Maarten Wolf
>>>                            and the project leaders:
>>>         Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>> Copyright (c) 2001-2015, The GROMACS development team at
>>> Uppsala University, Stockholm University and
>>> the Royal Institute of Technology, Sweden.
>>> check out http://www.gromacs.org for more information.
>>> GROMACS is free software; you can redistribute it and/or modify it
>>> under the terms of the GNU Lesser General Public License
>>> as published by the Free Software Foundation; either version 2.1
>>> of the License, or (at your option) any later version.
>>> GROMACS:      gmx mdrun, VERSION 5.1.4
>>> Executable:   /ccs/home/adaa/gromacs/bin/gmx_mpi
>>> Data prefix:  /ccs/home/adaa/gromacs
>>> Command line:
>>>   gmx_mpi mdrun -v -deffnm water_em
>>>
>>>
>>> Back Off! I just backed up water_em.log to ./#water_em.log.15#
>>> Number of logical cores detected (160) does not match the number reported by
>>> OpenMP (80).
>>> Consider setting the launch configuration manually!
>>> Running on 4 nodes with total 640 logical cores, 16 compatible GPUs
>>>   Logical cores per node:   160
>>>   Compatible GPUs per node:  4
>>>   All nodes have identical type(s) of GPUs
>>> Hardware detected on host summitdev-r0c1n04 (the node of MPI rank 0):
>>>   CPU info:
>>>     Vendor: IBM
>>>     Brand:  POWER8NVL (raw), altivec supported
>>>     SIMD instructions most likely to fit this hardware: IBM_VSX
>>>     SIMD instructions selected at GROMACS compile time: IBM_VMX
>>>   GPU info:
>>>     Number of GPUs detected: 4
>>>     #0: NVIDIA Tesla P100-SXM2-16GB, compute cap.: 6.0, ECC: yes, stat:
>>> compatible
>>>     #1: NVIDIA Tesla P100-SXM2-16GB, compute cap.: 6.0, ECC: yes, stat:
>>> compatible
>>>     #2: NVIDIA Tesla P100-SXM2-16GB, compute cap.: 6.0, ECC: yes, stat:
>>> compatible
>>>     #3: NVIDIA Tesla P100-SXM2-16GB, compute cap.: 6.0, ECC: yes, stat:
>>> compatible
>>> Compiled SIMD instructions: IBM_VMX, GROMACS could use IBM_VSX on this
>>> machine, which is better
>>> Reading file water_em.tpr, VERSION 5.1.4 (single precision)
>>> Using 80 MPI processes
>>> Using 8 OpenMP threads per MPI process
>>> On host summitdev-r0c1n04 4 compatible GPUs are present, with IDs 0,1,2,3
>>> On host summitdev-r0c1n04 4 GPUs auto-selected for this run.
>>> Mapping of GPU IDs to the 20 PP ranks in this node:
>>> 0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3
>>>
>>> NOTE: GROMACS was configured without NVML support hence it can not exploit
>>>       application clocks of the detected Tesla P100-SXM2-16GB GPU to improve
>>> performance.
>>>       Recompile with the NVML library (compatible with the driver used) or
>>> set application clocks manually.
>>>
>>> -------------------------------------------------------
>>> Program gmx mdrun, VERSION 5.1.4
>>> Source code file:
>>> /ccs/home/adaa/gromacs-5.1.4/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu,
>>> line: 70
>>> Fatal error:
>>> cudaMallocHost of size 1024128 bytes failed: all CUDA-capable devices are
>>> busy or unavailable
>>>
>>>
>>>
>>> ****************************************************************************************************
>>>
>>> Thanks for any help you can offer.
>>>
>>>
>>> Best,
>>>
>>>
>>> Ada
>>>
>>>
>>>
>>>
>>> Ada Sedova
>>>
>>> Postdoctoral Research Associate
>>>
>>> Scientific Computing Group, NCCS
>>>
>>> Oak Ridge National Laboratory, Oak Ridge, TN
>>>
>>>