[gmx-users] cudaMallocHost failed: unknown error

Mark Abraham mark.j.abraham at gmail.com
Sat Mar 24 13:11:08 CET 2018


Hi,

This looks like leftover bad state from the GPU driver's previous workload,
or something along those lines. cudaMallocHost asks the driver to allocate
pinned (page-locked) memory on the CPU; the amounts GROMACS requests are
small enough that the call should never fail from a genuine lack of
resources.
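
If you want to rule GROMACS out, a minimal standalone reproducer (a sketch,
not GROMACS's actual pmalloc code) that exercises the same call with the
size reported in your log looks roughly like the following. If it also
fails on an otherwise idle node, the driver or device state is the problem
rather than anything in mdrun:

// Sketch only: allocate pinned host memory the same way the failing
// call in pmalloc_cuda.cu does, and print the CUDA error string.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void *pinned = nullptr;
    size_t nbytes = 1024128;  // size from the failing run's log

    cudaError_t stat = cudaMallocHost(&pinned, nbytes);
    if (stat != cudaSuccess) {
        // "unknown error" usually points at driver/device state,
        // not at the allocation request itself.
        fprintf(stderr, "cudaMallocHost of size %zu bytes failed: %s\n",
                nbytes, cudaGetErrorString(stat));
        return 1;
    }

    printf("pinned allocation of %zu bytes succeeded\n", nbytes);
    cudaFreeHost(pinned);
    return 0;
}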

Mark

On Fri, Mar 23, 2018, 21:27 Christopher Neale <chris.neale at alum.utoronto.ca>
wrote:

> Hello,
>
> I am running gromacs 5.1.2 on single nodes where the run is set to use 32
> cores and 4 GPUs. The run command is:
>
> mpirun -np 32 gmx_mpi mdrun -deffnm MD -maxh $maxh -dd 4 4 2 -npme 0
> -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Some of my runs die with this error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> Below is the relevant part of the .log file. Searching the internet didn't
> turn up any solutions. I'll contact the sysadmins if you think this is
> likely a hardware problem or a rogue job. In my testing, 6 out of a batch
> of 24 jobs died with this same error message (including the "1024128
> bytes" and "pmalloc_cuda.cu, line: 70"). The failures were all on
> different nodes, and all of those nodes subsequently ran repeat jobs
> without problems. When the error occurred, it was always right at the
> start of the run.
>
>
> Thank you for your help,
> Chris.
>
>
>
> Command line:
>   gmx_mpi mdrun -deffnm MD -maxh 0.9 -dd 4 4 2 -npme 0 -gpu_id
> 00000000111111112222222233333333 -ntomp 1 -notunepme
>
>
> Number of logical cores detected (72) does not match the number reported
> by OpenMP (2).
> Consider setting the launch configuration manually!
>
> Running on 1 node with total 36 cores, 72 logical cores, 4 compatible GPUs
> Hardware detected on host ko026.localdomain (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
>     SIMD instructions most likely to fit this hardware: AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>   GPU info:
>     Number of GPUs detected: 4
>     #0: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat:
> compatible
>     #1: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat:
> compatible
>     #2: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat:
> compatible
>     #3: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat:
> compatible
>
> Reading file MD.tpr, VERSION 5.1.2 (single precision)
> Can not increase nstlist because verlet-buffer-tolerance is not set or used
> Using 32 MPI processes
> Using 1 OpenMP thread per MPI process
>
> On host ko026.localdomain 4 GPUs user-selected for this run.
> Mapping of GPU IDs to the 32 PP ranks in this node:
> 0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3
>
> NOTE: You assigned GPUs to multiple MPI processes.
>
> NOTE: Your choice of number of MPI ranks and amount of resources results
> in using 1 OpenMP threads per rank, which is most likely inefficient. The
> optimum is usually between 2 and 6 threads per rank.
>
>
> NOTE: GROMACS was configured without NVML support hence it can not exploit
>       application clocks of the detected Tesla P100-PCIE-16GB GPU to
> improve performance.
>       Recompile with the NVML library (compatible with the driver used) or
> set application clocks manually.
>
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /net/scratch3/cneale/exe/KODIAK/GROMACS/source/gromacs-5.1.2/src/gromacs/gmxlib/cuda_tools/
> pmalloc_cuda.cu, line: 70
>
> Fatal error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting parallel program gmx mdrun on rank 31 out of 32
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
>
>

