[gmx-users] cudaMallocHost failed: unknown error

Szilárd Páll pall.szilard at gmail.com
Mon Mar 26 15:29:39 CEST 2018


As a side note, your mdrun invocation does not look well suited to GPU-accelerated
runs; you would most likely be better off running fewer ranks with more OpenMP
threads per rank.
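
For example, on one of these 36-core, 4-GPU nodes something along these lines
is usually a better starting point (the exact rank/thread split is only a guess
and worth benchmarking on your system):

mpirun -np 8 gmx_mpi mdrun -deffnm MD -maxh $maxh -npme 0 -gpu_id 00112233 -ntomp 4

That still uses 32 cores, but with 8 PP ranks of 4 OpenMP threads each and two
ranks per GPU; you can also drop -dd and let mdrun choose the domain
decomposition.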
--
Szilárd
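
PS: To help decide whether this is a node/driver problem rather than something
in GROMACS, a tiny standalone test of pinned host allocation along these lines
might be useful (just a sketch, untested; build with e.g.
"nvcc pinned_test.cu -o pinned_test"):

// pinned_test.cu -- hypothetical standalone check of cudaMallocHost,
// independent of GROMACS
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 1024128;  // same size as in the failing GROMACS call
    void        *ptr   = NULL;

    // Attempt the page-locked (pinned) host allocation that GROMACS reports as failing.
    cudaError_t stat = cudaMallocHost(&ptr, bytes);
    if (stat != cudaSuccess)
    {
        fprintf(stderr, "cudaMallocHost of %zu bytes failed: %s\n",
                bytes, cudaGetErrorString(stat));
        return 1;
    }
    printf("cudaMallocHost of %zu bytes succeeded\n", bytes);

    cudaFreeHost(ptr);
    return 0;
}

If that also fails intermittently on the affected nodes right at job start, it
would point more toward the driver or node state than toward GROMACS.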


On Fri, Mar 23, 2018 at 9:26 PM, Christopher Neale
<chris.neale at alum.utoronto.ca> wrote:
> Hello,
>
> I am running GROMACS 5.1.2 on single nodes, with each run set to use 32 cores and 4 GPUs. The run command is:
>
> mpirun -np 32 gmx_mpi mdrun -deffnm MD -maxh $maxh -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Some of my runs die with this error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> Below is the relevant part of the .log file. Searching the internet didn't turn up any solutions. I'll contact the sysadmins if you think this is likely a problem with the hardware or with rogue jobs. In my testing, 6 out of a collection of 24 jobs died with this same error message (including the "1024128 bytes" and "pmalloc_cuda.cu, line: 70"). They all failed on different nodes, and all of those nodes subsequently took repeat jobs that ran fine. When the error occurred, it was always right at the start of the run.
>
>
> Thank you for your help,
> Chris.
>
>
>
> Command line:
>   gmx_mpi mdrun -deffnm MD -maxh 0.9 -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
>
> Number of logical cores detected (72) does not match the number reported by OpenMP (2).
> Consider setting the launch configuration manually!
>
> Running on 1 node with total 36 cores, 72 logical cores, 4 compatible GPUs
> Hardware detected on host ko026.localdomain (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
>     SIMD instructions most likely to fit this hardware: AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>   GPU info:
>     Number of GPUs detected: 4
>     #0: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #1: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #2: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #3: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>
> Reading file MD.tpr, VERSION 5.1.2 (single precision)
> Can not increase nstlist because verlet-buffer-tolerance is not set or used
> Using 32 MPI processes
> Using 1 OpenMP thread per MPI process
>
> On host ko026.localdomain 4 GPUs user-selected for this run.
> Mapping of GPU IDs to the 32 PP ranks in this node: 0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3
>
> NOTE: You assigned GPUs to multiple MPI processes.
>
> NOTE: Your choice of number of MPI ranks and amount of resources results in using 1 OpenMP threads per rank, which is most likely inefficient. The optimum is usually between 2 and 6 threads per rank.
>
>
> NOTE: GROMACS was configured without NVML support hence it can not exploit
>       application clocks of the detected Tesla P100-PCIE-16GB GPU to improve performance.
>       Recompile with the NVML library (compatible with the driver used) or set application clocks manually.
>
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file: /net/scratch3/cneale/exe/KODIAK/GROMACS/source/gromacs-5.1.2/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu, line: 70
>
> Fatal error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting parallel program gmx mdrun on rank 31 out of 32
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
>
>