[gmx-users] WG: WG: Issue with CUDA and gromacs

Szilárd Páll pall.szilard at gmail.com
Fri Mar 15 17:58:48 CET 2019


Did you use a binary compiled from the patched sources? If so, can you
please also share the exact error message printed to standard output?
--
Szilárd


On Fri, Mar 15, 2019 at 5:57 PM Szilárd Páll <pall.szilard at gmail.com> wrote:

> On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <
> Stefanie.Tafelmeier at zae-bayern.de> wrote:
>
>> Hi,
>>
>> about the tests:
>> - ntmpi 1 -ntomp 22 -pin on; doesn't work*
>>
>
> OK, so this suggests that your previously successful 22-thread runs did
> not turn on pinning, I assume?
> Can you please try:
> -ntmpi 1 -ntomp 1 -pin on
> -ntmpi 1 -ntomp 2 -pin on
> that is, to check whether pinning works at all.
> Also, please try one/both of the above (assuming they fail) with the same
> binary, but as a CPU-only run, i.e.
> -ntmpi 1 -ntomp 1 -pin on -nb cpu
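> For completeness, spelled out as full command lines these would be something
> like the following (topol.tpr is just a placeholder for your own input file):
> gmx mdrun -v -s topol.tpr -ntmpi 1 -ntomp 1 -pin on
> gmx mdrun -v -s topol.tpr -ntmpi 1 -ntomp 1 -pin on -nb cpu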
>
>
>> - ntmpi 1 -ntomp 22 -pin off; runs
>> - ntmpi 1 -ntomp 23 -pin off; runs
>> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
>> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
>> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
>> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**
>>
>
> Just to confirm, can you please re-run the ** cases with -ntmpi 24 (to
> avoid the DD error)?
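> For example (again, topol.tpr is just a placeholder for your input):
> gmx mdrun -v -s topol.tpr -ntmpi 24 -ntomp 1 -pinstride 1 -pin on
> gmx mdrun -v -s topol.tpr -ntmpi 24 -ntomp 1 -pinstride 2 -pin on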
>
>
>>
>> *Fails with the known error (as before).
>>
>> **The number of ranks you selected (23) contains a large prime factor 23. In
>> most cases this will lead to bad performance. Choose a number with smaller
>> prime factors or set the decomposition (option -dd) manually.
>>
>> The log file is at:
>> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8
>>
>
> Will have a look and get back with more later.
>
>
>>
>> Many thanks again,
>> Steffi
>>
>> -----Original Message-----
>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:
>> gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of
>> Szilárd Páll
>> Sent: Friday, 15 March 2019 16:27
>> To: Discussion list for GROMACS users
>> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>>
>> Hi,
>>
>> Please share log files via an external service; attachments are not
>> accepted on the list.
>>
>> Also, when checking the error with the patch supplied, please run the
>> following cases -- no long runs are needed, I just want to know which of
>> these runs and which doesn't:
>> - ntmpi 1 -ntomp 22 -pin on
>> - ntmpi 1 -ntomp 22 -pin off
>> - ntmpi 1 -ntomp 23 -pin off
>> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on
>> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on
>> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on
>> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on
>>
>> Thanks,
>> --
>> Szilárd
>>
>>
>> On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
>> Stefanie.Tafelmeier at zae-bayern.de> wrote:
>>
>> > Hi Szilárd,
>> >
>> > thanks for the quick reply.
>> > About the first suggestion, I'll try and give feedback soon.
>> >
>> > Regarding the second, I attached the log file for the case of
>> > mdrun -v -nt 25,
>> > which ends in the known error message.
>> >
>> > Again, thanks a lot for your information and help.
>> >
>> > Best wishes,
>> > Steffi
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:
>> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of
>> > Szilárd Páll
>> > Sent: Friday, 15 March 2019 15:30
>> > To: Discussion list for GROMACS users
>> > Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>> >
>> > Hi Stefanie,
>> >
>> > Unless and until the error and performance-related concerns prove to be
>> > related, let's keep those separate.
>> >
>> > I'd first focus on the former. To be honest, I've never encountered
>> such an
>> > issue where if you use more than a certain number of threads, the run
>> > aborts with that error. To investigate further, can you please apply the
>> > following patch file, which will hopefully give more context to the error:
>> > https://termbin.com/uhgp
>> > (e.g. you can execute the following to accomplish that:
>> > curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
>> > devicebuffer.cuh.patch)
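>> > For reference, the whole sequence would look roughly like this; the source
>> > and build directory paths are placeholders for your own setup:
>> > cd /path/to/gromacs-2019               # top of the unpacked source tree
>> > curl https://termbin.com/uhgp > devicebuffer.cuh.patch
>> > patch -p0 < devicebuffer.cuh.patch
>> > cd build && make -j 8 && make install  # rebuild and reinstall the patched binary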
>> >
>> > Regarding the performance-related questions, can you please share a full
>> > log file of the runs so we can see the machine config, simulation
>> > system/settings, etc. Without that it is hard to judge what's best for your
>> > case. However, if you only have a single GPU (which seems to be the case
>> > based on the log excerpts) alongside those two rather beefy CPUs, then you
>> > will likely not get much benefit from using all cores, and it is normal that
>> > you see little to no improvement from using the cores of a second CPU socket.
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> >
>> >
>> > On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
>> > Stefanie.Tafelmeier at zae-bayern.de> wrote:
>> >
>> > > Dear all,
>> > >
>> > > I was not sure if the email before reached you, but again many thanks
>> > > for your reply, Szilárd.
>> > >
>> > > As written below, we are still facing a problem with the performance of
>> > > our workstation.
>> > > I wrote before because of the error message that keeps occurring in
>> > > mdrun simulations:
>> > >
>> > > Assertion failed:
>> > > Condition: stat == cudaSuccess
>> > > Asynchronous H2D copy failed
>> > >
>> > > As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are
>> > > now the newest ones.
>> > >
>> > > If I run mdrun without further settings, it leads to this error
>> > > message. If I run it and choose the thread count directly, mdrun performs
>> > > well, but only for -nt values between 1 and 22. Higher values again
>> > > lead to the aforementioned error message.
>> > >
>> > > In order to investigate in more detail, I tried different combinations of
>> > > -nt, -ntmpi and -ntomp, also combined with -npme (example command lines
>> > > follow below this list):
>> > > -       The best performance in terms of ns/day is with -nt 22 or,
>> > > equivalently, -ntomp 22 alone. But then only 22 threads are involved,
>> > > which is fine if I run more than one mdrun simultaneously, as I can
>> > > distribute the other 66 threads. The GPU usage is then around 65%.
>> > > -       A similarly good performance is reached with mdrun -ntmpi 4
>> > > -ntomp 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved.
>> > > The GPU usage is then around 50%.
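>> > > Spelled out as full command lines, the two configurations above were
>> > > started roughly like this (topol.tpr stands in for our actual input file):
>> > > gmx mdrun -v -s topol.tpr -ntomp 22
>> > > gmx mdrun -v -s topol.tpr -ntmpi 4 -ntomp 18 -npme 1 -pme gpu -nb gpu
>> > > For running two mdruns side by side, I assume one would combine -pin on
>> > > with different -pinoffset values so they do not share cores, but I have
>> > > not tested that yet.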
>> > >
>> > > I read the information on
>> > > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
>> > > which was very helpful, but some things are still not clear to me:
>> > > I was wondering if there is any other way to enhance the performance?
>> > > And what is the reason that the -nt maximum is at 22 threads? Could this
>> > > be connected to the sockets (see details below) of our workstation?
>> > > It is also not clear to me how a thread count (-nt) higher than 22 can
>> > > lead to the error regarding the asynchronous H2D copy.
>> > >
>> > > Please excuse all these questions. I would appreciate it a lot if you
>> > > have a hint for this problem as well.
>> > >
>> > > Best regards,
>> > > Steffi
>> > >
>> > > -----
>> > >
>> > > The workstation details are:
>> > > Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
>> > > Hardware detected:
>> > >
>> > >   CPU info:
>> > >     Vendor: Intel
>> > >     Brand:  Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
>> > >     Family: 6   Model: 85   Stepping: 4
>> > >     Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh
>> > >       cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid
>> > >       pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1
>> > >       sse4.2 ssse3 tdt x2apic
>> > >
>> > >     Number of AVX-512 FMA units: 2
>> > >   Hardware topology: Basic
>> > >     Sockets, cores, and logical processors:
>> > >       Socket  0: [   0  44] [   1  45] [   2  46] [   3  47] [   4  48] [   5  49] [   6  50] [   7  51] [   8  52] [   9  53] [  10  54] [  11  55] [  12  56] [  13  57] [  14  58] [  15  59] [  16  60] [  17  61] [  18  62] [  19  63] [  20  64] [  21  65]
>> > >       Socket  1: [  22  66] [  23  67] [  24  68] [  25  69] [  26  70] [  27  71] [  28  72] [  29  73] [  30  74] [  31  75] [  32  76] [  33  77] [  34  78] [  35  79] [  36  80] [  37  81] [  38  82] [  39  83] [  40  84] [  41  85] [  42  86] [  43  87]
>> > >   GPU info:
>> > >     Number of GPUs detected: 1
>> > >     #0: NVIDIA Quadro P6000, compute cap.: 6.1, ECC: no, stat: compatible
>> > >
>> > > -----
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se [mailto:
>> > > gromacs.org_gmx-users-bounces at maillist.sys.kth.se] On Behalf Of
>> > > Szilárd Páll
>> > > Sent: Thursday, 31 January 2019 17:15
>> > > To: Discussion list for GROMACS users
>> > > Subject: Re: [gmx-users] WG: Issue with CUDA and gromacs
>> > >
>> > > On Thu, Jan 31, 2019 at 2:14 PM Szilárd Páll <pall.szilard at gmail.com>
>> > > wrote:
>> > > >
>> > > > On Wed, Jan 30, 2019 at 5:15 PM Tafelmeier, Stefanie
>> > > > <Stefanie.Tafelmeier at zae-bayern.de> wrote:
>> > > > >
>> > > > > Dear all,
>> > > > >
>> > > > > We are facing an issue with the CUDA toolkit.
>> > > > > We tried several combinations of GROMACS versions and CUDA toolkits.
>> > > > > No toolkit older than 9.2 was possible to try, as there are no NVIDIA
>> > > > > drivers available for a Quadro P6000. The combinations (GROMACS
>> > > > > version, CUDA version) we tried and the resulting errors are listed below.
>> > > >
>> > > > Install the latest 410.xx drivers and it will work; the NVIDIA
>> driver
>> > > > download website (https://www.nvidia.com/Download/index.aspx)
>> > > > recommends 410.93.
>> > > >
>> > > > Here's a CUDA 10-compatible driver running on a system with a P6000:
>> > > > https://termbin.com/ofzo
>> > >
>> > > Sorry, I misread that as "CUDA >=9.2 was not possible".
>> > >
>> > > Note that the driver is backward compatible, so you can use a new
>> > > driver with older CUDA versions.
>> > >
>> > > Also note that the oldest driver for which NVIDIA claims P6000 support
>> > > is 390.59, which is, as far as I know, one generation older than the 396
>> > > that the CUDA 9.2 toolkit came with. This is, however, not something I'd
>> > > recommend pursuing; use a new driver from the official site with any
>> > > CUDA version that GROMACS supports and it should be fine.
>> > >
>> > > >
>> > > > > GROMACS 2019 + CUDA 10.0:
>> > > > >   gmx mdrun: Assertion failed:
>> > > > >   Condition: stat == cudaSuccess
>> > > > >   Asynchronous H2D copy failed
>> > > > >
>> > > > > GROMACS 2019 + CUDA 9.2:
>> > > > >   gmx mdrun: Assertion failed:
>> > > > >   Condition: stat == cudaSuccess
>> > > > >   Asynchronous H2D copy failed
>> > > > >
>> > > > > GROMACS 2018.5 + CUDA 9.2:
>> > > > >   gmx mdrun: Fatal error:
>> > > > >   HtoD cudaMemcpyAsync failed: invalid argument
>> > > >
>> > > > Can we get some more details on these, please? Complete log files
>> > > > would be a good start.
>> > > >
>> > > > > GROMACS 5.1.5 + CUDA 9.2:
>> > > > >   Installation (make): nvcc fatal: Unsupported gpu architecture 'compute_20'*
>> > > > >
>> > > > > GROMACS 2016.2 + CUDA 9.2:
>> > > > >   Installation (make): nvcc fatal: Unsupported gpu architecture 'compute_20'*
>> > > > >
>> > > > >
>> > > > > *We also tried to set the target CUDA architectures as described in
>> > > > > the installation guide
>> > > > > (manual.gromacs.org/documentation/2019/install-guide/index.html).
>> > > > > Unfortunately, it didn't work.
>> > > >
>> > > > What does it mean that it didn't work? Can you share the command you
>> > > > used and what exactly did not work?
>> > > >
>> > > > For the P6000, which is a "compute capability 6.1" device (for anyone
>> > > > who needs to look it up, go here:
>> > > > https://developer.nvidia.com/cuda-gpus), you should set
>> > > > cmake ../ -DGMX_CUDA_TARGET_SM="61"
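>> > > > As an illustration only, a fuller configure-and-build sequence could look
>> > > > like the sketch below; the build directory layout and the CUDA install
>> > > > path are assumptions that may differ on your machine:
>> > > > mkdir build && cd build
>> > > > cmake .. -DGMX_GPU=ON -DGMX_CUDA_TARGET_SM="61" \
>> > > >       -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0
>> > > > make -j 8 && make check && make install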
>> > > >
>> > > > --
>> > > > Szilárd
>> > > >
>> > > > > Performing simulations on the CPU only always works, yet they are of
>> > > > > course slower than they could be when additionally using the GPU.
>> > > > > Issue #2761 (https://redmine.gromacs.org/issues/2762) seems similar
>> > > > > to our problem.
>> > > > > Even though this issue is still open, we wanted to ask if you can give
>> > > > > us any information about how to solve it.
>> > > > >
>> > > > > Many thanks in advance.
>> > > > > Best regards,
>> > > > > Stefanie Tafelmeier
>> > > > >
>> > > > >
>> > > > > Further details if necessary:
>> > > > > The workstation:
>> > > > > 2 x Xeon Gold 6152 @ 3.7 GHz (22 cores, 44 threads each, AVX-512)
>> > > > > NVIDIA Quadro P6000 with 3840 CUDA cores
>> > > > >
>> > > > > The simulation system:
>> > > > > Long-chain alkanes (previously used with GROMACS 5.1.5 and CUDA 7.5 -
>> > > > > worked perfectly)
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > ZAE Bayern
>> > > > > Stefanie Tafelmeier
>> > > > > Bereich Energiespeicherung/Division Energy Storage
>> > > > > Thermische Energiespeicher/Thermal Energy Storage
>> > > > > Walther-Meißner-Str. 6
>> > > > > 85748 Garching
>> > > > >
>> > > > > Tel.: +49 89 329442-75
>> > > > > Fax: +49 89 329442-12
>> > > > > Stefanie.tafelmeier at zae-bayern.de
>> > > > > http://www.zae-bayern.de
>> > > > >
>> > > > >
>> > > > > ZAE Bayern - Bayerisches Zentrum für Angewandte Energieforschung e. V.
>> > > > > Vorstand/Board:
>> > > > > Prof. Dr. Hartmut Spliethoff (Vorsitzender/Chairman),
>> > > > > Prof. Dr. Vladimir Dyakonov
>> > > > > Sitz/Registered Office: Würzburg
>> > > > > Registergericht/Register Court: Amtsgericht Würzburg
>> > > > > Registernummer/Register Number: VR 1386
>> > > > >
>> > > > > Any declarations of intent, such as quotations, orders, applications
>> > > > > and contracts, are legally binding for ZAE Bayern only if expressed in
>> > > > > a written and duly signed form. This e-mail is intended solely for use
>> > > > > by the recipient(s) named above. Any unauthorised disclosure, use or
>> > > > > dissemination, whether in whole or in part, is prohibited. If you have
>> > > > > received this e-mail in error, please notify the sender immediately and
>> > > > > delete this e-mail.
>> > > > >
>> > > > >