[gmx-users] strange GPU load distribution

Alex nedomacho at gmail.com
Wed May 9 04:43:19 CEST 2018


Hi Szilárd,

It really does appear that GMX_DISABLE_GPU_DETECTION=1 in the user's .bashrc fixed it right up. We haven't tried his runs alongside GPU-accelerated jobs yet, but he reports that none of his PIDs ever appear in nvidia-smi anymore and overall his jobs start much faster.
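
For the record, this is roughly all it took (the mdrun line below is just an
illustration of one of his short CPU-only jobs, not his exact command):

# in the user's ~/.bashrc: tell mdrun to skip GPU detection entirely
export GMX_DISABLE_GPU_DETECTION=1

# a typical short EM run, pinned so bursts of jobs stay off each other's cores
gmx mdrun -deffnm em -nt 4 -pin on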

This was an excellent suggestion, thank you.

Alex

On 5/7/2018 2:54 PM, Szilárd Páll wrote:
> Hi,
>
> You have at least one option more elegant than using a separate binary for
> EM.
>
> Set the GMX_DISABLE_GPU_DETECTION=1 environment variable; it is the internal
> GROMACS override that forces GPU detection off in cases similar to yours, and
> it should eliminate the detection latency. If for some reason it does not,
> you can always set CUDA_VISIBLE_DEVICES="" so jobs simply do not "see" any
> GPUs; that is a standard environment variable of the CUDA runtime.
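>
> For example, in a job script it could look like this (just a sketch, adapt
> the mdrun flags to your own runs):
>
> # option 1: GROMACS-internal override, skips GPU detection completely
> export GMX_DISABLE_GPU_DETECTION=1
> gmx mdrun -deffnm em -nt 4
>
> # option 2: standard CUDA runtime variable, hides all GPUs from this process
> CUDA_VISIBLE_DEVICES="" gmx mdrun -deffnm em -nt 4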
>
> Let us know if that worked.
>
> Cheers.
>
> --
> Szilárd
>
> On Mon, May 7, 2018 at 9:38 AM, Alex <nedomacho at gmail.com> wrote:
>
>> Thanks, Mark. No need to be sorry; a CPU-only build is a simple enough fix.
>> Inelegant, but if it works, it's all good. I'll report back as soon as we
>> have tried it.
>>
>> I myself run things in a way that you would find very familiar, but we
>> have a colleague developing forcefields and that involves tons of very
>> short CPU-only runs getting submitted in bursts. Hopefully, one day you'll
>> be able to accommodate this scenario. :)
>>
>> Alex
>>
>>
>>
>> On 5/7/2018 1:13 AM, Mark Abraham wrote:
>>
>>> Hi,
>>>
>>> I don't see any problems there, but I note that there are run-time settings
>>> that make the driver/runtime block until no other process is using the GPU,
>>> which may be a contributing factor here.
>>>
>>> As Justin noted, if your EM jobs use a build of GROMACS that is not
>>> configured to have access to the GPUs, then there can be no problem. I
>>> recommend you do that if you want to keep sharing this node between GPU
>>> and non-GPU jobs. It has long been the principle that users must take
>>> active steps to keep GROMACS processes away from each other when sharing
>>> CPU resources, and this is a similar situation.
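>>>
>>> For completeness, a CPU-only build only needs to be configured without GPU
>>> support, along these lines (the install prefix is just a placeholder):
>>>
>>> # separate build tree with no GPU support compiled in
>>> cmake .. -DGMX_GPU=OFF -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2018.1-cpu
>>> make -j 8 && make install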
>>>
>>> In the abstract, it would be reasonable to organize mdrun so that we first
>>> determine whether we might want to use a GPU before running the GPU
>>> detection at all; however, that high-level code is in considerable flux in
>>> the development branches, and we are highly unlikely to prioritise such a
>>> fix in a stable release branch to suit this use case. I didn't think that
>>> some of the reorganization since the 2016 release would have this effect,
>>> but apparently it can. Sorry!
>>>
>>> Mark
>>>
>>> On Mon, May 7, 2018 at 6:33 AM Alex <nedomacho at gmail.com> wrote:
>>>
>>> Mark,
>>>> I am forwarding the response I received from the colleague who prepared
>>>> the box for my GMX install -- this is from the latest installation of
>>>> 2018.1. See text below and please let me know what you think. We have no
>>>> problem rebuilding things, but would like to understand what is wrong
>>>> before we pause all the work.
>>>>
>>>> Thank you,
>>>>
>>>> Alex
>>>>
>>>> "OS Ubuntu 16.04LTS
>>>>
>>>> After checking that gcc and kernel-headers were installed, I ran the following:
>>>>
>>>> #sudo lspci |grep -i nvidia
>>>>
>>>> #curl http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
>>>>
>>>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
>>>>
>>>> #sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
>>>>
>>>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
>>>>
>>>> #sudo apt update
>>>>
>>>> #sudo apt upgrade
>>>>
>>>> #sudo /sbin/shutdown -r now
>>>>
>>>> After reboot
>>>>
>>>> #sudo apt-get install cuda
>>>>
>>>> #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
>>>>
>>>> #nvidia-smi
>>>>
>>>> I also compiled the samples in the cuda tree using the Makefile there
>>>> and had no problems."
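>>>>
>>>> If it helps, a quick way to double-check that the runtime actually sees
>>>> the card (assuming the samples were built in place under
>>>> /usr/local/cuda-9.1/samples) would be something like:
>>>>
>>>> cd /usr/local/cuda-9.1/samples/1_Utilities/deviceQuery
>>>> sudo make && ./deviceQuery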
>>>>