[gmx-users] strange GPU load distribution

Alex nedomacho at gmail.com
Tue May 8 00:22:13 CEST 2018


I think we have everything ready at this point: a separate binary (not
sourced yet) and these options. We've set GMX_DISABLE_GPU_DETECTION=1 in
the user's .bashrc and will try the other option if this one fails. I'll
post an update here on whether the node still bogs down.
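
For reference, the .bashrc change is just one line, e.g. (a sketch -- the
variable is the one Szilárd names below):

    export GMX_DISABLE_GPU_DETECTION=1

with CUDA_VISIBLE_DEVICES="" as the fallback along the same lines.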

Thanks a lot.

Alex

On Mon, May 7, 2018 at 2:54 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:

> Hi,
>
> You have at least one option more elegant than using a separate binary for
> EM.
>
> Set the GMX_DISABLE_GPU_DETECTION=1 environment variable, which is the
> internal GROMACS override that forces detection off for cases like
> yours. That should eliminate the detection latency. If for some reason it
> does not, you can always set CUDA_VISIBLE_DEVICES="" so jobs simply do not
> "see" any GPUs; this is a standard environment variable of the CUDA runtime.
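>
> For example, either of the following keeps a job away from the GPUs (a
> sketch; "em" just stands in for whatever the run input is called):
>
>     GMX_DISABLE_GPU_DETECTION=1 gmx mdrun -deffnm em
>     CUDA_VISIBLE_DEVICES="" gmx mdrun -deffnm em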
>
> Let us know if that worked.
>
> Cheers.
>
> --
> Szilárd
>
> On Mon, May 7, 2018 at 9:38 AM, Alex <nedomacho at gmail.com> wrote:
>
> > Thanks, Mark. No need to be sorry; a CPU-only build is a simple enough
> > fix. Inelegant, but if it works, it's all good. I'll report back as soon
> > as we have tried it.
> >
> > I myself run things in a way that you would find very familiar, but we
> > have a colleague developing force fields, and that involves tons of very
> > short CPU-only runs submitted in bursts. Hopefully, one day you'll be
> > able to accommodate this scenario. :)
> >
> > Alex
> >
> >
> >
> > On 5/7/2018 1:13 AM, Mark Abraham wrote:
> >
> >> Hi,
> >>
> >> I don't see any problems there, but I note that there are run-time
> >> settings that make the driver/runtime block until no other process is
> >> using the GPU, which may be a contributing factor here.
> >>
> >> As Justin noted, if your EM jobs use a build of GROMACS that is not
> >> configured to have access to the GPUs, then there can be no problem. I
> >> recommend you do that if you want to continue sharing this node between
> >> GPU and non-GPU jobs. There has long been the principle that users must
> >> take active steps to keep GROMACS processes away from each other when
> >> sharing CPU resources, and this is a similar situation.
> >>
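> >> A CPU-only build boils down to configuring with GPU support switched
> >> off, roughly like this (a sketch; the install prefix is illustrative,
> >> GMX_GPU is the relevant CMake switch):
> >>
> >>     cmake .. -DGMX_GPU=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs-cpu
> >>     make -j && make install
> >>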
> >> In the abstract, it would be reasonable to organize mdrun so that we
> >> determine whether we might want to use a GPU before we run the GPU
> >> detection; however, that high-level code is in considerable flux in
> >> development branches, and we are highly unlikely to prioritise such a
> >> fix in a stable release branch to suit this use case. I didn't think
> >> that some of the reorganization since the 2016 release would have this
> >> effect, but apparently it can. Sorry!
> >>
> >> Mark
> >>
> >> On Mon, May 7, 2018 at 6:33 AM Alex <nedomacho at gmail.com> wrote:
> >>
> >>> Mark,
> >>>
> >>> I am forwarding the response I received from the colleague who prepared
> >>> the box for my GMX install -- this is from the latest installation of
> >>> 2018.1. See the text below and please let me know what you think. We
> >>> have no problem rebuilding things, but we would like to understand what
> >>> is wrong before we pause all the work.
> >>>
> >>> Thank you,
> >>>
> >>> Alex
> >>>
> >>> "OS Ubuntu 16.04LTS
> >>>
> >>> After checking gcc and kernel-headers were installed I ran the
> following
> >>>
> >>> #sudo lspci | grep -i nvidia
> >>>
> >>> #curl http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb > cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
> >>>
> >>> #sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
> >>>
> >>> #sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
> >>>
> >>> #sudo apt update
> >>>
> >>> #sudo apt upgrade
> >>>
> >>> #sudo /sbin/shutdown -r now
> >>>
> >>> After reboot
> >>>
> >>> #sudo apt-get install cuda
> >>>
> >>> #export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
> >>>
> >>> #nvidia-smi
> >>>
> >>> I also compiled the samples in the cuda tree using the Makefile there
> >>> and had no problems."
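> >>>
> >>> (For the curious, that sample check amounts to roughly the following,
> >>> assuming the default CUDA 9.1 samples path; deviceQuery is one of the
> >>> bundled NVIDIA samples.)
> >>>
> >>> #cd /usr/local/cuda-9.1/samples/1_Utilities/deviceQuery
> >>> #sudo make
> >>> #./deviceQuery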
> >>>

