[gmx-users] Two strange problems when using GPU on multiple nodes

Mark Abraham mark.j.abraham at gmail.com
Tue Jun 30 23:57:19 CEST 2015


Hi,

Not from the GROMACS end. But talk to your sysadmins about what might have
changed in their domain.

Mark

On Tue, Jun 30, 2015 at 11:34 PM Mark Zang <zangtw at gmail.com> wrote:

> Thanks for the quick response! The second problem is clear to me now, but
> I am still not quite clear about the first one.
>
> I have run pure MPI simulations of the same system on the same machine
> before. In that simulation, an 11 GB trajectory file was generated in 48 h.
> However, in my current simulation where GPUs are used, only 5-6 GB of data
> is generated in 15 h (1.5x the simulation speed with half the number of
> nodes, thanks to the GPUs :D ) before the simulation stops. That does not
> make sense to me, and I guess it should not be caused by a file size limit.
>
> Is there any way to figure out what happened to my filesystem after 15 h of
> simulation?
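>
> For instance, would checking something like this from a login node while the
> job is stuck tell me anything useful? (The directory below is only a
> placeholder for my actual run directory.)
>
> cd /path/to/run/dir               # placeholder for the real run directory
> ls -lh --time-style=full-iso md.log traj.trr   # are the output files still growing?
> df -h .                           # is the target filesystem full?
> quota -s                          # or "lfs quota -u $USER ." on a Lustre scratch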
>
> Thanks again
>
>
>
> On Tue, Jun 30, 2015 at 3:50 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > Hi,
> > On Tue, Jun 30, 2015 at 10:38 PM Mark Zang <zangtw at gmail.com> wrote:
> >> Dear all,
> >> The cluster I am currently using has 24 CPU cores and 4 GPUs per node,
> >> and I am attempting to use 2 nodes for my simulation. While my sbatch
> >> script runs fine, I have found two strange problems and I am a little
> >> confused about them. Here is my sbatch script:
> >>
> >>
> >> #!/bin/bash
> >> #SBATCH --job-name="zangtw"
> >> #SBATCH --output="zangtw.%j.%N.out"
> >> #SBATCH --partition=gpu
> >> #SBATCH --nodes=2
> >> #SBATCH --ntasks-per-node=4
> >> #SBATCH --gres=gpu:4
> >> #SBATCH --export=ALL
> >> #SBATCH -t 47:30:00
> >>
> >> ibrun --npernode 4 mdrun -ntomp 6 -s run.tpr -pin on -gpu_id 0123
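> >>
> >> For context, this is how I understand the resource mapping (assuming ibrun
> >> really does start the 4 MPI ranks per node that --npernode asks for):
> >>
> >> # 2 nodes x 4 ranks/node = 8 MPI ranks in total
> >> # each rank runs 6 OpenMP threads, so 4 x 6 = 24 threads fill a node's 24 cores
> >> # -gpu_id 0123 assigns the 4 ranks on each node to that node's GPU ids 0,1,2,3
> >> ibrun --npernode 4 mdrun -ntomp 6 -s run.tpr -pin on -gpu_id 0123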
> >>
> >>
> >> and here are the two problems:
> >> 1. My simulation stops every 10+ hours. Specifically, the job is still
> >> “running” in the queue but md.log/traj.trr stop updating.
> > That suggests the filesystem has gone AWOL, or filled up, or hit a 2 GB
> > file size limit, or something like that.
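> > For the size-limit part, a quick sanity check is to print the limits the
> > batch job actually runs under, e.g. from inside the script before mdrun:
> >
> > ulimit -f    # per-process file size limit; "unlimited" rules this one out
> > ulimit -a    # the full set of resource limits in effect for the job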
> >> Is it due to a lack of memory?
> > No
> >> I have never encountered this before when running pure MPI, pure OpenMP,
> >> or even hybrid MPI/OpenMP (without GPU) jobs.
> >>
> > Your throughput is different now, but those probably ran on other
> > infrastructure, right? That's a more relevant difference.
> >> 2. I am attempting to use 8 GPUs, but only 4 GPUs are displayed in my log
> >> file (as follows):
> >>
> >>
> >> Using 8 MPI processes
> >> Using 6 OpenMP threads per MPI process
> >> 4 GPUs detected on host comet-30-06.sdsc.edu:
> >> #0: NVIDIA Tesla K80, compute cap.: 3.7, ECC:  no, stat: compatible
> >> #1: NVIDIA Tesla K80, compute cap.: 3.7, ECC:  no, stat: compatible
> >> #2: NVIDIA Tesla K80, compute cap.: 3.7, ECC:  no, stat: compatible
> >> #3: NVIDIA Tesla K80, compute cap.: 3.7, ECC:  no, stat: compatible
> >>
> >>
> >> Does it mean that only the 4 GPUs on the first node are used while the 4
> >> GPUs on the second node are idle,
> > I suspect you can't even force that to happen if you wanted to ;-)
> >> or that all 8 GPUs are actually used but only 4 of them are displayed in
> >> the log file?
> > The latter - the reporting names a specific node, and you are using two.
> > GROMACS 5.1 will be more helpful with such reporting.
> >> In the second case, what should I do if I want to grep the information for
> >> the other 4 GPUs?
> >>
> > Upgrade to 5.1 shortly ;-)
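> > In the meantime, one way to convince yourself that the second node's GPUs
> > are busy is to look at them directly while the job runs; the jobid and
> > hostname below are placeholders:
> >
> > squeue -j <jobid> -o "%N"      # which nodes did the job actually get?
> > ssh <second-node> nvidia-smi   # where ssh to nodes running your job is allowed
> > grep "GPUs detected" md.log    # this log line only names the node that wrote the log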
> > Mark
> >>
> >>
> >> Thank you guys so much!
> >>
> >>
> >> Regards,
> >> Mark
> >>
> >>
> >>
> >>
> >>

