[gmx-users] gmx5.0.2 with GPU acceleration problem

Wayne Liang chungwen.liang at gmail.com
Fri Mar 27 16:50:13 CET 2015


Hello Szilárd,

Thanks for your response. Here is the error message I got:

from the slurm report:

Rank 31 [Tue Mar 10 15:50:53 2015] [c6-2c0s5n2] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 31

Rank 27 [Tue Mar 10 15:50:53 2015] [c6-2c0s4n2] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 27

Rank 23 [Tue Mar 10 15:50:53 2015] [c5-2c1s14n2] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 23

Rank 19 [Tue Mar 10 15:50:53 2015] [c9-1c1s14n2] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 19

Rank 15 [Tue Mar 10 15:50:53 2015] [c9-1c1s13n2] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 15

Rank 7 [Tue Mar 10 15:50:53 2015] [c7-1c2s7n3] application called
MPI_Abort(MPI_COMM_WORLD, -1) - process 7

_pmiu_daemon(SIGCHLD): [NID 04922] [c5-2c1s14n2] [Tue Mar 10 15:50:53 2015]
PE RANK 23 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 03423] [c7-1c2s7n3] [Tue Mar 10 15:50:53 2015]
PE RANK 7 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 01382] [c7-0c0s9n2] [Tue Mar 10 15:50:53 2015]
PE RANK 3 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 03766] [c9-1c1s13n2] [Tue Mar 10 15:50:53 2015]
PE RANK 15 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 05014] [c6-2c0s5n2] [Tue Mar 10 15:50:53 2015]
PE RANK 31 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 05010] [c6-2c0s4n2] [Tue Mar 10 15:50:53 2015]
PE RANK 27 exit signal Aborted

_pmiu_daemon(SIGCHLD): [NID 03762] [c9-1c1s12n2] [Tue Mar 10 15:50:53 2015]
PE RANK 11 exit signal Aborted

[NID 04922] 2015-03-10 15:50:53 Apid 3843535: initiated application
termination

Application 3843535 exit codes: 134

Application 3843535 exit signals: Killed

Application 3843535 resources: utime ~123s, stime ~455s, Rss ~96416,
inblocks ~35678, outblocks ~60750


and there is nothing showing in md.log.


The option OMP_NUM_THREADS=8 (with -N 1) was suggested by the CSCS admins.
Please let me know what you think would be the most efficient way to run.
Thanks very much for your suggestions.
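To make sure I understand your OMP_NUM_THREADS<8 / -N>1 suggestion correctly, is something like the following what you mean? (This is just my original script with a 2-ranks-per-node, 4-threads-per-rank split as one possible layout; the -gpu_id mapping is my guess at how both ranks would share the node's single GPU, so please correct me if it is wrong.)

```shell
#!/bin/bash -l
#SBATCH --job-name test-gpu
#SBATCH --nodes 32
#SBATCH --ntasks-per-node 2      # 2 MPI ranks per node instead of 1
#SBATCH --cpus-per-task 4        # 4 OpenMP threads per rank (2 x 4 = 8 cores)
#SBATCH --time 01:00:00

GMX=/apps/daint/gromacs/5.0.2/gnu_482/bin

cd $SLURM_SUBMIT_DIR/

export OMP_NUM_THREADS=4

# 64 ranks total, 2 per node, 4 threads each;
# "-gpu_id 00" maps both PP ranks on a node to GPU 0
aprun -n 64 -N 2 -d 4 $GMX/mdrun_mpi -gpu_id 00
```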

Could you please share your submission script with me for testing?

Best,

Chungwen



On Fri, Mar 27, 2015 at 3:09 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:

> Hi Chungwen,
>
> I run on Piz Daint on a regular basis and have never had these issues.
> The only reason I can see for such a thing to happen is that some
> strange rank-over-nodes distribution ends up leaving some nodes
> without a PP rank.
>
> Could you please post a log file (through pastebin or something
> similar) of the failing run?
>
> BTW, you know that you are almost always better off running
> OMP_NUM_THREADS<8 (and -N > 1)?
>
> --
> Szilárd
>
>
> On Fri, Mar 27, 2015 at 12:09 PM, Wayne Liang <chungwen.liang at gmail.com>
> wrote:
> > Dear Users,
> >
> > I have encountered a problem that mdrun always crashes with the following
> > msg, when I run it on more than 32 nodes:
> >
> > Software inconsistency error:
> >
> > Limiting the number of GPUs to <1 doesn't make sense (detected 1, 0
> > requested)!
> >
> > For more information and tips for troubleshooting, please check the
> GROMACS
> >
> > website at http://www.gromacs.org/Documentation/Errors
> >
> >
> > My submission script:
> >
> > #!/bin/bash -l
> >
> > #SBATCH --job-name test-gpu
> >
> > #SBATCH --nodes 32
> >
> > #SBATCH --cpus-per-task 8
> >
> > #SBATCH --ntasks-per-node 1
> >
> > #SBATCH --time 01:00:00
> >
> >
> > GMX=/apps/daint/gromacs/5.0.2/gnu_482/bin
> >
> >
> > cd $SLURM_SUBMIT_DIR/
> >
> > export OMP_NUM_THREADS=8
> >
> > aprun -n 32 -N 1 -d 8 $GMX/mdrun_mpi
> >
> > However, with the option -nb cpu (bypassing GPU acceleration), there is
> > no problem at all. I have searched online, including the mailing list,
> > but there is not much information about this error.
> >
> > Thanks very much for any response.
> >
> > Best,
> >
> > Chungwen
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.