[gmx-users] Can't run v. 4.6.6 on worker nodes if they are compiled with SIMD

Seyyed Mohtadin Hashemi haadah at gmail.com
Mon Jul 28 19:36:41 CEST 2014


On Sat, Jul 26, 2014 at 1:53 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> On Sat, Jul 26, 2014 at 7:35 PM, Seyyed Mohtadin Hashemi
> <haadah at gmail.com> wrote:
>
>
> > On Jul 26, 2014 4:52 AM, "Mark Abraham" <mark.j.abraham at gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > That is indeed very weird - particularly if compiling on the compute
> > > nodes with GPU support enabled gives the same result. Both host and
> > > compute nodes support rdtscp, so that known suspect is OK. I can only
> > > guess that there's something in the CUDA installation process that
> > > targets the CPU on the install host. Configuring with
> > > -DCMAKE_BUILD_TYPE=Debug and getting a stack trace from the crash
> > > might help work out where the problem arises.
> >
> > Will get back with the stack trace a bit later; is "gdb bt full" ok?
>
>
> Probably.
>
> Mark
>
>
> > Or do you want "thread info" as well?
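
For reference, the session I have in mind is roughly the following, with
topol.tpr standing in for whatever run input reproduces the crash (the
mdrun_4gpu name just reflects the _4gpu binary suffix from my configure
string):

  gdb --args mdrun_4gpu -s topol.tpr
  (gdb) run
  ...wait for the SIGILL...
  (gdb) bt full
  (gdb) thread apply all bt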
> >
> > > Doing a CUDA install on a compute node and compiling against that
> > > might help.
> >
> > You mean install the CUDA SDK on the worker nodes? If so, this is
> > already done and gives the same result. I will ask the admin about the
> > configuration of CUDA on the worker nodes.
> > >
> > > Mark
> > >
> > >
> > >
> > > On Fri, Jul 25, 2014 at 10:00 PM, Seyyed Mohtadin Hashemi
> > > <haadah at gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm having a very weird problem with GROMACS 4.6.6:
> > > >
> > > > I am currently testing out GPU capabilities and was trying to
> > > > compile GROMACS with CUDA (v6.0). I cannot make this work if I
> > > > compile GROMACS with SIMD, no matter which kernel I choose - I have
> > > > tried everything from SSE2 to AVX_256.
> > > >
> > > > The log-in node, where I compile, has AMD Interlagos CPUs (the
> > > > worker nodes use Xeon E5-2630 CPUs and are equipped with Tesla K20
> > > > GPUs), but I do not think this is the problem - I have compiled
> > > > GROMACS on the log-in node without CUDA but with AVX_256 SIMD, and
> > > > everything works. As soon as CUDA is added to the mix, I get
> > > > "Illegal Instruction" every time I try to run on the worker nodes.
> > > >
> > > > Compiling on the worker nodes gives the same result. However, as
> > > > soon as I set SIMD=None everything works and I am able to run
> > > > simulations using GPUs, regardless of whether I compile on the
> > > > log-in node or a worker node.
> > > >
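
As a sanity check on the SIMD side, the instruction sets each node
actually supports can be listed with something like

  grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E 'sse2|sse4_1|avx|rdtscp'

on both the log-in node and a worker node; both the Interlagos and the
E5-2630 CPUs should report all four flags, consistent with the AVX_256
build working when CUDA is left out.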
> > > >
> > > > The cmake string used to configure is:
> > > >
> > > > ccmake .. -DCMAKE_INSTALL_PREFIX=/work/gromacs4gpu -DGMX_DOUBLE=OFF
> > > > -DGMX_DEFAULT_SUFFIX=OFF -DGMX_BINARY_SUFFIX=_4gpu
> > > > -DGMX_LIBS_SUFFIX=_4gpu -DGMX_GPU=ON -DBUILD_SHARED_LIBS=OFF
> > > > -DGMX_PREFER_STATIC_LIBS=ON -DGMX_MPI=OFF
> > > > -DGMX_CPU_ACCELERATION=AVX_256
> > > >
> > > > CUDA v6.0 and FFTW v3.3.4 (single precision) libs are set globally
> > > > and correctly identified by GROMACS. To rule out OpenMPI as the
> > > > problem, I am compiling without it (compiling with OpenMPI produced
> > > > the same behavior as without); once I have found the error, I will
> > > > compile with OpenMPI v1.6.5.
> > > >
> > > > I get these warnings during configuration, but they are nothing
> > > > important:
> > > >
> > > > A BLAS library was not found by CMake in the paths available to it.
> > > > Falling back on the GROMACS internal version of the BLAS library
> > > > instead. This is fine for normal usage.
> > > >
> > > > A LAPACK library was not found by CMake in the paths available to
> > > > it. Falling back on the GROMACS internal version of the LAPACK
> > > > library instead. This is fine for normal usage.
> > > >
> > > > I am currently compiling and testing GROMACS 5.0 to see if it
> > > > exhibits the same behavior.
> > > >
> > > > I hope that someone can point me in the direction of a possible
> > > > solution; if not, I will file a bug report.
> > > >
> > > > Regards,
> > > > Mohtadin

I found a possible, but very strange, solution over the weekend:

To find out which step causes the problem, I compiled single-precision
GROMACS v5.0 (using the Debug profile) with no GPU support, no SIMD, and
no MPI. As expected, everything worked.
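
The baseline configure line was roughly as follows (prefix and suffix
options as in the 4.6.6 string quoted above; note that the 5.0 series
calls the acceleration option GMX_SIMD rather than GMX_CPU_ACCELERATION):

  cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_DOUBLE=OFF \
           -DGMX_GPU=OFF -DGMX_SIMD=None -DGMX_MPI=OFF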

I then compiled an mdrun-only build with GPU support and Debug (but still
no SIMD and no MPI); again, everything worked.
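
For the mdrun-only step I used the mdrun-only build option that the 5.0
series provides; something like:

  cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_DOUBLE=OFF \
           -DGMX_GPU=ON -DGMX_SIMD=None -DGMX_MPI=OFF \
           -DGMX_BUILD_MDRUN_ONLY=ON
  make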

The next step was to compile a new mdrun with GPU and SIMD (still no MPI) -
it worked! I tried SSE2, SSE4.1, and AVX_256 - all work. As the last step I
added MPI, and again everything works.

So I went back and made a new compilation with all options (i.e. AVX_256,
MPI, and GPU), still using the Debug profile - and lo and behold,
everything works. However, if I configure/compile using the Release
profile, nothing works. (To be sure that I did not have a corrupt package,
I re-downloaded it; the MD5 sum matched the one on the website.)
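
In other words, with otherwise identical configure lines, the build type
alone decides whether the binaries run on the worker nodes:

  # runs fine on the worker nodes:
  cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_SIMD=AVX_256 -DGMX_GPU=ON -DGMX_MPI=ON

  # dies with "Illegal Instruction":
  cmake .. -DCMAKE_BUILD_TYPE=Release -DGMX_SIMD=AVX_256 -DGMX_GPU=ON -DGMX_MPI=ON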

I hope this can help narrow down what is wrong.

At least now I have a working system that I can run some tests on.

