[gmx-users] Assistance needed running gromacs 4.6.3 on Blue Gene/P

arrow50311 linxingcheng50311 at gmail.com
Wed Mar 12 22:38:06 CET 2014


Was there ever any follow-up to this question?

I am running into exactly the same problem on a Blue Gene/P.

Could anyone offer some help?

Thank you,


Prentice Bisbal wrote
> Mark,
> 
> Since I was working with 4.6.2, I built 4.6.3 to see if this was the 
> result of a bug in 4.6.2. It isn't: I get the same error with 4.6.3, 
> but that is the version I'll be working with from now on, since it's 
> the latest. Since the problem occurs with both versions, I might as 
> well try to fix it in the latest version, right?
> 
> I compiled 4.6.3 with the following options to include debugging 
> information:
> 
> cmake .. \
>    -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
>    -DBUILD_SHARED_LIBS=OFF \
>    -DGMX_MPI=ON \
>    -DCMAKE_C_FLAGS="-O0 -g -qstrict -qarch=450 -qtune=450" \
>    -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.3 \
>    -DGMX_CPU_ACCELERATION=None \
>    -DGMX_THREAD_MPI=OFF \
>    -DGMX_OPENMP=OFF \
>    -DGMX_DEFAULT_SUFFIX=ON \
>    -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
>     2>&1 | tee cmake.log
> 
> For -qarch, I removed the 'd' from the end so that the double FPU 
> isn't used; it can cause problems if the data isn't aligned correctly. 
> -qstrict makes sure certain optimizations aren't performed. It should 
> be superfluous at optimization levels below 3, but I threw it in just 
> to be safe, and set -O0. (Of course, I think -g turns off all 
> optimizations anyway.)
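> 
> To sanity-check that the -g build really carries symbols, something 
> like this should work (just a quick heuristic, not definitive; a 
> debug build should show .debug_info/.debug_line sections):
> 
> # list section headers of the installed binary and look for DWARF sections
> objdump -h /scratch/bgapps/gromacs-4.6.3/bin/mdrun_mpi | grep debug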
> 
> On the BG/P, I had to install FFTW3 separately, and that wasn't 
> installed with debugging active, so there are no symbols for FFTW.
> 
> One of my coworkers wrote a script that converts BG/P core files to 
> stack traces. In all the core files I've looked at so far (9 out of 
> 64), the stack ends at a vfprintf call. For example:
> 
> -------------------------------------------------------------
> 
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/resolv/res_init.c:414
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/wgenops.c:419
> /scratch/pbisbal/build/gromacs-4.6.3/src/gmxlib/nonbonded/nb_kernel_c/nb_kernel_ElecRFCut_VdwBhamSh_GeomW4P1_c.c:673
> ??:0
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/sys/dcmf/../ccmi/executor/Broadcast.h:83
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/reduce/reduce_algorithms.c:69
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/bcast/bcast_algorithms.c:227
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:779
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:762
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:374
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/calcmu.c:88
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/mdrun.c:113
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1492
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> ../stdio-common/printf_fphex.c:335
> ../stdio-common/printf_fphex.c:452
> ??:0
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> 
> -----------------------------------------------------------------
> 
> Another node with a different stack looks like this:
> 
> ---------------------------------------------------------------
> 
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/genops.c:982
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/string/memcpy.c:159
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/ns.c:423
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1646
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> ../stdio-common/printf_fphex.c:335
> ../stdio-common/printf_fphex.c:452
> ??:0
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> 
> ---------------------------------------------------------------
> 
> All the stacks look like one of these two.
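> 
> In case it's useful to anyone else: the BG/P lightweight core files 
> are plain text, so if you don't have such a script handy, I believe 
> the addresses can be resolved by hand with the toolchain's addr2line, 
> roughly like this. The +++STACK/---STACK markers, the column layout 
> (frame address first, saved link register second), and the toolchain 
> path are from memory, so treat this as a sketch:
> 
> # print the saved-link-register column of each stack frame in one core
> # file and resolve those addresses against the debug build
> awk '/\+\+\+STACK/,/---STACK/ { if ($1 ~ /^0x/) print $2 }' core.0 | \
>     xargs /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc-bgp-linux-addr2line \
>         -e /scratch/bgapps/gromacs-4.6.3/bin/mdrun_mpi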
> 
> Is any of this information useful? My coworker, who has a lot of 
> experience developing for Blue Gene/Ps, says this looks like an I/O 
> problem, but he doesn't have the time to dig into the Gromacs source 
> code for us. I'm willing to do some digging, but some guidance from 
> someone who knows the code well would be very helpful.
> 
> Prentice
> 
> 
> 
> On 08/06/2013 08:19 PM, Mark Abraham wrote:
>> That all looks fine so far. The core file processor won't help unless
>> you've compiled with -g. Hopefully cmake -DCMAKE_BUILD_TYPE=Debug will
>> do that, but I haven't actually checked that it really works. If not,
>> you might have to hack cmake/Platform/BlueGeneP-static-XL-C.cmake.
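>>
>> Untested, but I'd expect the invocation to be your existing one with
>> the build type switched, i.e. something like
>>
>> cmake .. \
>>     -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
>>     -DCMAKE_BUILD_TYPE=Debug \
>>     ... (rest of your options) ...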
>>
>> Anyway, if you can compile with -g, then the core file will tell us in
>> what function it is dying, which might help locate the problem.
>>
>> Mark
>>
>> On Tue, Aug 6, 2013 at 11:43 PM, Prentice Bisbal
>> <prentice.bisbal@> wrote:
>>> Dear GMX-users,
>>>
>>> I need some assistance running Gromacs 4.6.3 on a Blue Gene/P.
>>> Although I have a background in Chemistry, I'm an experienced
>>> professional HPC admin who's relatively new to supporting Blue Genes
>>> and Gromacs. My first Gromacs user is having trouble running Gromacs
>>> on our BG/P. His jobs die and dump core, with no obvious signs (not
>>> to me, at least) of where the problem lies.
>>>
>>> I compiled Gromacs 4.6.3 with the following options:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> cmake .. \
>>>    -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
>>>    -DBUILD_SHARED_LIBS=OFF \
>>>    -DGMX_MPI=ON \
>>>    -DCMAKE_C_FLAGS="-O3 -qarch=450d -qtune=450" \
>>>    -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.2 \
>>>    -DGMX_CPU_ACCELERATION=None \
>>>    -DGMX_THREAD_MPI=OFF \
>>>    -DGMX_OPENMP=OFF \
>>>    -DGMX_DEFAULT_SUFFIX=ON \
>>>    -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
>>>     2>&1 | tee cmake.log
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> When one of my users submits a job, it dumps core. My scheduler is
>>> LoadLeveler, and I used this JCF file to replicate the problem. I
>>> added the '-debug 1' flag after searching the gmx-users archives:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> #!/bin/bash
>>> # @ job_name = xiang
>>> # @ job_type = bluegene
>>> # @ bg_size = 64
>>> # @ class = small
>>> # @ wall_clock_limit = 01:00:00,00:50:00
>>> # @ error = job.$(Cluster).$(Process).err
>>> # @ output = job.$(Cluster).$(Process).out
>>> # @ environment = COPY_ALL;
>>> # @ queue
>>>
>>> source /scratch/bgapps/gromacs-4.6.2/bin/GMXRC.bash
>>>
>>> /bgsys/drivers/ppcfloor/bin/mpirun \
>>>     /scratch/bgapps/gromacs-4.6.2/bin/mdrun_mpi -pin off \
>>>     -deffnm sbm-b_dyn3 -v -dlb yes -debug 1
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> The stderr file shows this at the bottom, which isn't too helpful:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> Reading file sbm-b_dyn3.tpr, VERSION 4.6.2 (single precision)
>>>
>>> Will use 48 particle-particle and 16 PME only nodes
>>> This is a guess, check the performance at the end of the log file
>>> Using 64 MPI processes
>>> 
>>> <Aug 06 17:25:55.303879> BE_MPI (ERROR): The error message in the job
>>> record is as follows:
>>> <Aug 06 17:25:55.303940> BE_MPI (ERROR):   "killed with signal 6"
>>>
>>> -----------------------------------------snip-----------------------------------------------
>>>
>>> I have a bunch of core files which I can analyze with the IBM Core
>>> file processor, and I also have a bunch of debug files from mdrun. I
>>> went through about 12 of the 64 and didn't see anything that looked
>>> like an error.
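>>>
>>> (For the ones I haven't read yet, I figure something like
>>>
>>>     grep -il 'error\|fatal\|signal' *.debug
>>>
>>> should triage them faster than reading each by hand; the pattern and
>>> the *.debug glob are just guesses at what a failure would look like
>>> in mdrun's debug output, so suggestions welcome there too.)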
>>>
>>> Can anyone offer suggestions of what to look for, or additional
>>> debugging steps I can take? Please keep in mind that I'm the system
>>> administrator and not an expert user of Gromacs, so I'm not sure
>>> whether the inputs are correct, or appropriate for my BG/P
>>> configuration. Any help will be greatly appreciated.
>>>
>>> Thanks,
>>> Prentice
>>>


