[gmx-users] multiple processes of a gromacs tool requiring user action at runtime on one Cray XC30 node using aprun

Mark Abraham mark.j.abraham at gmail.com
Thu Oct 29 22:56:09 CET 2015


OK. I misunderstood your intention to be to run a single calculation with
three MPI domains sharing its work. To simply get three independent non-MPI
calculations you could indeed use your approach. It sounds like you ran
into behaviour where only one of the three calculations got the stdin,
because forwarding stdin to just one rank is the normal thing when launching
an actual parallel calculation. A tool like GNU parallel might be closer to
what you need.
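
For example, something along these lines might do it (an untested sketch;
the directory names and the echoed group numbers are placeholders taken
from the rest of this thread):

seq 1 3 | parallel -j 3 \
  'cd dir_{} && printf "1\n13\n" | ../run_mmpbsa.sh > mmpbsa.log 2>&1'

Each of the three jobs then gets its own working directory and its own
stdin, and no MPI launcher is involved at all.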

Mark

On Thu, 29 Oct 2015 19:11 Vedat Durmaz <durmaz at zib.de> wrote:

>
> After several days of trial and error, I was told only today that our
> HPC does have one cluster/queue (40-core SMP nodes) that does not
> require the use of aprun/mpirun. So, after compiling all the tools
> again on that cluster, I am finally able to run several processes per
> node.
>
> (However, we have not yet been able to remedy the other issue regarding
> "aprun". Nevertheless, I'm fine now.)
>
> Thanks for your help, guys, and good evening.
>
> Vedat
>
>
>
> > On 29.10.2015 at 12:53, Rashmi wrote:
> > Hi,
> >
> > As written on the website, g_mmpbsa does not directly support MPI;
> > g_mmpbsa itself does not contain any OpenMP or MPI code. However, we
> > have provided a mechanism to use the MPI and OpenMP functionality of
> > APBS.
> >
> > One may use g_mmpbsa with MPI as follows: (1) allocate the number of
> > processors through the queue management system, (2) define the APBS
> > environment variable (export APBS="mpirun -np 8 apbs") including all
> > required flags, then start g_mmpbsa directly without using mpirun (or
> > any similar program). If the queue management system specifically
> > requires aprun/mpirun to execute a program, g_mmpbsa might not work.
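> >
> > For example, the MPI case then boils down to something like this (a
> > minimal sketch; the file names are just the same placeholders as in the
> > example further below):
> >
> > export APBS="mpirun -np 8 apbs"
> > g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp <input_index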
> >
> > One may use g_mmpbsa with OpenMP as follows: (1) allocate the number of
> > threads through the queue management system, (2) set the OMP_NUM_THREADS
> > variable to the allocated number of threads and (3) execute g_mmpbsa.
> >
> > We have not tested simultaneous use of MPI and OpenMP, so we do not
> > know whether it will work.
> >
> > Concerning standard input for g_mmpbsa: if echo or a here-document
> > (<<EOF ... EOF) is not working, one may try redirecting from a file as
> > follows:
> >
> > export OMP_NUM_THREADS=8
> >
> > aprun -n 1 -N 1 -d 8 g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i
> > mmpbsa.mdp <input_index
> >
> > Here, input_index contains the group numbers, one per line, and the
> > last line should be empty:
> >
> > $ cat input_index
> > 1
> > 13
> >
> > Concerning the 1800 directories, you may write a shell script that
> > automates job submission by going into each directory, starting a
> > g_mmpbsa process (or submitting a job script) and then moving on to the
> > next directory, as sketched below.
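> >
> > For example, a minimal sketch (assuming the directories follow the
> > dir_N naming you describe and each contains the same file names):
> >
> > #!/bin/bash
> > # visit every trajectory directory and run g_mmpbsa there
> > for dir in dir_*; do
> >     (
> >         cd "$dir" || exit 1
> >         g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp \
> >             <input_index >mmpbsa.log 2>&1
> >     )
> > done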
> >
> > I hope this information is helpful.
> >
> >
> > Thanks.
> >
> >
> >
> > On Thu, Oct 29, 2015 at 12:01 PM, Vedat Durmaz <durmaz at zib.de> wrote:
> >
> >> Hi again,
> >>
> >> Three answers are hidden somewhere below ...
> >>
> >>
> >> On 28.10.2015 at 15:45, Mark Abraham wrote:
> >>
> >>> Hi,
> >>>
> >>> On Wed, Oct 28, 2015 at 3:19 PM Vedat Durmaz <durmaz at zib.de> wrote:
> >>>
> >>>
> >>>> On 27.10.2015 at 23:57, Mark Abraham wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 27, 2015 at 11:39 PM Vedat Durmaz <durmaz at zib.de> wrote:
> >>>>>
> >>>>>> Hi Mark,
> >>>>>>
> >>>>>> Many thanks. But can you be a little more precise? The author's only
> >>>>>> hint regarding MPI is on the page
> >>>>>> "http://rashmikumari.github.io/g_mmpbsa/How-to-Run.html" and relates
> >>>>>> to APBS. g_mmpbsa itself doesn't understand OpenMP/MPI afaik.
> >>>>>>
> >>>>>> The error I'm observing occurs well before APBS is started. To be
> >>>>>> honest, I can't see any link to my initial question ...
> >>>>>>
> >>>>> It has the sentence "Although g_mmpbsa does not support mpirun..."
> >>>>> aprun is a form of mpirun, so I assumed you knew that what you were
> >>>>> trying was actually something that could work, which would therefore
> >>>>> have to be with the APBS back end. The point of what it says there is
> >>>>> that you don't run g_mmpbsa with aprun, you tell it how to run APBS
> >>>>> with aprun. This just avoids the problem entirely because your
> >>>>> redirected/interactive input goes to a single g_mmpbsa as normal,
> >>>>> which then launches APBS with MPI support.
> >>>>>
> >>>>> Tool authors need to actively write code to be useful with MPI, so
> >>>>> unless you know what you are doing is supposed to work with MPI
> >>>>> because they say it works, don't try.
> >>>>>
> >>>>> Mark
> >>>>>
> >>>> You are right, it's APBS which ought to run in parallel mode. Of
> >>>> course, I can set the variable 'export APBS="mpirun -np 8 apbs"' [or
> >>>> set 'export OMP_NUM_THREADS=8'] if I want to split a 24-core node
> >>>> into, let's say, 3 independent g_mmpbsa processes. The problem is
> >>>> that I must start g_mmpbsa itself with aprun (in the script
> >>>> run_mmpbsa.sh).
> >>>>
> >>> No. Your job runs a shell script on your compute node. It can do
> >>> anything it likes, but it would make sense to run something in
> >>> parallel at some point. You need to build a g_mmpbsa that you can just
> >>> run in a shell script that echoes in the input (try that on its own
> >>> first). Then you use the above approach so that the single process
> >>> that is g_mmpbsa does the call to aprun (which is the Cray mpirun) to
> >>> run APBS in MPI mode.
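> >>>
> >>> A rough sketch of such a script (the group numbers, file names and
> >>> core count are only placeholders):
> >>>
> >>> #!/bin/bash
> >>> # tell g_mmpbsa how to launch APBS in parallel; g_mmpbsa itself stays
> >>> # a single, plain process
> >>> export APBS="aprun -n 8 apbs"
> >>> # feed the two interactive group selections (e.g. protein, then
> >>> # ligand) on stdin
> >>> printf "1\n13\n" | g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx \
> >>>     -i mmpbsa.mdp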
> >>>
> >>> It is likely that even if you run g_mmpbsa with aprun and solve the
> >>> input issue somehow, the MPI runtime will refuse to start the child
> >>> APBS with aprun, because nesting is typically unsupported (and your
> >>> current command lines haven't given it enough information to do a good
> >>> job even if it is supported).
> >>>
> >> Yes, I've encountered issues with nested aprun calls, so this will
> >> hardly work, I guess.
> >>
> >>
> >>>> I absolutely cannot see any other way of running APBS when using it
> >>>> from g_mmpbsa. Hence, I need to run
> >>>>
> >>>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
> >>>>
> >>> This likely starts three copies of g_mmpbsa, each of which expects
> >>> terminal input, which maybe you can teach aprun to manage, but then
> >>> each g_mmpbsa will do its own APBS and this is completely not what you
> >>> want.
> >>>
> >> Hmm, to be honest, I would say this is exactly what I'm trying to
> >> achieve, isn't it? I want 3 independent g_mmpbsa runs, each executed in
> >> a different directory with its own APBS. By the way, altogether I have
> >> 1800 such directories, each containing a different trajectory.
> >>
> >> If someone is able to figure out a solution for this (within the next
> >> 20 hours!), I would be absolutely pleased.
> >>
> >>
> >>>> And of course I'm aware that I've given 8 cores to g_mmpbsa, hoping
> >>>> that it is able to read my input and to run APBS, which hopefully
> >>>> uses all of the 8 cores. The user input (choosing protein, then
> >>>> ligand), however, "Cannot [be] read". This issue occurs quite early
> >>>> in the g_mmpbsa run and therefore has nothing to do with the APBS
> >>>> functionality (whether OpenMP or MPI), which is launched later.
> >>>>
> >>>> If I simulate the whole story (spreading the 24 cores of a node over
> >>>> 3 processes) using a bash script (instead of g_mmpbsa) which just
> >>>> expects (and prints) the two inputs at runtime, and which I start
> >>>> three times on one node, everything works fine. I'm just asking
> >>>> myself whether someone knows why GROMACS fails under the same
> >>>> conditions and whether it is possible to remedy that problem.
> >>>>
> >>> By the way, GROMACS isn't failing. You're using a separately provided
> >>> program, so you should really be talking to its authors for help. ;-)
> >>>
> >>> mpirun -np 3 gmx_mpi make_ndx
> >>>
> >>> would work fine (though not usefully), if you use the mechanisms
> >>> provided by mpirun to control how the redirection to the stdin of the
> >>> child processes should work. But handling that redirection is an issue
> >>> between you and the docs of your mpirun :-)
> >>>
> >>> Mark
> >>>
> >> Unfortunately, there is only very little information about stdin
> >> redirection with aprun. What I've done now is modify g_mmpbsa such
> >> that no user input is required. I then start
> >>
> >> aprun -n 3 -N 3 -cc 0-7:8-15:16-23  ../run_mmpbsa.sh
> >>
> >> where, using the $ALPS_APP_PE variable, I successfully enter three
> >> directories (dir_1, dir_2, dir_3, all containing identical file names)
> >> and start g_mmpbsa in each of them (see the sketch below).
> >>
> >> Now what happens is that all the new files are generated in the first
> >> of the 3 folders (while the other two are not affected at all), and
> >> all new files are generated 3 times (file, #file1#, #file2#) in a
> >> manner typical of GROMACS' backup philosophy. So on some (hardware?)
> >> level the data of the 3 processes are not well separated. The
> >> supervisors of our HPC system have not been able to figure out the
> >> reason so far. That's why I'm trying to find help here from someone
> >> who has been successful in sharing compute nodes in a similar way.
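> >>
> >> (For illustration, the wrapper does roughly the following for each
> >> rank; the options are simplified:)
> >>
> >> #!/bin/bash
> >> # every aprun rank (ALPS_APP_PE = 0, 1, 2) moves into its own directory
> >> cd "dir_$((ALPS_APP_PE + 1))" || exit 1
> >> # modified g_mmpbsa that no longer asks for interactive input
> >> g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp \
> >>     >mmpbsa.log 2>&1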
> >>
> >> Anyway, thanks for your time so far.
> >>
> >>
> >>
> >>
> >>>
> >>>>> On 27.10.2015 at 22:43, Mark Abraham wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I think if you check out how the g_mmpbsa author intends you to use
> >>>>>>> MPI
> >>>>>>> with the tool, your problem goes away.
> >>>>>>> http://rashmikumari.github.io/g_mmpbsa/Usage.html
> >>>>>>>
> >>>>>>> Mark
> >>>>>>>
> >>>>>>> On Tue, Oct 27, 2015 at 10:10 PM Vedat Durmaz <durmaz at zib.de>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi guys,
> >>>>>>>>
> >>>>>>>> I'm struggling with the use of various GROMACS commands on a Cray
> >>>>>>>> XC30 system. Actually, it's about the external tool g_mmpbsa,
> >>>>>>>> which requires user action at runtime. I get similar errors with
> >>>>>>>> other GROMACS tools, e.g. make_ndx, though I know that it doesn't
> >>>>>>>> make sense to use more than one core for make_ndx. However,
> >>>>>>>> g_mmpbsa (or rather the APBS used by g_mmpbsa) is supposed to be
> >>>>>>>> capable of using multiple cores via OpenMP. So, as long as I
> >>>>>>>> assign all of the 24 cores of a compute node to one process
> >>>>>>>> through
> >>>>>>>>
> >>>>>>>> aprun -n 1 ../run_mmpbsa.sh
> >>>>>>>>
> >>>>>>>> everything works fine. User input is accepted either
> >>>>>>>> interactively, by using the echo command, or through a
> >>>>>>>> here-document ("... << EOF ... EOF"). However, as soon as I try
> >>>>>>>> to split the 24 cores of a node over multiple processes (more
> >>>>>>>> than one), using for instance
> >>>>>>>>
> >>>>>>>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
> >>>>>>>>
> >>>>>>>> (and OMP_NUM_THREADS=8), there is neither an occasion to provide
> >>>>>>>> user input in interactive mode nor is it recognized through
> >>>>>>>> echo/here-document in the script. Instead, I get the error
> >>>>>>>>
> >>>>>>>>    >> Source code file: .../gromacs-4.6.7/src/gmxlib/index.c, line: 1192
> >>>>>>>>    >> Fatal error:
> >>>>>>>>    >> Cannot read from input
> >>>>>>>>
> >>>>>>>> where, according to the source code, "scanf" malfunctions. When I
> >>>>>>>> use, for comparison purposes, make_ndx, which I would like to
> >>>>>>>> feed with "q", I observe a similar error:
> >>>>>>>>
> >>>>>>>>    >> Source code file: .../gromacs-4.6.7/src/tools/gmx_make_ndx.c, line: 1219
> >>>>>>>>    >> Fatal error:
> >>>>>>>>    >> Error reading user input
> >>>>>>>>
> >>>>>>>> Here, it's "fgets" which is malfunctioning.
> >>>>>>>>
> >>>>>>>> Does anyone have an idea what this could be caused by? What do I
> >>>>>>>> need to consider/change in order to be able to start more than
> >>>>>>>> one process on one compute node?
> >>>>>>>>
> >>>>>>>> Thanks in advance
> >>>>>>>>
> >>>>>>>> Vedat
> >>>>>>>>
> >>>>>>>>

