[gmx-users] multiple processes of a gromacs tool requiring user action at runtime on one Cray XC30 node using aprun

Vedat Durmaz durmaz at zib.de
Thu Oct 29 19:11:14 CET 2015


after several days of trial and error, i was told only today that our
HPC does have one cluster/queue (40-core SMP nodes) that does not
require the use of aprun/mpirun. so, after recompiling all the tools
on that cluster, i am finally able to execute many processes per node.
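
On such an SMP queue, where programs may be started without aprun, several g_mmpbsa runs can share a node as ordinary background processes. A minimal sketch, assuming each directory already holds the input files named in this thread plus an input_index file with the group selections; the function name run_many_per_node and the 5 x 8 split of a 40-core node are illustrative choices, not something g_mmpbsa provides.

```shell
#!/bin/bash
# Sketch: run several independent g_mmpbsa jobs on one SMP node (no aprun).
# Assumptions (hypothetical): every directory passed in holds traj.xtc,
# topol.tpr, index.ndx, mmpbsa.mdp and input_index.

# run_many_per_node JOBS THREADS DIR...
#   starts up to JOBS g_mmpbsa processes at a time, each with THREADS
#   OpenMP threads, and waits for the whole batch before starting the next.
run_many_per_node() {
    local max_jobs=$1 threads=$2
    shift 2
    local running=0 d
    for d in "$@"; do
        (
            # subshell: cd and OMP_NUM_THREADS stay local to this job
            cd "$d" || exit 1
            export OMP_NUM_THREADS=$threads
            g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp \
                < input_index > mmpbsa.log 2>&1
        ) &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait              # let the current batch finish
            running=0
        fi
    done
    wait                      # wait for the last, possibly partial, batch
}

# Example: 5 jobs x 8 threads on a 40-core node
# run_many_per_node 5 8 dir_*
```

The batching is deliberately simple: a new batch starts only after every job of the previous one has finished, so one slow trajectory can leave cores idle for a while.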

(however, we have not been able to solve the original problem with
"aprun" in the meantime. nevertheless, i'm fine now.)

thanks for your help guys and good evening

vedat



On 29.10.2015 at 12:53, Rashmi wrote:
> Hi,
>
> As written on the website, g_mmpbsa does not directly support MPI. g_mmpbsa
> itself does not include any OpenMP or MPI code. However, we have provided
> a way to make use of the MPI and OpenMP functionality of APBS.
>
> To use g_mmpbsa with MPI: (1) allocate the number of processors through
> the queue management system, (2) define the APBS environment variable so
> that it includes all required flags (e.g. export APBS="mpirun -np 8 apbs"),
> then (3) start g_mmpbsa directly, without mpirun (or any similar program).
> If the queue management system specifically requires aprun/mpirun to
> execute a program, g_mmpbsa might not work in this case.
>
> To use g_mmpbsa with OpenMP: (1) allocate the number of threads through
> the queue management system, (2) set the OMP_NUM_THREADS variable to the
> allocated number of threads and (3) execute g_mmpbsa.
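
The OpenMP steps might look like this in a job script. This is a sketch only: run_mmpbsa_omp is a hypothetical name, and how the allocated thread count is obtained from the queue system differs between sites, so here it is simply passed as an argument.

```shell
#!/bin/bash
# Sketch of the OpenMP route: set OMP_NUM_THREADS to the number of
# threads allocated by the queue system, then run g_mmpbsa directly.
# Assumption: the caller passes the allocated thread count.

run_mmpbsa_omp() {
    local threads=$1
    # reject empty, non-numeric or zero thread counts
    case $threads in
        ''|*[!0-9]*|0) echo "usage: run_mmpbsa_omp THREADS" >&2; return 1 ;;
    esac
    export OMP_NUM_THREADS=$threads
    g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp < input_index
}

# Example: run_mmpbsa_omp 8
```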
>
> We have not tested the simultaneous use of MPI and OpenMP, so we do not
> know whether it will work.
>
> Concerning standard input for g_mmpbsa: if echo or a here document
> (<<EOF ... EOF) is not working, one may try using a file as follows:
>
> export OMP_NUM_THREADS=8
>
> aprun -n 1 -N 1 -d 8 g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp < input_index
>
> Here, input_index contains the group numbers, one per line, and the last
> line should be empty:
> $ cat input_index
> 1
> 13
>
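
The file can be generated rather than typed by hand. A sketch, assuming groups 1 and 13 as in the example above (replace them with your own group numbers); write_input_index is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: write the group numbers g_mmpbsa asks for into a file, one per
# line, followed by the empty last line mentioned above.
# Groups 1 and 13 are the example values from this message.

write_input_index() {
    local out=$1
    shift
    # each group number on its own line, then one empty line
    printf '%s\n' "$@" '' > "$out"
}

write_input_index input_index 1 13
```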
>
> Concerning the 1800 directories, you may write a shell script that
> automates job submission: it goes into each directory, starts a g_mmpbsa
> process (or submits a job script) and then moves on to the next directory.
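
Such a script might look like the following sketch. It assumes every directory contains the same file names as in the commands above, including an input_index file; run_all_dirs is a hypothetical name, and for job submission the g_mmpbsa line would be replaced by your site's submit command.

```shell
#!/bin/bash
# Sketch: walk through many trajectory directories and run one g_mmpbsa
# per directory, one after the other.
# Assumptions: directory names are passed as arguments (e.g. dir_*), and
# each holds the input files used in this thread plus input_index.

run_all_dirs() {
    local d failed=0
    for d in "$@"; do
        [ -d "$d" ] || continue
        # subshell so the cd never leaks into the next iteration
        if ! (
            cd "$d" &&
                g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp \
                    < input_index > mmpbsa.log 2>&1
        ); then
            echo "g_mmpbsa failed in $d" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]   # non-zero status if any directory failed
}

# Example: run_all_dirs dir_*
```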
>
> I hope this information is helpful.
>
>
> Thanks.
>
>
>
> On Thu, Oct 29, 2015 at 12:01 PM, Vedat Durmaz <durmaz at zib.de> wrote:
>
>> hi again,
>>
>> 3 answers are hidden somewhere below ..
>>
>>
>> On 28.10.2015 at 15:45, Mark Abraham wrote:
>>
>>> Hi,
>>>
>>> On Wed, Oct 28, 2015 at 3:19 PM Vedat Durmaz <durmaz at zib.de> wrote:
>>>
>>>
>>>> On 27.10.2015 at 23:57, Mark Abraham wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> On Tue, Oct 27, 2015 at 11:39 PM Vedat Durmaz <durmaz at zib.de> wrote:
>>>>>
>>>>>> hi mark,
>>>>>> many thanks. but can you be a little more precise? the author's only
>>>>>> hint regarding mpi is on this site
>>>>>> "http://rashmikumari.github.io/g_mmpbsa/How-to-Run.html" and it relates
>>>>>> to APBS. g_mmpbsa itself doesn't understand openmp/mpi afaik.
>>>>>>
>>>>>> the error i'm observing occurs well before apbs is started.
>>>>>> to be honest, i can't see any link to my initial question ...
>>>>>>
>>>>> It has the sentence "Although g_mmpbsa does not support mpirun..."
>>>>> aprun is a form of mpirun, so I assumed you knew that what you were
>>>>> trying was actually something that could work, which would therefore
>>>>> have to be with the APBS back end. The point of what it says there is
>>>>> that you don't run g_mmpbsa with aprun, you tell it how to run APBS
>>>>> with aprun. This just avoids the problem entirely because your
>>>>> redirected/interactive input goes to a single g_mmpbsa as normal,
>>>>> which then launches APBS with MPI support.
>>>>>
>>>>> Tool authors need to actively write code to be useful with MPI, so
>>>>> unless you know that what you are doing is supposed to work with MPI
>>>>> because they say it works, don't try.
>>>>>
>>>>> Mark
>>>>>
>>>> you are right. it's apbs which ought to run in parallel mode. of course,
>>>> i can set 'export APBS="mpirun -np 8 apbs"' [or 'export
>>>> OMP_NUM_THREADS=8'] if i want to split a 24-core node into, let's say, 3
>>>> independent g_mmpbsa processes. the problem is that i must start
>>>> g_mmpbsa itself with aprun (in the script run_mmpbsa.sh).
>>>>
>>> No. Your job runs a shell script on your compute node. It can do anything
>>> it likes, but it would make sense to run something in parallel at some
>>> point. You need to build a g_mmpbsa that you can just run in a shell
>>> script that echoes in the input (try that on its own first). Then you use
>>> the above approach so that the single process that is g_mmpbsa does the
>>> call to aprun (which is the Cray mpirun) to run APBS in MPI mode.
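
Following that suggestion, the relevant part of a Cray job script could look like this sketch: g_mmpbsa runs once, without aprun, its two group selections arrive through a here document, and only APBS goes through aprun. The rank count, the groups 1 and 13 and the helper name run_mmpbsa_cray are assumptions for illustration.

```shell
#!/bin/bash
# Sketch of the suggested layout for a Cray job script: g_mmpbsa runs as a
# single process WITHOUT aprun, its group selections are fed through a here
# document, and only APBS is started with aprun.
# Assumptions: 8 APBS ranks; groups 1 and 13; hypothetical helper name.
# As advised above, first check that the g_mmpbsa line works on its own.

run_mmpbsa_cray() {
    local nranks=${1:-8}
    export APBS="aprun -n ${nranks} apbs"
    g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp <<EOF
1
13
EOF
}

# Example: run_mmpbsa_cray 8
```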
>>>
>>> It is likely that even if you run g_mmpbsa with aprun and solve the input
>>> issue somehow, the MPI runtime will refuse to start the child APBS with
>>> aprun, because nesting is typically unsupported (and your current command
>>> lines haven't given it enough information to do a good job even if it is
>>> supported).
>>>
>> yes, i've encountered issues with nested aprun calls. so this will hardly
>> work i guess.
>>
>>
>>>> i absolutely cannot see any other way of running apbs from within
>>>> g_mmpbsa.
>>>> hence, i need to run
>>>>
>>>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
>>>>
>>> This likely starts three copies of g_mmpbsa, each of which expects
>>> terminal input, which maybe you can teach aprun to manage, but then each
>>> g_mmpbsa will run its own APBS, and this is completely not what you want.
>>>
>> hmm, to be honest, i would say this is exactly what i'm trying to achieve,
>> isn't it? i want 3 independent g_mmpbsa runs, each executed in a
>> different directory with its own APBS. by the way, altogether i have 1800
>> such directories, each containing a different trajectory.
>>
>> if anyone manages (within the next 20 hours!) to figure out a solution
>> for this, i would be very grateful.
>>
>>
>>>> and of course i'm aware that i've given 8 cores to g_mmpbsa, hoping
>>>> that it is able to read my input and to run apbs, which hopefully uses
>>>> all of the 8 cores. the user input (choosing protein, then ligand),
>>>> however, "Cannot [be] read". this issue occurs quite early in the
>>>> g_mmpbsa process and therefore has nothing to do with the apbs (either
>>>> openmp or mpi) functionality, which is launched later.
>>>>
>>>> if i simulate the whole story (spreading the 24 cores of a node over 3
>>>> processes) using a bash script (instead of g_mmpbsa) which just expects
>>>> (and prints) the two inputs during runtime and which i start three times
>>>> on one node, everything works fine. i'm just asking myself whether
>>>> someone knows why gromacs fails under the same conditions and whether it
>>>> is possible to remedy that problem.
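
A test script of the kind described here might look like the following. mock_mmpbsa is a hypothetical stand-in, not part of GROMACS or g_mmpbsa; it reads the two selections from stdin and fails when stdin cannot be read, with an error text that only imitates the GROMACS message quoted in this thread.

```shell
#!/bin/bash
# Hypothetical stand-in for g_mmpbsa: reads the two group selections
# (protein, then ligand) from stdin and fails if stdin cannot be read.
# The error text only imitates the GROMACS message quoted in this thread.

mock_mmpbsa() {
    local protein ligand
    if ! read -r protein || ! read -r ligand; then
        echo "Cannot read from input" >&2
        return 1
    fi
    # ALPS_APP_PE identifies the aprun PE, if any
    echo "PE ${ALPS_APP_PE:-unset}: protein group $protein, ligand group $ligand"
}

# e.g. in run_mmpbsa.sh: printf '1\n13\n' | mock_mmpbsa
```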
>>>>
>>> By the way, GROMACS isn't failing. You're using a separately provided
>>> program, so you should really be talking to its authors for help. ;-)
>>>
>>> mpirun -np 3 gmx_mpi make_ndx
>>>
>>> would work fine (though not usefully), if you use the mechanisms provided
>>> by mpirun to control how the redirection to the stdin of the child
>>> processes should work. But handling that redirection is an issue between
>>> you and the docs of your mpirun :-)
>>>
>>> Mark
>>>
>> unfortunately, there is very little information about stdin redirection
>> with aprun. what i've done now is modify g_mmpbsa so that no user input
>> is required. starting
>>
>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23  ../run_mmpbsa.sh
>>
>> where, using the $ALPS_APP_PE variable, i successfully enter three
>> directories (dir_1, dir_2, dir_3, all containing identical file names) and
>> start g_mmpbsa in each of them. now what happens is that all the new files
>> are generated in the first of the 3 folders (while the two others are not
>> affected at all), and all new files are generated 3 times (file, #file1#,
>> #file2#), in the manner typical of gromacs' backup scheme. so on some
>> (hardware?) level, the data of the 3 processes are not properly separated.
>> the administrators of our HPC system have not been able to figure out the
>> reason so far. that's why i'm trying to find help here from someone who
>> has successfully shared compute nodes in a similar way.
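
The per-PE directory selection described here could be sketched as follows. ALPS_APP_PE is set by ALPS for each processing element, starting at 0; mapping PE 0 to dir_1, PE 1 to dir_2 and so on, and the helper name enter_pe_dir, are assumptions. The log line records the host and the physical working directory of every PE, which is a useful first check when output lands in the wrong folder.

```shell
#!/bin/bash
# Sketch of per-PE directory selection inside run_mmpbsa.sh.
# Assumptions: ALPS_APP_PE starts at 0; PE 0 -> dir_1, PE 1 -> dir_2, ...
# enter_pe_dir is a hypothetical helper name.

enter_pe_dir() {
    if [ -z "${ALPS_APP_PE:-}" ]; then
        echo "ALPS_APP_PE is not set (not started by aprun?)" >&2
        return 1
    fi
    local dir="dir_$((ALPS_APP_PE + 1))"
    cd "$dir" || return 1
    # report where this PE really ended up (host and resolved path)
    echo "PE $ALPS_APP_PE on ${HOSTNAME:-unknown}: working in $(pwd -P)" >&2
}

# In run_mmpbsa.sh (g_mmpbsa modified to need no input):
# enter_pe_dir || exit 1
# g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx -i mmpbsa.mdp > mmpbsa.log 2>&1
```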
>>
>> anyways. thanks for your time so far.
>>
>>
>>
>>
>>>
>>>>>> On 27.10.2015 at 22:43, Mark Abraham wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think if you check out how the g_mmpbsa author intends you to use
>>>>>>> MPI
>>>>>>> with the tool, your problem goes away.
>>>>>>> http://rashmikumari.github.io/g_mmpbsa/Usage.html
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 10:10 PM Vedat Durmaz <durmaz at zib.de> wrote:
>>>>>>>
>>>>>>>> hi guys,
>>>>>>>> I'm struggling with the use of various gromacs commands on a Cray
>>>>>>>> XC30 system. actually, it's about the external tool g_mmpbsa, which
>>>>>>>> requires user action during runtime. i get similar errors with other
>>>>>>>> Gromacs tools, e.g., make_ndx, though i know that it doesn't make sense
>>>>>>>> to use more than one core for make_ndx. however, g_mmpbsa (or rather
>>>>>>>> apbs, which is used by g_mmpbsa) is supposed to be able to use multiple
>>>>>>>> cores via openmp. so, as long as i assign all of the 24 cores of a
>>>>>>>> compute node to one process through
>>>>>>>>
>>>>>>>> aprun -n 1 ../run_mmpbsa.sh
>>>>>>>>
>>>>>>>> everything works fine. user input is accepted either interactively, by
>>>>>>>> using the echo command, or through a here document (... << EOF ... EOF).
>>>>>>>> however, as soon as i try to split the 24 cores of a node among
>>>>>>>> multiple processes using, for instance,
>>>>>>>>
>>>>>>>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
>>>>>>>>
>>>>>>>> (and OMP_NUM_THREADS=8), there is neither a chance to provide user
>>>>>>>> input in interactive mode nor is the input recognized through echo or
>>>>>>>> a here document in the script. instead, i get the error
>>>>>>>>
>>>>>>>>      >> Source code file: .../gromacs-4.6.7/src/gmxlib/index.c, line: 1192
>>>>>>>>      >> Fatal error:
>>>>>>>>      >> Cannot read from input
>>>>>>>>
>>>>>>>> where, according to the source code, "scanf" fails. when i use, for
>>>>>>>> comparison purposes, make_ndx, which i would like to feed with "q", i
>>>>>>>> observe a similar error:
>>>>>>>>
>>>>>>>>      >>Source code file: .../gromacs-4.6.7/src/tools/gmx_make_ndx.c, line: 1219
>>>>>>>>      >>Fatal error:
>>>>>>>>      >>Error reading user input
>>>>>>>>
>>>>>>>> here, it's "fgets" which fails.
>>>>>>>>
>>>>>>>> does anyone have an idea what could be causing this? what do i need to
>>>>>>>> consider/change in order to be able to start more than one process on
>>>>>>>> one compute node?
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>
>>>>>>>> vedat
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gromacs Users mailing list
>>>>>>>>
>>>>>>>> * Please search the archive at
>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>>> posting!
>>>>>>>>
>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>
>>>>>>>> * For (un)subscribe requests visit
>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>> or
>>>>>>>> send a mail to gmx-users-request at gromacs.org.
>>>>>>>>
>
>


