[gmx-users] multiple processes of a gromacs tool requiring user action at runtime on one Cray XC30 node using aprun

Vedat Durmaz durmaz at zib.de
Thu Oct 29 12:02:30 CET 2015


hi again,

3 answers are hidden somewhere below ..

On 28.10.2015 at 15:45, Mark Abraham wrote:
> Hi,
>
> On Wed, Oct 28, 2015 at 3:19 PM Vedat Durmaz <durmaz at zib.de> wrote:
>
>>
>> On 27.10.2015 at 23:57, Mark Abraham wrote:
>>> Hi,
>>>
>>>
>>> On Tue, Oct 27, 2015 at 11:39 PM Vedat Durmaz <durmaz at zib.de> wrote:
>>>
>>>> hi mark,
>>>>
>>>> many thanks. but can you be a little more precise? the author's only
>>>> hint regarding mpi is on the page
>>>> "http://rashmikumari.github.io/g_mmpbsa/How-to-Run.html" and relates to
>>>> APBS. g_mmpbsa itself doesn't understand openmp/mpi afaik.
>>>>
>>>> the error i'm observing occurs well before apbs is started.
>>>> to be honest, i can't see any connection to my initial question ...
>>>>
>>> It has the sentence "Although g_mmpbsa does not support mpirun...".
>>> aprun is a form of mpirun, so I assumed you knew that what you were
>>> trying was actually something that could work, which would therefore
>>> have to be with the APBS back end. The point of what it says there is
>>> that you don't run g_mmpbsa with aprun; you tell it how to run APBS
>>> with aprun. This just avoids the problem entirely, because your
>>> redirected/interactive input goes to a single g_mmpbsa as normal,
>>> which then launches APBS with MPI support.
>>> Tool authors need to actively write code to be useful with MPI, so
>>> unless you know that what you are doing is supposed to work with MPI
>>> because they say it works, don't try.
>>>
>>> Mark
>> you are right. it's apbs which ought to run in parallel mode. of course,
>> i can set the variable 'export APBS="mpirun -np 8 apbs"' [or set 'export
>> OMP_NUM_THREADS=8'] if i want to split a 24-core node into, let's say, 3
>> independent g_mmpbsa processes. the problem is that i must start
>> g_mmpbsa itself with aprun (in the script run_mmpbsa.sh).
>
> No. Your job runs a shell script on your compute node. It can do anything
> it likes, but it would make sense to run something in parallel at some
> point. You need to build a g_mmpbsa that you can just run in a shell script
> that echoes in the input (try that on its own first). Then you use the
> above approach so that the single process that is g_mmpbsa makes the call
> to aprun (which is the Cray mpirun) to run APBS in MPI mode.
>
> It is likely that even if you run g_mmpbsa with aprun and solve the input
> issue somehow, the MPI runtime will refuse to start the child APBS with
> aprun, because nesting is typically unsupported (and your current command
> lines haven't given it enough information to do a good job even if it is
> supported).

yes, i've encountered issues with nested aprun calls, so this will
hardly work, i guess.
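just so my understanding of the suggested setup is on record: the job
script would run a single g_mmpbsa directly (no aprun around it), and
only the apbs call would go through aprun. an untested sketch (the
g_mmpbsa flags and the group numbers 1/13 are placeholders):

#!/bin/bash
# run_mmpbsa.sh -- run once per job, *not* wrapped in aprun
# the single g_mmpbsa process launches APBS in parallel by itself
export APBS="aprun -n 8 /path/to/apbs"    # path is a placeholder
# feed the two interactive selections (protein, then ligand) on stdin
g_mmpbsa -f traj.xtc -s topol.tpr -n index.ndx <<EOF
1
13
EOF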

>
>> i absolutely
>> cannot see any other way of running apbs when using it from within
>> g_mmpbsa. hence, i need to run
>>
>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
>>
> This likely starts three copies of g_mmpbsa, each of which expects terminal
> input, which maybe you can teach aprun to manage, but then each g_mmpbsa
> will run its own APBS, and this is completely not what you want.

hmm, to be honest, i would say this is exactly what i'm trying to
achieve. isn't it? i want 3 independent g_mmpbsa runs, each executed
in a different directory with its own APBS. by the way, altogether i
have 1800 such directories, each containing a different trajectory.

if someone is ever (within the next 20 hours!) able to figure out a
solution for this, i would be absolutely pleased.
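to give a sense of the scale: the wrapper i have in mind would
eventually have each rank loop over every 3rd of the 1800 directories,
roughly like this (untested sketch; aprun sets ALPS_APP_PE, and
run_one_mmpbsa.sh is a hypothetical per-directory script):

#!/bin/bash
# run_mmpbsa.sh -- one copy per rank; aprun sets ALPS_APP_PE to 0, 1 or 2
export OMP_NUM_THREADS=8                  # 8 of the 24 cores per copy
# rank k handles directories k+1, k+4, k+7, ... (600 of the 1800 each)
for i in $(seq $((ALPS_APP_PE + 1)) 3 1800); do
    ( cd "dir_$i" && ../run_one_mmpbsa.sh )   # subshell keeps the cwd clean
done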


>> and of course i'm aware of having given 8 cores to g_mmpbsa, hoping
>> that it is able to read my input and to run apbs, which hopefully uses
>> all of the 8 cores. the user input (choosing protein, then ligand),
>> however, "Cannot [be] read". this issue occurs quite early in the
>> g_mmpbsa process and therefore has nothing to do with the apbs
>> functionality (whether openmp or mpi), which is launched later.
>>
>> if i simulate the whole scenario (spreading the 24 cores of a node over
>> 3 processes) using a bash script (instead of g_mmpbsa) which just expects
>> (and prints) the two inputs at runtime, and which i start three times
>> on one node, everything works fine. i'm just wondering whether someone
>> knows why gromacs fails under the same conditions and whether it is
>> possible to remedy that problem.
>
> By the way, GROMACS isn't failing. You're using a separately provided
> program, so you should really be talking to its authors for help. ;-)
>
> mpirun -np 3 gmx_mpi make_ndx
>
> would work fine (though not usefully), if you use the mechanisms provided
> by mpirun to control how the redirection to the stdin of the child
> processes should work. But handling that redirection is an issue between
> you and the docs of your mpirun :-)
>
> Mark

unfortunately, there is very little information available about stdin
redirection with aprun. what i've done now is modify g_mmpbsa so that
no user input is required. starting

aprun -n 3 -N 3 -cc 0-7:8-15:16-23  ../run_mmpbsa.sh

where, using the $ALPS_APP_PE variable, i successfully enter three
directories (dir_1, dir_2, dir_3, all containing identical file names)
and start g_mmpbsa in each of them. what happens now is that all the new
files are generated in the first of the 3 folders (while the other two
are not affected at all), and each new file is generated 3 times (file,
#file1#, #file2#), in the manner typical of gromacs' backup philosophy.
so on some (hardware?) level the data of the 3 processes are not well
separated. the supervisors of our HPC system have not been able to
figure out the reason so far. that's why i'm trying to find help here
from someone who has been successful in sharing compute nodes in a
similar way.
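for completeness, the relevant part of my current wrapper looks roughly
like this (simplified; the base path and the name of my modified,
input-free g_mmpbsa binary are placeholders), including a debug line so
that each rank logs where it thinks it is running:

#!/bin/bash
# run_mmpbsa.sh -- started 3 times per node via aprun
BASE=/scratch/vedat/mmpbsa                # placeholder absolute base path
ID=$((ALPS_APP_PE + 1))                   # ranks 0,1,2 -> dir_1,dir_2,dir_3
cd "$BASE/dir_$ID" || exit 1
# debug: record which directory this rank actually works in
echo "rank $ALPS_APP_PE in $(pwd)" > "rank_$ALPS_APP_PE.log"
export OMP_NUM_THREADS=8
"$BASE/g_mmpbsa_noinput" -f traj.xtc -s topol.tpr   # modified, input-free build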

anyway, thanks for your time so far.


>
>
>>>
>>>> On 27.10.2015 at 22:43, Mark Abraham wrote:
>>>>> Hi,
>>>>>
>>>>> I think if you check out how the g_mmpbsa author intends you to use MPI
>>>>> with the tool, your problem goes away.
>>>>> http://rashmikumari.github.io/g_mmpbsa/Usage.html
>>>>>
>>>>> Mark
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:10 PM Vedat Durmaz <durmaz at zib.de> wrote:
>>>>>
>>>>>> hi guys,
>>>>>>
>>>>>> I'm struggling with the use of various gromacs commands on a Cray XC30
>>>>>> system. actually, it's about the external tool g_mmpbsa, which requires
>>>>>> user action at runtime. i get similar errors with other Gromacs
>>>>>> tools, e.g. make_ndx, though i know that it doesn't make sense to use
>>>>>> more than one core for make_ndx. however, g_mmpbsa (or rather the apbs
>>>>>> used by g_mmpbsa) is supposed to be capable of using multiple cores via
>>>>>> openmp. so, as long as i assign all 24 cores of a computing node to one
>>>>>> process through
>>>>>>
>>>>>> aprun -n 1 ../run_mmpbsa.sh
>>>>>>
>>>>>> everything works fine. user input is accepted either interactively, by
>>>>>> using the echo command, or through a here-document construction
>>>>>> ("... << EOF ... EOF"). however, as soon as I try to split the 24
>>>>>> cores of a node across multiple processes (more than one) using, for
>>>>>> instance,
>>>>>>
>>>>>> aprun -n 3 -N 3 -cc 0-7:8-15:16-23 ../run_mmpbsa.sh
>>>>>>
>>>>>> (and OMP_NUM_THREADS=8), there is neither an opportunity to provide
>>>>>> user input interactively, nor is it recognized through echo/here-doc
>>>>>> in the script. instead, i get the error
>>>>>>
>>>>>>    >> Source code file: .../gromacs-4.6.7/src/gmxlib/index.c, line: 1192
>>>>>>    >> Fatal error:
>>>>>>    >> Cannot read from input
>>>>>>
>>>>>> where, according to the source code, "scanf" malfunctions. when i use,
>>>>>> for comparison, make_ndx, which i would like to feed with "q", i
>>>>>> observe a similar error:
>>>>>>
>>>>>>    >> Source code file: .../gromacs-4.6.7/src/tools/gmx_make_ndx.c, line: 1219
>>>>>>    >> Fatal error:
>>>>>>    >> Error reading user input
>>>>>>
>>>>>> here, it's "fgets" which is malfunctioning.
>>>>>>
>>>>>> does anyone have an idea what could cause this? what do i need to
>>>>>> consider/change in order to be able to start more than one process
>>>>>> on one computing node?
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> vedat
>>>>>>
>>>>>>