[gmx-users] TestBed in MPI not working
Mark Abraham
Mark.Abraham at anu.edu.au
Tue May 12 03:26:36 CEST 2009
Jones de Andrade wrote:
> Hi Justin.
>
> Thanks a lot for that. It helped, but not enough yet. :( It just made the
> 4.0.4 tests reach the same "range of errors" that I'm getting with 3.3.3. :P
>
> Using openMPI, it just complains that it can't find orted. That would
> mean that the paths are not in there, BUT they are. :P If I just try to
> run orted from the command line without any arguments:
>
> *****************
> /gmxtest404 196% orted
> [palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> runtime/orte_init.c at line 125
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_base_select failed
> --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> orted/orted_main.c at line 323
> /*****************
>
> So, the shell IS finding the file.
True, but you have a deeper problem that has nothing to do with GROMACS
or the shell. You need to get your MPI environment working properly
first. Read its documentation, test and troubleshoot there. Ask its
mailing list if you can't work it out.
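As a rough sketch of what "test there" could look like, the following
checks the launcher on its own, with no GROMACS involved. It assumes the
launcher is called mpirun and accepts -np (as Open MPI's does); the
status strings are made up for illustration.

```shell
#!/bin/sh
# Sanity-check the MPI launcher by itself, without GROMACS.
# Assumption: the launcher is named mpirun and accepts -np (true for Open MPI).
if command -v mpirun >/dev/null 2>&1; then
    # hostname is not an MPI program; this only tests whether mpirun can
    # start its daemons and launch two ordinary processes.
    if mpirun -np 2 hostname >/dev/null 2>&1; then
        mpi_status="launcher works"
    else
        mpi_status="launcher found but failed to start processes"
    fi
else
    mpi_status="mpirun not found on PATH"
fi
echo "$mpi_status"
```

If this does not report a working launcher, the problem lies in the MPI
installation or environment, and the GROMACS tests cannot pass yet.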
> But when I do it not from the script
> anymore (I was already thinking of something on the "if-else-end"
> stack), all mpi tests fail with the following message in the mdrun.out file:
>
> **********************
> /orted: Command not found.
> --------------------------------------------------------------------------
> A daemon (pid 27972) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> /**********************
>
> What is going on?
Your environment inside your shell is different from your environment
inside the gmxtest.pl script. If you're running under a queueing system,
get the environment variables exported to the script. If you're using
the script you were using earlier, then perhaps you need to learn about
setting environment variables under different shells. I don't recall
whether variable names are case-sensitive, but your "set path=(blah)"
simply won't do the job. Only environment variables get exported to
child processes (i.e. the gmxtest.pl you spawn later), so you need
"setenv PATH blah". Under bash, use "export PATH=blah" not "PATH=blah".
Otherwise, change the script to dump the contents of the environment PATH
variable (e.g. print STDERR "$ENV{'PATH'}\n";) to see if those contents
clue you in to what is happening.
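To illustrate the difference between a plain assignment and an exported
variable under a Bourne-type shell, here is a minimal sketch. The
variable names DEMO_LOCAL and DEMO_EXPORTED are hypothetical, and the
child "sh -c" merely stands in for a spawned script such as gmxtest.pl.

```shell
#!/bin/sh
# Hypothetical variable names, for illustration only.
DEMO_LOCAL="visible only in this shell"       # plain assignment: not exported
export DEMO_EXPORTED="visible to children"    # exported: inherited by children

# The single quotes stop the parent from expanding the variables, so the
# child shell reports what it actually inherited.
child_view=$(sh -c 'echo "local=[${DEMO_LOCAL}] exported=[${DEMO_EXPORTED}]"')
echo "$child_view"
```

The child sees an empty value for the unexported variable, which is the
same thing that happens to a PATH that was set but never exported.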
> Next thing I think about doing is to execute a full
> command line from one of the tests directly, to see that it works... :( :P
Won't work, because MPI is broken.
Mark