[gmx-users] Having some lam issues

Dallas B. Warren Dallas.Warren at vcp.monash.edu.au
Wed May 18 06:36:54 CEST 2005


I have had GROMACS working fine in the past.

Recently I upgraded from RHEL 3 to RHEL 4.  lam-7.0.6-5 comes with that,
so I just compiled everything on top of it.  fftw-2.1.3 compiled and
installed fine from source, with no errors.  I then compiled GROMACS
from CVS, and that proceeded without errors as well.

Time to fire up lamd ....

##############
<morph /home/dallas> lamboot -v hostfile
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
n-1<26972> ssi:boot:base:linear: booting n0 (130.194.217.105)
n-1<26972> ssi:boot:base:linear: finished
##############

where the hostfile contains (dual CPU machine) ....

##############
localhost
localhost
##############

So that seems OK, though I am a bit stumped why it only has n0, not n1
as well, since it is a dual processor machine.  Anyway, when I then
attempt to run mdrun, it doesn't detect that lamd is operating ....

##############
<morph /home/dallas> mdrun
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host .
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment.  See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
##############

However, if you look at the System Monitor, lamd is listed there as one
of the running processes.  Any suggestions on what the issue actually
is?  Is it because this LAM is too recent, and GROMACS doesn't recognise
parts of it?  Below is the "lamboot -v -d hostfile" output, in case that
provides some insight.  From what I can tell, it does all that it is
meant to.

##############
<morph /home/dallas> lamboot -v -d hostfile
n-1<18706> ssi:boot: Opening
n-1<18706> ssi:boot: opening module globus
n-1<18706> ssi:boot: initializing module globus
n-1<18706> ssi:boot:globus: globus-job-run not found, globus boot will
not run
n-1<18706> ssi:boot: module not available: globus
n-1<18706> ssi:boot: opening module rsh
n-1<18706> ssi:boot: initializing module rsh
n-1<18706> ssi:boot:rsh: module initializing
n-1<18706> ssi:boot:rsh:agent: /usr/bin/rsh
n-1<18706> ssi:boot:rsh:username: <same>
n-1<18706> ssi:boot:rsh:verbose: 1000
n-1<18706> ssi:boot:rsh:algorithm: linear
n-1<18706> ssi:boot:rsh:priority: 10
n-1<18706> ssi:boot: module available: rsh, priority: 10
n-1<18706> ssi:boot: finalizing module globus
n-1<18706> ssi:boot:globus: finalizing
n-1<18706> ssi:boot: closing module globus
n-1<18706> ssi:boot: Selected boot module rsh
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
n-1<18706> ssi:boot:base: looking for boot schema in following
directories:
n-1<18706> ssi:boot:base:   <current directory>
n-1<18706> ssi:boot:base:   $TROLLIUSHOME/etc
n-1<18706> ssi:boot:base:   $LAMHOME/etc
n-1<18706> ssi:boot:base:   /etc/lam
n-1<18706> ssi:boot:base: looking for boot schema file:
n-1<18706> ssi:boot:base:   hostfile
n-1<18706> ssi:boot:base: found boot schema: hostfile
n-1<18706> ssi:boot:rsh: found the following hosts:
n-1<18706> ssi:boot:rsh:   n0 localhost (cpu=2)
n-1<18706> ssi:boot:rsh: resolved hosts:
n-1<18706> ssi:boot:rsh:   n0 localhost --> 127.0.0.1 (origin)
n-1<18706> ssi:boot:rsh: starting RTE procs
n-1<18706> ssi:boot:base:linear: starting
n-1<18706> ssi:boot:base:server: opening server TCP socket
n-1<18706> ssi:boot:base:server: opened port 32918
n-1<18706> ssi:boot:base:linear: booting n0 (localhost)
n-1<18706> ssi:boot:rsh: starting lamd on (localhost)
n-1<18706> ssi:boot:rsh: starting on n0 (localhost): hboot -t -c
lam-conf.lamd -d -v -I -H 127.0.0.1 -P 32918 -n 0 -o 0
n-1<18706> ssi:boot:rsh: launching locally
hboot: performing tkill
hboot: tkill -d
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back:
/tmp/lam-dallas at morph.vcp.monash.edu.au/lam-killfile
tkill: removing socket file ...
tkill: socket file:
/tmp/lam-dallas at morph.vcp.monash.edu.au/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file:
/tmp/lam-dallas at morph.vcp.monash.edu.au/lam-io-socket
tkill: f_kill = "/tmp/lam-dallas at morph.vcp.monash.edu.hboot: fork
/usr/bin/lamd
hboot: attempting to execute
[1]  18709 lamd -H 127.0.0.1 -P 32918 -n 0 -o 0 -d
n-1<18706> ssi:boot:rsh: successfully launched on n0 (localhost)
n-1<18706> ssi:boot:base:server: expecting connection from finite list
n-1<18709> ssi:boot: Opening
n-1<18709> ssi:boot: opening module globus
n-1<18709> ssi:boot: initializing module globus
n-1<18709> ssi:boot:globus: globus-job-run not found, globus boot will
not run
n-1<18709> ssi:boot: module not available: globus
n-1<18709> ssi:boot: opening module rsh
n-1<18709> ssi:boot: initializing module rsh
n-1<18709> ssi:boot:rsh: module initializing
n-1<18709> ssi:boot:rsh:agent: /usr/bin/rsh
n-1<18709> ssi:boot:rsh:username: <same>
n-1<18709> ssi:boot:rsh:verbose: 1000
n-1<18709> ssi:boot:rsh:algorithm: linear
n-1<18709> ssi:boot:rsh:priority: 10
n-1<18709> ssi:boot: module available: rsh, priority: 10
n-1<18709> ssi:boot: finalizing module globus
n-1<18709> ssi:boot:globus: finalizing
n-1<18709> ssi:boot: closing module globus
n-1<18709> ssi:boot: Selected boot module rsh
n-1<18706> ssi:boot:base:server: got connection from 127.0.0.1
n-1<18706> ssi:boot:base:server: this connection is expected (n0)
n-1<18706> ssi:boot:base:server: remote lamd is at 127.0.0.1:32793
n-1<18706> ssi:boot:base:server: closing server socket
n-1<18706> ssi:boot:base:server: connecting to lamd at 127.0.0.1:32919
n-1<18706> ssi:boot:base:server: connected
n-1<18706> ssi:boot:base:server: sending number of links (1)
n-1<18706> ssi:boot:base:server: sending info: n0 (localhost)
n-1<18706> ssi:boot:base:server: finished sending
n-1<18706> ssi:boot:base:server: disconnected from 127.0.0.1:32919
n-1<18706> ssi:boot:base:linear: finished
n-1<18706> ssi:boot:rsh: all RTE procs started
n-1<18706> ssi:boot:rsh: finalizing
n-1<18706> ssi:boot: Closing
n-1<18709> ssi:boot:rsh: finalizing
n-1<18709> ssi:boot: Closing
au/lam-killfile"
tkill: killing LAM...
tkill: killing PID (SIGHUP) 18675 ...
tkill: killed
tkill: all finished
hboot: booting...
##############
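
A note on the n0-only boot mentioned earlier: the debug output above
reports the boot schema as "n0 localhost (cpu=2)".  That suggests LAM
merged the two "localhost" lines into a single node with two CPUs, so
seeing only n0 is probably expected on a single dual CPU machine (nodes
are counted per host, not per CPU).  Assuming the standard LAM boot
schema syntax, the same hostfile could be written in one line ....

##############
localhost cpu=2
##############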

Catch ya,

Dr. Dallas Warren
Lecturer
Department of Pharmaceutical Biology and Pharmacology
Victorian College of Pharmacy, Monash University
381 Royal Parade, Parkville VIC 3010
dallas.warren at vcp.monash.edu.au
+61 3 9903 9073
---------------------------------
When the only tool you own is a hammer, every problem begins to resemble
a nail.


