>Did you try recompiling lam-mpi with the latency optimisations that were 
>suggested (?by David) some time ago on the list?

I certainly did.  I went back and recompiled lam-6.5.6 using 
--with-tcp-short=524288 (versus the 64k default) and --with-rpi=usysv.

For those interested, here are the system details along with the 
scaling numbers:

         Intel P733 Dual Processor
         100 base Ethernet
         Linux Redhat 7.3
         lam-6.5.6 (tcp-short=524288)
         fftw-2.1.3 (enable-mpi)
         gromacs-3.1.4 (enable-mpi)

MD Simulation:
         7.5 nm cube
         16 molecules, 16 Na, + water
         42,000 atoms
         500 steps
         0.002 ps step
         PME (order=4, 0.12nm fourier spacing)

grompp switches:
         -np #

Initiation of mdrun:
         mpirun C mdrun -s .....
         1 CPU run time approximately 500 sec

Cluster         tcp=64k         tcp=524k        PME off         PME(6/0.17nm)

1 box/2 CPU     71%             72%             86%             84%
2 box/2 CPU     54%             56%             83%             74%
2 box/3 CPU     40%             41%             73%             62%
2 box/4 CPU     27%             30%             71%             51%

         # Changing the tcp-short of lam did improve things slightly, but
           not really enough to justify the right royal pain it was ;-)
         # PME doesn't scale very well, as is noted in the manual etc.
         # PME does scale better when the Fourier grid spacing and the
           PME order are both increased
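
To put the percentages in perspective, here is a rough sketch that turns 
them into speedups. It assumes the numbers in the table are parallel 
efficiencies, i.e. T(1 CPU) / (N * T(N CPUs)), which is my reading of 
them rather than something stated above. The function name is just for 
illustration.

```python
# Sketch only: assumes each table entry is a parallel efficiency,
# efficiency = T1 / (N * TN), so that speedup = N * efficiency.

def speedup(ncpu, efficiency):
    """Speedup over a single CPU implied by a parallel efficiency."""
    return ncpu * efficiency

# (cluster layout, CPUs, efficiency with tcp=64k, efficiency with PME off)
# Values copied from the table above.
rows = [
    ("1 box/2 CPU", 2, 0.71, 0.86),
    ("2 box/2 CPU", 2, 0.54, 0.83),
    ("2 box/3 CPU", 3, 0.40, 0.73),
    ("2 box/4 CPU", 4, 0.27, 0.71),
]

for label, ncpu, eff_pme, eff_nopme in rows:
    print(f"{label}: PME speedup {speedup(ncpu, eff_pme):.2f}x, "
          f"PME off speedup {speedup(ncpu, eff_nopme):.2f}x")
```

Under that assumption, 4 CPUs at 27% amounts to roughly a 1.08x speedup 
with PME, against roughly 2.84x with PME off.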

I am going to see if there is much of a difference for a longer run, to 
reduce the effect of the setup and shutdown periods that don't 
parallelise, as noted by Anton.
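
As a rough illustration of why a longer run should help, here is a small 
model in which a fixed setup/shutdown time is paid once on every run and 
only the MD steps are spread over the CPUs. All the numbers in it (20 sec 
of setup, 1 sec per step, 60% per-step efficiency) are made up for the 
example and are not measurements; the model also ignores any change in 
communication cost with run length.

```python
# Sketch only: hypothetical numbers, not measurements from these runs.

def measured_efficiency(steps, ncpu, t_setup, t_step, step_eff):
    """Efficiency seen by timing the whole run.

    t_setup  : serial setup + shutdown time in seconds (hypothetical)
    t_step   : 1 CPU time per MD step in seconds (hypothetical)
    step_eff : parallel efficiency of the stepping itself (hypothetical)
    """
    t1 = t_setup + steps * t_step                        # 1 CPU wall time
    tn = t_setup + steps * t_step / (ncpu * step_eff)    # N CPU wall time
    return t1 / (ncpu * tn)

# Longer runs dilute the serial part, so the measured efficiency
# climbs towards step_eff as the number of steps grows.
for steps in (500, 5000, 50000):
    eff = measured_efficiency(steps, ncpu=4, t_setup=20.0,
                              t_step=1.0, step_eff=0.60)
    print(f"{steps:6d} steps: measured efficiency {eff:.1%}")
```

If the real setup time is small compared with 500 steps, the longer run 
will change very little, which is what the test should show either way.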

