[gmx-users] MPI scaling (was RE: MPI tips)
mathog at caltech.edu
Wed Feb 1 17:50:21 CET 2006
Mark Abraham wrote:
> David Mathog wrote:
> > Presumably I'm doing something wrong here but so far the
> > gromacs MPI performance has been abysmal.
> > It was suggested that the gmxdemo example was too small so today
> > I tried changing the original -d .5 value used with editconf to
> > -d 2, -d 4, and finally -d 8. Details are:
> It was also suggested that the simulations are too short.
You lost me there. The number of steps was the same, the times went up
and up and up, but the scaling didn't improve all that much. I also
tried (since my last post) adding coulombtype=cut-off to the parameter
files but that made no difference whatsoever. Using the alternative
scaling formula of:
S = t_1/(N * t_N)
at -d 8 for the first mdrun (energy minimization) I measured:
t_1=695, t_20=336 => S=0.103 or 10.3%, which is just dreadful
t_1=695, t_4=377 => S=0.461 or 46.1%, which is better
but still nowhere near the 94% listed on the link below for
a similar hardware configuration.
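For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula S = t_1/(N * t_N) applied to the timings quoted above. The function name parallel_efficiency is just an illustrative choice, not anything from gromacs or lam-mpi.

```python
def parallel_efficiency(t_1, t_n, n):
    """Return S = t_1 / (N * t_N), the fraction of ideal speedup achieved."""
    return t_1 / (n * t_n)


# Timings quoted in this post for the first mdrun at -d 8.
t_1 = 695.0
runs = {4: 377.0, 20: 336.0}  # N -> t_N

for n, t_n in sorted(runs.items()):
    s = parallel_efficiency(t_1, t_n, n)
    print(f"N={n:2d}  t_N={t_n:.0f}  S={s:.3f} ({s:.1%})")
```

S = 1 would mean perfect linear scaling, so anything far below 1 means most of the added processors are being wasted.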
> There is
> overhead in setting up the MPI system, as well as in each communication.
> You want to run benchmarks that aren't dominated by this setup time.
If the volume being simulated is actually that size, though, then the
benchmark is appropriate for the task at hand.
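One rough way to see how much a fixed cost could matter is to assume, purely for illustration, that run time follows t(N) = fixed + parallel/N, fit the two unknowns from the N=1 and N=4 timings, and compare the prediction with the N=20 measurement. This simple model and the helper names below are assumptions made for this sketch; nothing here is measured or taken from gromacs itself.

```python
def fit_fixed_overhead(t_1, t_n, n):
    """Solve t(N) = fixed + parallel / N for (fixed, parallel)
    using the timings at 1 process and at n processes."""
    parallel = (t_1 - t_n) / (1.0 - 1.0 / n)
    fixed = t_1 - parallel
    return fixed, parallel


def predict(fixed, parallel, n):
    """Predicted run time on n processes under the assumed model."""
    return fixed + parallel / n


# Fit from the N=1 and N=4 timings of the first mdrun at -d 8.
fixed, parallel = fit_fixed_overhead(695.0, 377.0, 4)
predicted_20 = predict(fixed, parallel, 20)

print(f"fixed={fixed:.1f}  parallel={parallel:.1f}")
print(f"predicted t_20={predicted_20:.1f}  measured t_20=336")
```

Under this assumed model the measured N=20 time is slower than the prediction, which would be consistent with a cost that grows with the number of processes (such as communication) rather than a one-time setup cost alone.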
> suggest looking on the gromacs web page for the benchmarks section and
> running the benchmark systems you can get from there. Then you will have
> a basis for comparison with the results there, and people here will have
> more confidence that your problem isn't you making an error through
> inexperience with gromacs.
Actually I did download those benchmarks and while they provide
parameters and raw data they don't also provide the command line
parameters used for the runs. So would whoever ran the one
labeled "Scaling (100Mb Ethernet)" please send me the script
he/she used to obtain those numbers? Specifically, which MPI
implementation was it and which mpirun parameters were used?
> My interconnects are
> a lot better than 10baseT however.
It's 100baseT, but your point is still valid. Since no functional
demo_mpi was provided and I had to write one myself, I'm wondering
whether I have failed to set some mpirun parameter appropriately for
lam-mpi running with this program.
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech