[gmx-users] Comparing Gromacs versions

Fri May 17 16:01:55 CEST 2013

On Fri, May 17, 2013 at 2:48 PM, Djurre de Jong-Bruinink
<djurredejong at yahoo.com> wrote:
>
>
>>The answer is in the log files, in particular the performance summary
>>should indicate where is the performance difference. If you post your
>>log files somewhere we can probably give further tips on optimizing
>>your run configurations.
>
>
> I put the log files for 72 CPUs, using GMX455, GMX461+group and GMX461+verlet here:
> http://md.chem.rug.nl/~djurre/logs/N6_gmx455.log
> http://md.chem.rug.nl/~djurre/logs/N6_gmx461_group.log
> http://md.chem.rug.nl/~djurre/logs/N6_gmx461_verlet.log

Those tell us much more.

> It would be great if you could point out some possible optimizations.

Here you go:
- You seem to be using 2 fs time-steps, so you don't need to constrain
all bonds; constraining only h-bonds is enough. This will reduce the
cell size requirement posed by LINCS and will allow further
decomposition. Additionally, you can also tweak the LINCS order and
iteration count (lincs-order, lincs-iter).
- With the Verlet scheme you can use OpenMP parallelization to reduce
the pressure on domain decomposition, e.g. by using 2 OpenMP threads
(at least for PP) you'd need only 24 domains instead of 48. OpenMP
parallelization is not very efficient on the old-ish AMD processors
you are using, but 2 threads per MPI rank should still help at very
high parallelization (<200 atoms/core).
- On the AMD Istanbul (K10) processors that you are using, gcc
generates rather poor non-bonded kernel code. icc will make the
non-bonded kernels run 10-20% faster.
- With the Verlet scheme you can safely increase nstlist, so already
at 72 cores, and especially at higher core counts, values of 12, 15,
or even 20 might give better performance.
- Your 4.6 group scheme run shows a large PP-PME imbalance; try
increasing the number of PME ranks (mdrun -npme)!
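
As a rough sketch of the suggestions above (not tuned settings), the
snippet below works out how many domains remain when each PP rank runs
several OpenMP threads, and renders a few of the .mdp options mentioned
as "key = value" lines. The helper names pp_domains and render_mdp are
made up for illustration, and the nstlist value is just one of the
candidates listed above; check the option spellings against the
documentation of your Gromacs version.

```python
def pp_domains(pp_cores: int, omp_threads: int) -> int:
    """Domains (PP ranks) left when each rank runs omp_threads OpenMP threads."""
    if omp_threads < 1 or pp_cores % omp_threads != 0:
        raise ValueError("pp_cores must be a positive multiple of omp_threads")
    return pp_cores // omp_threads


def render_mdp(options: dict) -> str:
    """Render options as 'key = value' lines, the layout .mdp files use."""
    return "\n".join(f"{key:<16}= {value}" for key, value in options.items())


# Illustrative values only: picked from the ranges suggested in this mail,
# not benchmarked on the system under discussion.
suggested = {
    "constraints": "h-bonds",    # instead of constraining all bonds
    "cutoff-scheme": "Verlet",
    "nstlist": 20,               # 12 or 15 are the other candidates
}

if __name__ == "__main__":
    # 48 PP cores with 2 OpenMP threads per rank, as in the example above.
    print("domains:", pp_domains(48, 2))
    print(render_mdp(suggested))
```

The only way to know which combination wins is to run each variant
briefly and compare the performance summaries in the log files.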

>
>
>>Note that with such a small system the scaling with the group scheme
>>surely becomes limited by imbalance and probably it won't scale much
>>further than 72 cores. At the same time, simulations with the verlet
>>scheme have shown scaling to below 100 atoms/core.
>
> I tried running on 84 CPUs (56 PP cores = 400 atoms/PP core), but I got a domain decomposition error. Maybe I could optimize -rcon and -dds further. However, although the scaling to more CPUs is better with the Verlet scheme, I think you will never win: with 72 CPUs Verlet is almost as fast as group at 60 CPUs, but compared to 24 CPUs the scaling per CPU is already down to 60%.
>
> But as Mark Abraham mentioned, it might be that my system is just too small to get the advantage of scaling that will be there in larger systems.

As I mentioned before, you *should* be able to use 2x more cores (or
perhaps even more), but of course the parallel efficiency will
decrease.
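
To put numbers on that trade-off, here is a minimal sketch of the two
quantities discussed in this thread: atoms per core and parallel
efficiency relative to a reference run. The system size is inferred
from your "56 PP cores = 400 atoms/PP core"; the ns/day values are
hypothetical placeholders, so substitute the figures from your own log
files.

```python
def atoms_per_core(n_atoms: int, cores: int) -> float:
    """Average number of atoms handled by each core."""
    return n_atoms / cores


def parallel_efficiency(perf_ref: float, cores_ref: int,
                        perf_n: float, cores_n: int) -> float:
    """Measured speedup divided by the ideal speedup relative to a reference run."""
    speedup = perf_n / perf_ref
    ideal_speedup = cores_n / cores_ref
    return speedup / ideal_speedup


# Inferred from the quoted message: 56 PP cores at 400 atoms per PP core.
N_ATOMS = 56 * 400

if __name__ == "__main__":
    for cores in (24, 72, 144):
        print(cores, "cores:", round(atoms_per_core(N_ATOMS, cores)), "atoms/core")

    # Hypothetical ns/day values, NOT taken from the posted log files.
    eff = parallel_efficiency(perf_ref=10.0, cores_ref=24, perf_n=18.0, cores_n=72)
    print("efficiency at 72 cores vs 24:", round(eff, 2))
```

Whether the lower efficiency at higher core counts is acceptable
depends on whether you care more about time to solution or about core
hours spent.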

Cheers,
--
Szilard

>
>
> Greetings,
> Djurre