[gmx-developers] Slow down caused by collective parallel operation

David van der Spoel spoel at xray.bmc.uu.se
Thu Mar 5 23:04:12 CET 2015

On 2015-03-05 22:41, Harry (Yicun) Ni wrote:
> Hi All,
> I am developing a new water model and implementing it in gromacs 4.5.5.
> The model requires an "Allreduce" operation on a 3*natoms sized real
> value array when calculating forces, and I am using "gmx_sumd" function
> to do this communication.
> Then I tested my model on a box of 500 water molecules. With 8 processors
> on 1 node, I get the performance I expect. However, with 24 processors
> on 2 nodes I usually, but not always, see a dramatic slowdown: the
> wall-clock time of the simulation can be longer than on 8 processors.
> Can someone give me some suggestions on this issue? Thank you very much.
> Yicun Ni
We typically recommend at least 1000 atoms per core; with an extremely
fast interconnect, 500 atoms per core can still scale reasonably. With
8 cores you are already at about 200 atoms per core, so there is not
much you can do. It could be that gmx_sumd is not optimal, though.
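
The arithmetic behind these numbers can be sketched as follows. This is a rough estimate, not a measurement: it assumes a 3-site water model (3 atoms per molecule) and 8-byte doubles in the array passed to gmx_sumd. The function names are made up for illustration and are not part of GROMACS.

```python
# Back-of-the-envelope load estimate for the 500-molecule water box.
# ASSUMPTION: 3 atoms per water molecule; a 4- or 5-site model changes the numbers.
ATOMS_PER_WATER = 3


def atoms_per_core(n_molecules, n_cores, atoms_per_molecule=ATOMS_PER_WATER):
    """Average number of atoms each core is responsible for."""
    return n_molecules * atoms_per_molecule / n_cores


def allreduce_bytes(n_molecules, atoms_per_molecule=ATOMS_PER_WATER,
                    bytes_per_value=8):
    """Size of the 3*natoms array that is summed across all ranks.

    ASSUMPTION: 8 bytes per value (double precision).
    """
    return 3 * n_molecules * atoms_per_molecule * bytes_per_value


if __name__ == "__main__":
    for cores in (8, 24):
        print(cores, "cores:", atoms_per_core(500, cores), "atoms per core")
    print("array summed per call:", allreduce_bytes(500), "bytes")
```

Under these assumptions, 8 cores give 187.5 atoms per core and 24 cores give 62.5, both well below the 500 to 1000 range mentioned above, while every call still sums a 36000-byte array across all ranks and, with 2 nodes, across the interconnect.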

David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
