[gmx-users] number of cores with a large prime factor

Mark Abraham mark.abraham at anu.edu.au
Fri Jan 14 04:26:53 CET 2011


On 01/14/11, chris.neale at utoronto.ca wrote:
> Dear users:
> 
> I received this error message because GROMACS 4.5.3 does not allow me to run on 34 cores. I suggest that this be made a warning instead (so that I can bypass it if I choose and actually run on 34 cores).
> 
> A run on 32 cores with -npme -1 chose to use 24 particle-particle nodes and 8 PME nodes. I therefore selected 34 cores and -npme 10. I don't see how the large prime factor is relevant when -npme is not equal to zero.
> 

The MPMD communication load is much smaller if there's a large common factor in the largest element of both the DD grid and the PME grid. With 24 and 10, the greatest common factor is 2, and the combination of 4x3x2 (or 8x3x1) and 2x5x1 grids would be poor. I bet that if you could run this job, it would be markedly slower than the 32-core one above. The architecture of the communication hardware is important here: you want any intrinsic hardware values (like cores per InfiniBand connection) to be a factor of that large common factor. This can have a very large effect on performance. I have a publication due out shortly that demonstrates that on 64 processors of BlueGene/L, npme=0 and npme=32 run faster than any other value (with other parameters optimized so as to produce equivalent error in the PME approximation), and this is because of the communication topology.
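
To make the arithmetic concrete, here is a small Python sketch that compares the two node splits discussed in this thread. It is only an illustration, not GROMACS code: the helper name common_factor is made up for this example, and it looks only at the total PP and PME node counts, not at the actual grid dimensions mdrun would pick.

```python
from math import gcd


def common_factor(n_pp, n_pme):
    """Greatest common factor of the PP and PME node counts.

    Hypothetical helper for illustration only; not a GROMACS function.
    """
    return gcd(n_pp, n_pme)


# 32 cores: 24 PP nodes + 8 PME nodes
print(common_factor(24, 8))   # 8
# 34 cores: 24 PP nodes + 10 PME nodes
print(common_factor(24, 10))  # 2
```

The 32-core split shares a factor of 8 between the two groups, while the 34-core split shares only 2.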

> The error message was:
> 
> Fatal error:
> The number of nodes you selected (34) contains a large prime factor 17. In most cases this will lead to bad performance. Choose a number with smaller prime factors or set the decomposition (option -dd) manually.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> 
> Thoughts?
> 

This change arose from the discussion at http://redmine.gromacs.org/issues/551 . As you can see from the timing numbers I report there, one really does not want large prime factors in either N_pme or N_pp, and thus in the number of nodes (because you want a large non-prime common factor). The GROMACS defaults achieve decent results if you request node counts that are multiples reflecting your hardware reality, and avoid comparably large prime factors.
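
If you want to screen a node count before submitting a job, a trial-division sketch like the following reports its largest prime factor. This is an assumption-laden illustration, not the check mdrun performs: the function name is hypothetical, and it does not encode the threshold at which GROMACS considers a prime factor "large".

```python
def largest_prime_factor(n):
    """Largest prime factor of n >= 2, found by trial division.

    Hypothetical helper for illustration only; not the GROMACS check.
    """
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after dividing out small factors is itself prime.
        largest = n
    return largest


# Total node counts and the PP/PME splits mentioned in this thread.
for nodes in (32, 34, 24, 10):
    print(nodes, largest_prime_factor(nodes))
```

For the counts above this gives 2 for 32, 17 for 34, 3 for 24 and 5 for 10, which matches the factor 17 named in the error message.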


Mark