[gmx-users] On what scale will simulation with PME-dedicated nodes perform better?

Fri Sep 19 05:35:35 CEST 2014

Hi all,

I am running GROMACS 4.6 on 5 nodes (each with 16 CPU cores and 2 Nvidia 
K20m GPUs) and on 4 nodes, set up as follows:

5 nodes:
1. 8 MPI processes per node, with one node used as a PME-dedicated node
2. 8 MPI processes per node, with two nodes used as PME-dedicated nodes
3. 4 MPI processes per node, with one node used as a PME-dedicated node

In all of these settings, the log files report that the PME nodes have more 
work to do than the PP nodes, with an average imbalance of 20% to 40%.

4 nodes:
8 MPI processes per node, with no PME-dedicated node.
In the log file, the PME mesh wall time is about half of what it is in the 
settings above. My guess is that my run is too small in scale for 
PME-dedicated nodes to do any good.
So, under what conditions should I set the number of PME nodes manually?
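
For reference, the rank counts implied by the setups above can be worked out with a small sketch. It assumes that "using a node as PME-dedicated" means every MPI process on that node does PME only; the function and setup names are purely illustrative. The PME count is the value that would be given to mdrun's -npme option.

```python
# Rough sketch: count total, PME and PP ranks for each setup described above.
# Assumption: all MPI processes on a PME-dedicated node are PME-only ranks.

def rank_layout(nodes, ranks_per_node, pme_nodes):
    """Return (total ranks, PME ranks, PP ranks) for one setup."""
    total = nodes * ranks_per_node
    pme = pme_nodes * ranks_per_node
    pp = total - pme
    return total, pme, pp

# (nodes, MPI processes per node, PME-dedicated nodes); labels are illustrative
setups = {
    "5 nodes, setup 1": (5, 8, 1),
    "5 nodes, setup 2": (5, 8, 2),
    "5 nodes, setup 3": (5, 4, 1),
    "4 nodes": (4, 8, 0),
}

for name, (nodes, per_node, pme_nodes) in setups.items():
    total, pme, pp = rank_layout(nodes, per_node, pme_nodes)
    # Share of ranks doing PME only; 0 when there is no dedicated PME node
    share = pme / total
    print(f"{name}: {total} ranks, {pme} PME, {pp} PP, PME share {share:.2f}")
```

Under that assumption, the three 5-node setups put between a fifth and two fifths of all ranks on PME, while the 4-node run has none.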