[gmx-users] No performance increase with single vs multiple nodes
Mark Abraham
mark.j.abraham at gmail.com
Mon Oct 9 04:45:48 CEST 2017
Hi,
On Sun, Oct 8, 2017 at 2:40 AM Matthew W Hanley <mwhanley at syr.edu> wrote:
> I am running gromacs 2016.3 on CentOS 7.3 with the following command using
> a PBS scheduler:
>
> #PBS -N TEST
> #PBS -l nodes=1:ppn=32
>
> export OMP_NUM_THREADS=1
>
> mpirun -N 32 mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 50000 -cpi TEST
>
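(Side note: only the single-node script is shown, so I'm assuming the multi-node
jobs also scaled the resource request along with the rank count, e.g. something
like this for the 64-rank case, with -np as a generic stand-in for whatever
rank-count option your mpirun takes:

  #PBS -l nodes=2:ppn=32

  mpirun -np 64 mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 50000 -cpi TEST

If the 64/96/128-rank jobs still request nodes=1:ppn=32, the extra ranks just
oversubscribe a single node, and flat scaling is exactly what you'd see.)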
> However, I am seeing no performance increase when using more nodes:
>
> On 32 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    28307.873      884.621     3200.0
>                  (ns/day)    (hour/ns)
> Performance:      195.340        0.123
>
> On 64 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    25502.709      398.480     6400.0
>                  (ns/day)    (hour/ns)
> Performance:      216.828        0.111
>
> On 96 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:    51977.705      541.434     9600.0
>                  (ns/day)    (hour/ns)
> Performance:      159.579        0.150
>
> On 128 MPI ranks
>                Core t (s)   Wall t (s)        (%)
>        Time:   111576.333      871.690    12800.0
>                  (ns/day)    (hour/ns)
> Performance:      198.238        0.121
>
There are several dozen lines of performance analysis at the end of each
log file, which you need to inspect and compare if you want to start to
understand what is going on :-)
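For example, since mdrun writes that breakdown at the end of the .log file
(TEST.log here, from -deffnm TEST), something like the loop below prints it
for each run for a side-by-side look. The per-run directory names are just
placeholders for wherever your four logs actually live, and ~80 lines is only
a rough guess at how much of the tail you need:

  # dump the closing cycle/time accounting section of each run's log
  for log in run32/TEST.log run64/TEST.log run96/TEST.log run128/TEST.log; do
      echo "== $log =="
      tail -n 80 "$log"
  done

The rows whose share of wall time grows as you add ranks (communication,
waiting, PME mesh work) are the ones that tell you where the scaling stops.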
> Doing an strace of the mdrun process shows mostly this:
>
strace is not a profiling tool. That's a bit like trying to understand the
performance of 100m sprinters by counting how often they call their
relatives on the phone. ;-) GROMACS does lots of arithmetic, not lots of
calls to system functions.
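If you want cleaner numbers for comparing rank counts, mdrun's own counters
are the right tool. A minimal sketch, assuming the same TEST inputs and a
short run done purely for benchmarking (-resethway restarts the internal
timers halfway through so startup and load-balancer settle-in are excluded,
and -noconfout skips writing the final configuration); again -np 64 is just a
generic stand-in for your launcher's rank-count option:

  mpirun -np 64 mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 10000 -resethway -noconfout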
Mark