Justin Lemkul
Wed Dec 14 13:57:15 CET 2016

On 12/14/16 6:44 AM, Neha Gandhi wrote:
> Dear list,
> I think the question on gromacs performance on cpu cluster has been raised
> many times in the mailing list. My apologies for reiterating the question.
> I am using a system ~80000 atoms with virtual sites (hence timestep of 4
> fs). The job hasn't completed yet but it seems that the jobs are running
> slower than my experience running with older versions of GROMACS.
> Here is the job script
> #!/bin/bash -l
> #PBS -N sgk
> #PBS -l walltime=24:00:00
> #PBS -l select=6:ncpus=16:mpiprocs=16:mem=20gb
> #PBS -j oe
> export OMP_NUM_THREADS=1
> module purge
> module load gromacs/5.1.2-foss-2016a-hybrid
> mpirun -np 96 gmx_mpi mdrun -v -deffnm npt
> and the output log indicates load imbalance :
> Number of logical cores detected (48) does not match the number reported by
> OpenMP (10).
> Consider setting the launch configuration manually!
> Running on 6 nodes with total 248 logical cores

The log reports 248 logical cores across the 6 nodes, but you are only launching 
96 ranks.  Perhaps your job is competing with others on those nodes and choking 
everything.  mdrun will try to use all the resources it detects unless you 
explicitly tell it otherwise.
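
As a rough sketch of what setting the launch configuration manually could look 
like, the snippet below derives the rank count from the PBS request in the 
quoted script (6 nodes, 16 MPI processes each) and builds an mdrun command with 
one OpenMP thread per rank and thread pinning enabled.  The variable names are 
made up for illustration, and whether pinning helps on shared nodes is an 
assumption worth checking with the cluster admins.

```shell
#!/bin/bash
# Values taken from the quoted PBS request: select=6:ncpus=16:mpiprocs=16
nodes=6
ranks_per_node=16
threads_per_rank=1   # matches OMP_NUM_THREADS=1 in the original script

# Total MPI ranks the job is entitled to
total_ranks=$((nodes * ranks_per_node))

# -ntomp sets the OpenMP threads per rank explicitly; -pin on asks mdrun to
# pin its threads to cores (assumed to be appropriate for this cluster).
launch_cmd="mpirun -np ${total_ranks} gmx_mpi mdrun -ntomp ${threads_per_rank} -pin on -v -deffnm npt"

# Print the command for inspection; run it from the job script once verified.
echo "${launch_cmd}"
```

Printing the command first makes it easy to confirm that the rank count matches 
what the scheduler actually allocated before committing to a 24-hour run.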

Talk to your sysadmin if you're having performance or usage issues.  That's what 
they're paid to do!

>   Logical cores per node:   40 - 48
> Hardware detected on host cl3n073 (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>     Family:  6  model: 63  stepping:  2
>     CPU features: apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm
> mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>     SIMD instructions most likely to fit this hardware: AVX_256 - AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
> DD  step 34999  vol min/aver 0.553  load imb.: force 46.2%  pme mesh/force
> 0.506
>            Step           Time         Lambda
>           35000       70.00000        0.00000

35000 steps -> 70 ps means your time step is only 2 fs, not 4.  Check your 
inputs if you expected otherwise.
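
The arithmetic behind that can be checked quickly from the step and time values 
in the quoted log; this is only a sketch, with the two numbers copied in by hand 
and the variable names chosen for illustration.

```shell
#!/bin/bash
# Values from the quoted log line: Step 35000, Time 70.00000 (ps)
steps=35000
time_ps=70.0

# Time step in fs = elapsed time (ps) / number of steps * 1000
dt_fs=$(awk -v t="$time_ps" -v n="$steps" 'BEGIN { printf "%.1f", t / n * 1000 }')

echo "time step: ${dt_fs} fs"
```

If the result is not the value you intended, the dt setting in the .mdp file 
used to build the run input (presumably the one behind npt.tpr, given 
-deffnm npt) is the first thing to inspect.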



