I am benchmarking a new cluster and also learning the ropes of GROMACS. I
am using the dppc benchmark from

On the first cluster, which has 8 processors per node, I see the
scaling one might expect:

# n ranks    Performance (ns/day)    (hour/ns)
      1             1.914             12.540
      8             1.971             12.176
     16             3.855              6.225
     32             7.388              3.249
     64            13.451              1.784
    128            23.399              1.026
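
For reference, the scaling in the table above can be expressed as speedup and parallel efficiency. The following is only a minimal Python sketch: the function name `scaling` is hypothetical, and taking the 8-rank row (one full node) as the baseline is an assumption, made because the 1-rank and 8-rank rows are nearly identical.

```python
# Performance in ns/day per MPI rank count, copied from the table above.
# The 1-rank row is left out; the 8-rank row is used as baseline (assumption).
first_cluster = {8: 1.971, 16: 3.855, 32: 7.388, 64: 13.451, 128: 23.399}


def scaling(perf, baseline_ranks):
    """Return {ranks: (speedup, efficiency)} relative to the baseline row."""
    base = perf[baseline_ranks]
    result = {}
    for ranks, ns_day in sorted(perf.items()):
        speedup = ns_day / base            # measured gain over the baseline
        ideal = ranks / baseline_ranks     # gain expected from perfect scaling
        result[ranks] = (speedup, speedup / ideal)
    return result


for ranks, (speedup, eff) in scaling(first_cluster, 8).items():
    print(f"{ranks:4d} ranks: speedup {speedup:6.2f}, efficiency {eff:5.2f}")
```

The same function can be applied to the numbers from the second cluster by passing a different dictionary and baseline.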

On the second cluster, which has 16 processors per node and uses
hyperthreading, I see good scaling up to 64 ranks:
# n ranks    Performance (ns/day)    (hour/ns)
      1             2.042             11.752
     32             9.925              2.418
     64            18.041              1.33

At 128 MPI ranks (64 cores), I get many errors on the second cluster. The
errors are of the following four types:

-"Step 1860, time 3.72 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 0.575613, max 15.930250 (between atoms 49307 and 49308)
bonds that rotated more than 30 degrees:"

-"WARNING: Listed nonbonded interaction between particles 49304 and 49307
at distance 2.961 which is larger than the table limit 2.890 nm."

-"step 1912: Water molecule starting at atom 80043 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate."

-"Step 1920:
Atom 49163 moved more than the distance allowed by the domain decomposition
(0.778291) in direction Y
distance out of cell -710761.250000
New coordinates: 1033813.500 -710744.250    2.361
Old cell boundaries in direction Y:   16.980   18.338
New cell boundaries in direction Y:   16.994   18.338

Fatal error:
An atom moved too far between two domain decomposition steps
This usually means that your system is not well equilibrated"

Of course the simulation crashes after the fatal error.

The input files are identical for the simulations on both clusters. Both of
the .log files report "Domain decomposition grid 8 x 16 x 1".
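
As a sanity check on those numbers, the grid reported in the log can be multiplied out and compared with the rank count, and the width of one cell in Y can be read off the cell boundaries quoted in the fatal error. This is only a rough Python sketch: the helper name `dd_grid` is hypothetical, and the regular expression assumes the log line looks exactly like the string quoted above (a real log line may carry extra text).

```python
import re


def dd_grid(log_line):
    """Extract (nx, ny, nz) from a 'Domain decomposition grid' line, or None."""
    m = re.search(r"Domain decomposition grid (\d+) x (\d+) x (\d+)", log_line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


# String as quoted from the .log files in this post.
nx, ny, nz = dd_grid("Domain decomposition grid 8 x 16 x 1")
n_cells = nx * ny * nz
print("cells in grid:", n_cells, "(runs used 128 ranks)")

# Old cell boundaries in direction Y, from the fatal error above (nm).
y_low, y_high = 16.980, 18.338
print(f"cell width in Y: {y_high - y_low:.3f} nm")
```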

What could be causing the errors on the second machine and not on the first?

I can provide any other information that would be useful in solving this
problem... just let me know.

Thank you in advance for any help on this.

--Vincent Ustach
  University of California, Davis

