Hence, to get a balanced hardware combination (assuming the same input
system and settings), you would need a GPU that's about 2x faster than
the K5000.

Is that a typo? We used a K4000, which has half the number of CUDA cores (768) of the card we are proposing (1536).

* The GROMACS non-bonded kernels are compute-bound, so one can roughly
compare the performance of two cards of identical compute capability (!)
by looking at the ratio of #multiprocessors*frequency (assuming an
input large enough to reach the peak of each GPU), e.g. for
K5000 vs GTX 770 roughly (1085*8)/(706.0*8)

I don't quite follow the numbers. The advertised GPU frequency of the GTX 770 is 1085 MHz, which appears above, but the maximum clock speed of our K4000, according to the NVIDIA configuration utility, is 549 MHz, not 706. And does the "8" refer to the dedicated FP64 cores? If so, the K4000 has only 4, not 8, I believe. So, in theory, we should get roughly a (1085 x 8)/(549 x 4) = 3.95-fold improvement from the GTX 770.
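A minimal sketch of the rule of thumb quoted above, assuming it only holds for cards of the same compute capability running a compute-bound input large enough to saturate both GPUs. The helper names are hypothetical. In the quoted formula the "8" is the number of multiprocessors (SMX), not FP64 units; on Kepler each SMX has 192 CUDA cores, so the 768-core K4000 has 4 SMX and the 1536-core GTX 770 has 8. The clock values are the ones given in this thread (549 MHz is the K4000 clock reported by the configuration utility), not verified specifications.

```python
# Rough throughput heuristic: for GPUs of identical compute capability,
# compute-bound kernel speed scales with (#multiprocessors * clock).

CUDA_CORES_PER_KEPLER_SMX = 192  # Kepler (GK10x) SMX size


def relative_throughput(sm_a, mhz_a, sm_b, mhz_b):
    """Estimated speed of GPU A relative to GPU B (ratio of SMs * clock)."""
    return (sm_a * mhz_a) / (sm_b * mhz_b)


def kepler_sm_count(cuda_cores):
    """Derive the SMX count of a Kepler GPU from its CUDA core count."""
    sms, rem = divmod(cuda_cores, CUDA_CORES_PER_KEPLER_SMX)
    if rem:
        raise ValueError("core count is not a multiple of 192")
    return sms


gtx770_sm = kepler_sm_count(1536)  # 8 SMX
k4000_sm = kepler_sm_count(768)    # 4 SMX

# Quoted example: GTX 770 (1085 MHz) vs K5000 (706 MHz), 8 SMX each.
print(f"GTX 770 vs K5000: {relative_throughput(8, 1085, 8, 706):.2f}x")  # 1.54x
# Figures from this thread: GTX 770 (1085 MHz) vs K4000 (549 MHz observed).
print(f"GTX 770 vs K4000: "
      f"{relative_throughput(gtx770_sm, 1085, k4000_sm, 549):.2f}x")     # 3.95x
```

Counting multiprocessors rather than FP64 units gives the same factor of 4 for the K4000 here, because that card has 4 SMX; the estimate ignores memory bandwidth and any CPU-side bottleneck.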




