Hi! I have run into a problem running GROMACS 4.6.1 on K20 GPUs. I have 6 nodes, each with one K20, and I run one process per node, each using that node's GPU. However, my tests show that the run on a single node takes less time than the run on all six nodes. Does this mean that GPU runs do not scale well? Thanks!