[gmx-developers] Gromacs hardware

Brendan Moran annirack at shaw.ca
Mon Apr 25 21:58:47 CEST 2005


Sorry about the lack of subject on my previous post.

David wrote:

> Check the equations in the manual at chapter 4.
>
>Do you think you can achieve much more than 2 GFlop/s on your chip? This
>is the performance I get in the best case on a single 2 GHz Opteron. It
>seems that would be hard to beat even with special hardware, but I'm
>ready to be surprised. As you may (or may not) know, the GROMACS project
>started out as a hardware project, and we succeeded doing a 1/sqrt(x)
>computation in hardware at 5 MHz (in 1990 or so). Then we gave up and
>bought 40 MHz i860 chips that were programmable...
>  
>
You're quite right.  Matching 2 GFlop/s would be difficult, but I'm
looking at using FPGAs with a 600 MHz core.  The benefits are
parallelism and preprogramming.  The Opteron is going to spend a lot of
its time pushing variables around.  An FPGA-based processor might be
able to handle that portion better, though it will never match the raw
power available from a true CPU.

Check out what the FPGA guy at my work had to say about it below (I
showed him the PDF from Erik's post).
--
Brendan Moran

-------- Original Message --------
Subject:     Re: Gromacs coprocessor
Date:     Fri, 22 Apr 2005 22:02:08 -0700

On first run-thru, it reminds me a lot of certain graphics problems - a
whole lot of indexing, adding/sub/mults, & variable shuffling.  SQRT is
the only toughie to do in h/w (except via lookup).

A DSP chip would likely do real well at this, you know ... tho' there
might still be too much var shuffling for any ordinary, standard DSP
chip (still better than a CISC chip, tho').

But yeah, I suspect an FPGA-based co-proc would do this real quick.
Need to have several ALUs & MACs in parallel, with multi-way
input/output traffic cops (muxes) for the vars & results.  Plus
good/fast external mem & access h/w.

Big improvement would also come from parallelism - many data paths in
parallel.  Lot of time wasted in this C code moving stuff around, one
var at a time.

Be a fair amount of work, prob.  Fun, tho'.  Instantiate an ALU/MAC for
each data path, point their inputs/outputs directly to the appropriate
chunks of mem, auto indexing (counters), etc.
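
As a rough software model of the arrangement described above (one MAC
per data path, each with its own auto-incrementing index counter), the
C sketch below computes a dot product across several lanes.  It only
mimics the structure; in an FPGA the lanes would run concurrently,
whereas here they are stepped in a loop.  NUM_LANES, mac_lane, mac_tick
and dot_product_lanes are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LANES 4  /* assumed number of parallel data paths */

/* One data path: operand streams, an address counter, an accumulator. */
typedef struct {
    const float *a;   /* first operand stream */
    const float *b;   /* second operand stream */
    size_t index;     /* auto-incrementing address counter */
    size_t stride;    /* step applied to the counter each tick */
    float acc;        /* multiply-accumulate result */
} mac_lane;

/* One "clock tick": every lane does a single MAC and bumps its counter. */
static void mac_tick(mac_lane lanes[], size_t n_lanes)
{
    for (size_t l = 0; l < n_lanes; l++) {
        lanes[l].acc += lanes[l].a[lanes[l].index] * lanes[l].b[lanes[l].index];
        lanes[l].index += lanes[l].stride;
    }
}

/* Dot product of a and b (length n), interleaved across NUM_LANES lanes. */
float dot_product_lanes(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    mac_lane lanes[NUM_LANES];
    for (size_t l = 0; l < NUM_LANES; l++) {
        lanes[l].a = a;
        lanes[l].b = b;
        lanes[l].index = l;          /* lane l starts at element l */
        lanes[l].stride = NUM_LANES; /* and skips ahead by the lane count */
        lanes[l].acc = 0.0f;
    }

    /* Full ticks: each one consumes NUM_LANES elements. */
    size_t ticks = n / NUM_LANES;
    for (size_t t = 0; t < ticks; t++)
        mac_tick(lanes, NUM_LANES);

    /* Combine the per-lane accumulators. */
    float sum = 0.0f;
    for (size_t l = 0; l < NUM_LANES; l++)
        sum += lanes[l].acc;

    /* Leftover elements that did not fill a whole tick. */
    for (size_t i = ticks * NUM_LANES; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With 4 lanes, an 8-element dot product takes 2 ticks instead of 8
sequential multiply-adds, which is the kind of gain the parallel data
paths are meant to deliver.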


