[gmx-developers] some notes on compiling the current gmx cvs sources on cray xt3
akohlmey at cmm.chem.upenn.edu
Wed May 17 13:16:33 CEST 2006
On Wed, 17 May 2006, David van der Spoel wrote:
DS> Axel Kohlmeyer wrote:
DS> I have fixed the // problems. I don't understand the remark about
DS> compiling serially.
thanks. to compile serially i need to add the following change
(extracted from the patch i sent):
RCS file: /home/gmx/cvs/gmx/src/mdlib/pme.c,v
retrieving revision 1.75
diff -u -r1.75 pme.c
--- src/mdlib/pme.c 16 May 2006 15:18:08 -0000 1.75
+++ src/mdlib/pme.c 16 May 2006 23:50:00 -0000
@@ -1294,7 +1294,9 @@
pme->nodeid = cr->nodeid;
pme->nnodes = cr->nnodes;
pme->mpi_comm = cr->mpi_comm_mygroup;
fprintf(log,"Will do PME sum in reciprocal space.\n");
mpi_comm_mygroup is only defined when configuring with --enable-mpi.
note that the same file now (revision 1.76) also contains an
unresolved cvs conflict.
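A minimal sketch of the kind of guard such a patch needs. It assumes GMX_MPI is the macro that configure defines with --enable-mpi, since the actual added lines of the patch are not shown above. The struct types and pme_init_comm() are hypothetical stand-ins, not the real GROMACS definitions. The point is only that a member which exists solely in MPI builds must also be accessed solely inside the same #ifdef, so a serial build still compiles.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the real GROMACS structs.
   GMX_MPI is assumed to be defined only for --enable-mpi builds. */
#ifdef GMX_MPI
#include <mpi.h>
#endif

typedef struct {
    int nodeid;
    int nnodes;
#ifdef GMX_MPI
    MPI_Comm mpi_comm_mygroup;  /* exists only in MPI builds */
#endif
} commrec_sketch;

typedef struct {
    int nodeid;
    int nnodes;
#ifdef GMX_MPI
    MPI_Comm mpi_comm;          /* exists only in MPI builds */
#endif
} pme_sketch;

/* Copy the communication info into the PME struct. The communicator is
   only touched in MPI builds, so a serial build never refers to the
   missing mpi_comm_mygroup member. */
static void pme_init_comm(pme_sketch *pme, const commrec_sketch *cr, FILE *log)
{
    assert(pme != NULL && cr != NULL && log != NULL);
    pme->nodeid = cr->nodeid;
    pme->nnodes = cr->nnodes;
#ifdef GMX_MPI
    pme->mpi_comm = cr->mpi_comm_mygroup;
#endif
    fprintf(log, "Will do PME sum in reciprocal space.\n");
}
```

Compiled without -DGMX_MPI this builds with no MPI headers at all; with it, the same function also copies the communicator.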
DS> > as already reported by Shawn Brown elsewhere, the gcc-compiled code
DS> > tends to crash in different places (e.g. a segfault in add_gbond(),
DS> > an MPI error in splitter.c).
DS> This is weird, the add_gbond error might depend on the system studied
this is the DPPC benchmark, and the crashes seem to be quite random.
i'll try compiling with lower optimization. the compiler is gcc 3.3.1.
DS> Your numbers look good for the ring parallelization scheme that we have
DS> used until now.
well, the XT3 has a 3D-torus network, and that should favor ring
schemes (same as with SCI interconnects from Dolphin / Scali).
DS> It will quite soon be possible to obtain even better
DS> scaling using domain decomposition, Berk has been working very hard to
DS> implement it. The DPPC benchmark scales to 32 Opteron cores on my Gbit
DS> network already, so it will be interesting to see whether it will be
DS> even better on the Cray.
yes indeed. using -dd 6 6 2 on 72 nodes (6 x 6 x 2 = 72 domains) with
the same input gives:
               NODE (s)   Real (s)        (%)
       Time:    102.000    102.000      100.0
                              1:42
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    481.538     40.828      8.471      2.833
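To make the decomposition and the timing line a bit more concrete, here is a small sketch. dd_rank_to_cell() is a hypothetical helper that maps a linear rank onto an nx x ny x nz grid such as 6 x 6 x 2, with x varying fastest; the ordering mdrun actually uses is an assumption here. hours_per_ns() just encodes the relation between the last two performance columns: a day has 24 hours, so hour/ns = 24 / (ns/day).

```c
#include <assert.h>

/* Cell coordinates of one domain in the decomposition grid. */
typedef struct { int x, y, z; } dd_cell;

/* Hypothetical helper: map a linear rank onto an nx x ny x nz grid
   (like -dd 6 6 2) with x varying fastest, then y, then z. */
static dd_cell dd_rank_to_cell(int rank, int nx, int ny, int nz)
{
    dd_cell c;
    assert(rank >= 0 && rank < nx * ny * nz);
    c.x = rank % nx;
    c.y = (rank / nx) % ny;
    c.z = rank / (nx * ny);
    return c;
}

/* The two performance columns are redundant: hour/ns = 24 / (ns/day). */
static double hours_per_ns(double ns_per_day)
{
    return 24.0 / ns_per_day;
}
```

For the run above, the last rank (71) lands in cell (5, 5, 1), and hours_per_ns(8.471) is about 2.833, matching the reported value.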
at this speed the output is so frequent that the i/o buffers will
have to be increased to hide the latencies of the Portals layer (the
nodes have no local disk, only access to a parallel Lustre filesystem
via an RPC-like scheme that forwards all i/o to a comparatively small
number of i/o nodes). the throughput is quite impressive.
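One generic way to get larger i/o buffers from plain C stdio is setvbuf(), which lets many small writes accumulate in memory before they are handed to the filesystem, so fewer, larger requests travel to the i/o nodes. This is a sketch under assumptions, not how mdrun handles its output: enlarge_stream_buffer() and the 4 MiB size are hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Arbitrary example size, not a tuned value. */
#define LOG_BUFSIZE (4u * 1024u * 1024u)

/* Hypothetical helper: give a freshly opened stream a large, fully
   buffered stdio buffer. Returns the buffer (to be freed only after
   fclose) or NULL if allocation or setvbuf fails. */
static char *enlarge_stream_buffer(FILE *fp, size_t size)
{
    char *buf;
    assert(fp != NULL && size > 0);
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, size) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The buffer has to stay valid until the stream is closed, which is why the caller keeps the pointer and frees it only after fclose().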
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
If you make something idiot-proof, the universe creates a better idiot.