[gmx-users] desort
Chris Neale
chris.neale at utoronto.ca
Wed Jan 24 23:49:05 CET 2007
I have developed a methodology to desort based on the script that I
previously mentioned. The speed of desorting has been improved. For a
system of 100K atoms the setup and processing for mdrun_mpi (grompp's
trjconv's g_desort's etc.) took me 30sec. For a system of 750K atoms the
time was 90sec.
Benifits:
When running in 200ps segments on NP=4 in single precision I get a
speedup of about 1.4 times vs without -shuffle and speedup of 1.15 times
vs. with shuffle but without sort. It is only with sort that my scaling
to NP=4 or NP=16 matches the gromacs benchmarks.
Testing:
1. I have run 20 x 20ps mdrun_mpi iteratively using -shuffle -sort and
this g_desort program and the system "looks normal".. no quantitative
test though.
2. I have compared the -deshuf.ndx file produced by grompp to a g_desort
output .ndx file (both produced using -shuffle but not -sort) and they
match exactly.
I have not posted this tool to the uploads section since I don't want
anybody to run into unexpected problems in case it is incorrect since it
affects the mdrun directly.
I welcome any suggestions for ways to test if this is correct. Also, if
anybody has some long-time sorted trajectories out there then they could
use this to recover individual positions and "follow their favourite
water molecule". If anybody does do this, please post something about
your success.
This was written from template.c and compilation of this program is done
similarly.
Before using this, please read the "static char *desc[] = {" completely.
Especially for the warnings that discuss what number your index files
begin with and the exact implementation that is a bit of a hassle based
on the order that grompp does things but it can all be scripted.
Chris Neale.
/*
* $Id: template.c,v 1.4 2001/07/23 15:28:29 lindahl Exp $
*
* This source code is part of
*
* G R O M A C S
*
* GROningen MAchine for Chemical Simulations
*
* VERSION 3.0
*
* Copyright (c) 1991-2001
* BIOSON Research Institute, Dept. of Biophysical Chemistry
* University of Groningen, The Netherlands
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* If you want to redistribute modifications, please consider that
* scientific software is very special. Version control is crucial -
* bugs must be traceable. We will be happy to consider code for
* inclusion in the official distribution, but derived work must not
* be called official GROMACS. Details are found in the README & COPYING
* files - if they are missing, get the official version at www.gromacs.org.
*
* To help us fund GROMACS development, we humbly ask that you cite
* the papers on the package - you can find them in the top README file.
*
* Do check out http://www.gromacs.org , or mail us at gromacs at gromacs.org .
*
* And Hey:
* Gyas ROwers Mature At Cryogenic Speed
*/
/* This line is only for CVS version info */
static char *SRCID_template_c = "$Id: template.c,v 1.4 2001/07/23
15:28:29 lindahl Exp $";
#include "statutil.h"
#include "typedefs.h"
#include "smalloc.h"
#include "vec.h"
#include "copyrite.h"
#include "statutil.h"
#include "tpxio.h"
#include "math.h"
int main(int argc,char *argv[])
{
static char *desc[] = {
"g_desort takes two coordinate files as input and outputs a .ndx
file that ",
"can be used to properly sort/shuffle or desort/deshuffle. ",
"The -f option takes two coordinate files as input and the order is
important: ",
"the first file is the desired order and the second file is the
current order. ",
"To desort coordinates g_desort -f unsorted.gro sorted.gro and to
resort ",
"coordinates g_desort -f sorted.gro unsorted.gro. In order to use
the -sort option ",
"of grompp you will only need to desort your trajectory and g_desort
makes this trivial. ",
"However, the benifit of sorting ",
"decreases over time as the particles move away from their sorted
positions. Therefore ",
"regular re-sorting is desirable. The usage section below describes
how to do this. ",
"The basic idea is to pre-sort the .trr file for loading into grompp
based on a resort ",
".ndx file generated by g_desort after a preliminary grompp without
the .trr. An additional ",
"sorting step is added to ensure that the grompp sorting did not
change based on the loaded i",
".trr file. The mdrun in then completed and desorted prior to
starting again. \n",
"USAGE: \n",
"1. grompp -shuffle -sort -c original.gro -o sorted_temp.tpr \n",
"2. editconf -f sorted_temp.tpr -o sorted_temp.gro \n",
"3. g_desort -f sorted_temp.gro original.gro -o resort_temp.ndx \n",
"4. trjconv -f original.trr -n resort_temp.ndx -o sorted.trr \n",
"5. grompp -shuffle -sort -c original.gro -t sorted.trr -o
sorted.tpr \n",
"6. editconf -f sorted.tpr -o sorted.gro \n",
"7. g_desort -f sorted.gro original.gro -o resort.ndx \n",
"8. look=`diff -q resort_temp.ndx resort.ndx`; if [ -n ${look} ];
then exit; fi \n",
"9. g_desort -f original.gro sorted.gro -o desort.ndx \n",
"10. mdrun_mpi -s sorted.tpr \n",
"11a. trjconv -f output.xtc -n desort.ndx -o output_desorted.xtc \n",
"11b. trjconv -f output.trr -n desort.ndx -o output_desorted.trr \n",
"11c. trjconv -f output.gro -n desort.ndx -o output_desorted.gro \n",
"12. Use these output_desorted files as input to step 1 as the
original.xtc/trr/gro files \n",
" - the .edr file does not need to be desorted \n",
" - g_desort takes care of sorting and shuffling at the same
time \n",
" - Do not use the -deshuf deshuf.ndx file from grompp \n",
"WARNINGS:\n",
" - This is a beta version. It might not work properly. Don't trust
it \n",
" - Expecting your index files to start from the number 1. If they
start from the number 0 then ",
"you must modify the fprintf output lines to remove the +1 from the
integer that is output \n",
"BUGS: \n",
"The determination of the expected number of particles per grid is
sub-optimal ",
"and if the ratio of system volume over grid points is too small
then the program will not allocate ",
"enough memory. This is detected, but it should be fixed (perhaps by
actually counting before allocating\n",
};
static char *outputfnam;
static real rDISTCUT=0.005;
static int iNUMBINS=10;
/* Extra arguments - but note how you always get the begin/end
* options when running the program, without mentioning them here!
*/
//
t_pargs pa[] = {
{ "-c", FALSE, etREAL, {&rDISTCUT},
"Cut-off distance for atom identification."
},
{ "-n", FALSE, etINT, {&iNUMBINS},
"Number of grids in each dim (>=1)."
}
};
t_topology top;
char title[STRLEN];
t_trxframe fra;
t_trxframe frb;
rvec *xtop;
matrix box;
int status;
int flags = TRX_READ_X;
char **fnms;
int nfile_in,ata,atb,found_flag;
real dab;
real
max[3]={0.0,0.0,0.0},min[3]={0.0,0.0,0.0},maxminusmin[3]={0.0,0.0,0.0},minovermaxminusmin[3]={0.0,0.0,0.0};
int **bxyz; //[natoms][3]
int twobxyz[3]={0.0,0.0,0.0};
int ****twobinlist; //[xdim][ydim][zdim][list] *dim is the cells
[list=0] is the number in that list
int maxcellnum;
int i,j,k;
FILE *fout;
t_filenm fnm[] = {
{ efTRX, "-f", NULL, ffRDMULT }, /* and this for the trajectory */
{ efNDX, "-o", "deshuffledesort", ffWRITE } //output
};
#define NFILE asize(fnm)
CopyRight(stderr,argv[0]);
/* This is the routine responsible for adding default options,
* calling the X/motif interface, etc. */
parse_common_args(&argc,argv,PCA_CAN_TIME | PCA_CAN_VIEW,
NFILE,fnm,asize(pa),pa,asize(desc),desc,0,NULL);
nfile_in = opt2fns(&fnms,"-f",NFILE,fnm);
if (nfile_in==0)
gmx_fatal(FARGS,"");
if (nfile_in!=2)
gmx_fatal(FARGS,"There must be exactly 2 input files!");
outputfnam=opt2fn("-o",NFILE,fnm);
if(iNUMBINS<=0)
gmx_fatal(FARGS,"The number of bins must be greater than or equal to
1\n");
/* The first time we read data is a little special */
read_first_frame(&status,fnms[0],&fra,flags);
read_first_frame(&status,fnms[1],&frb,flags);
if(fra.natoms!=frb.natoms){
printf("Error: unequal number of atoms (%d and
%d)\n",fra.natoms,frb.natoms);
exit(1);
}
printf("WARNING: This is a beta version. It might not work properly.
Don't trust it\n");
fout=fopen(outputfnam,"w");
fprintf(fout,"[ DeShuffle ]");
//Could use a smarter determination of maxcellnum based on loaded
dimensions and on knowing how many grids unfilled
if(iNUMBINS<=3)
maxcellnum=fra.natoms+1;
else
maxcellnum=2*fra.natoms/((iNUMBINS-2)*(iNUMBINS-2)*(iNUMBINS-2)) + 1;
bxyz=(int **)malloc((fra.natoms+1)*sizeof(int *));
if(bxyz==NULL)exit(1);
for(i=0; i<=fra.natoms; ++i){
bxyz[i]=(int *)malloc(sizeof(int[3]));
if(bxyz[i]==NULL)exit(1);
}
twobinlist=(int ****)malloc(iNUMBINS*sizeof(int ***));
if(twobinlist==NULL)exit(1);
for(i=0; i<iNUMBINS; ++i){
twobinlist[i]=(int ***)malloc(iNUMBINS*sizeof(int **));
if(twobinlist[i]==NULL)exit(1);
for(j=0; j<iNUMBINS; ++j){
twobinlist[i][j]=(int **)malloc(iNUMBINS*sizeof(int *));
if(twobinlist[i][j]==NULL)exit(1);
for(k=0; k<iNUMBINS; ++k){
twobinlist[i][j][k]=(int *)malloc(maxcellnum*sizeof(int));
if(twobinlist[i][j][k]==NULL)exit(1);
twobinlist[i][j][k][0]=0;
}
}
}
//printf("About to divide data\n");fflush(stdout);
//Divide each dimension into the boxes
for(ata=0; ata<fra.natoms; ++ata){
for(i=0; i<=2; ++i){
if(fra.x[ata][i]>max[i])max[i]=fra.x[ata][i];
if(fra.x[ata][i]<min[i])min[i]=fra.x[ata][i];
}
}
for(i=0; i<=2; ++i){
maxminusmin[i]=(max[i]-min[i])/(real)iNUMBINS;
minovermaxminusmin[i]=min[i]/maxminusmin[i];
}
//We only need to know which bin the first one is in
for(ata=0; ata<fra.natoms; ++ata){
for(i=0; i<=2; ++i){
bxyz[ata][i]=(int)floor(fra.x[ata][i]/maxminusmin[i] -
minovermaxminusmin[i]);
bxyz[ata][i]=(bxyz[ata][i]<0)?0:(bxyz[ata][i]>(iNUMBINS-1))?(iNUMBINS-1):bxyz[ata][i];
}
}
//For the second list, we need to group them by which cell
for(atb=0; atb<frb.natoms; ++atb){
for(i=0; i<=2; ++i){
twobxyz[i]=(int)floor(frb.x[atb][i]/maxminusmin[i] -
minovermaxminusmin[i]);
twobxyz[i]=(twobxyz[i]<0)?0:(twobxyz[i]>(iNUMBINS-1))?(iNUMBINS-1):twobxyz[i];
}
//printf("%d in %d,%d,%d as %d
entry\n",atb,twobxyz[XX],twobxyz[YY],twobxyz[ZZ],twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]);fflush(stdout);
if(++twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]>maxcellnum){
printf("Error: too many atoms in one of the cells (exceeded max of
%d). Try reducing the number of grids that you are using\n",maxcellnum);
exit(1);
}
twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]]=atb;
}
//printf("About to find matches\n");fflush(stdout);
/* IMPORTANT NOTE:
* Printed values must be +1 to value
* Also, rDISTCUT is squared here to avoid sqrt of dist
*/
rDISTCUT*=rDISTCUT;
for(ata=0; ata<fra.natoms; ++ata){
if(div(ata,10).rem==0)fprintf(fout,"\n");
found_flag=0;
//try first to see if it is in the same box
for(atb=1;
atb<=twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][0]; ++atb){
dab=distance2(fra.x[ata],frb.x[twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]]);
if(dab<rDISTCUT){
//printf("%d\t-ON GRID-\t%d %d
%d\n",twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]+1,bxyz[ata][XX],bxyz[ata][YY],bxyz[ata][ZZ]);
fprintf(fout,"
%d",twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]+1);
found_flag=1;
break;
}
}
if(found_flag)continue;
//Look through the entire list
for(atb=0; atb<frb.natoms; ++atb){
dab=distance2(fra.x[ata],frb.x[atb]);
if(dab<rDISTCUT){
fprintf(fout," %d",atb+1);
found_flag=1;
break;
}
}
if(!found_flag){
printf("Error: could not find pair for %d\n",ata);
exit(1);
}
}
fprintf(fout,"\n");
fclose(fout);
thanx(stderr);
return 0;
}
More information about the gromacs.org_gmx-users
mailing list