[gmx-users] Identical energies generated in a rerun calculation ... but ...
Mark Abraham
Mark.Abraham at anu.edu.au
Fri Apr 24 09:56:56 CEST 2009
Mark Abraham wrote:
>
> OK I have some confirmation of a possible bug here. Using 4.0.4 to do
> reruns on the same positions-only NPT peptide+water trajectory with the
> same run input file:
>
> a) compiled without MPI, a single-processor rerun worked correctly,
> including "zero" KE and temperature at each frame
>
> b) compiled with MPI, a single-processor run worked correctly, including
> zero KE and temperature, and agreed with a) within machine precision
>
> c) compiled with MPI, a 4-processor run worked incorrectly : an
> approximately-correct temperature and plausible positive KE were
> reported, all PE terms were identical to about machine precision with
> the first step of a) and b), and the reported pressure was different.
>
> Thus it seems that a multi-processor mdrun is not updating the structure
> for subsequent steps in the loop over structures, and/or is getting some
> KE from somewhere that a single-processor calculation is not.
>
> I'll step through c) with a debugger tomorrow.
d) compiled with MPI, a 4-processor run using particle decomposition
worked correctly, agreeing with a).
Further, c) has the *same* plausible positive KE at each step.
From stepping through a run, I think the rerun DD problem arises because
a rerun loads each frame of the rerun trajectory into rerun_fr and later
copies it into state, but not into state_global. state_global is
initialized from the .tpr file (which *has* velocities) and is what the
DD initialization uses, and it is never subsequently updated. So, for
each rerun step, the same .tpr state gets propagated, which leads to all
the symptoms I describe above. The KE comes from the velocities in the
.tpr file, and is thus constant.
So, a preliminary work-around is to use mdrun -rerun -pd to get particle
decomposition.
I tried to hack a fix for the DD code. It seemed that adding

    for (i=0; i<state_global->natoms; i++)
        copy_rvec(rerun_fr.x[i],state_global->x[i]);

before about line 1060 of do_md() in src/kernel/md.c should do the
trick, since with bMasterState set for a rerun, dd_partition_system()
should propagate state_global to the right places. However I got a
segfault in that copy_rvec with i==0, despite state_global->x being
allocated and of the right dimensions according to Totalview's memory
debugger.
I'll file a bugzilla in any case.
Mark