FW: [gmx-users] One more broken .trr file

Justin A. Lemkul jalemkul at vt.edu
Mon Mar 23 18:44:16 CET 2009

Sarah Witzke wrote:
> Dear gromacs users,
> First, Justin, thank you for your reply!
> Second, I have a question regarding how to use a checkpoint file to rerun my broken .trr file (dmpclim1-870.trr). As I previously described, I have divided my simulation into smaller simulations, each of 200 ps duration. I did that in order not to lose too much data if the simulation should crash. I have further divided my simulations into five folders - each folder consists of a little over 200 small .trr files (and the corresponding .gro, .log, .tpr, and .out files for each of these small .trr files) - this division is because of a limit of max. 200 hours simulation time per job on the cluster I'm using. Each time the 200 hours have been used, a new folder is created from where the simulation is continued.

I would think dealing with 200 files would be a major headache :)  Checkpointing 
makes such a practice obsolete.  If the system goes down and your run 
crashes, you never lose more than the checkpoint interval set with mdrun -cpt 
(15 minutes by default).  Beyond that, setting nstxout and the related 
output-frequency options might be your friend :)

> In each of these five folder I only found these two .cpt files: "state.cpt" and "state_prev.cpt".
> My commands are:
> tpbconv -f dmpclim1-XX.trr -s dmpclim1-XX.tpr -e dmpclim1-XX.edr -extend 200 dmpclim1-YY.tpr
> mdrun_mpi -np 4 -v -s dmpclim1-YY.tpr -o dmpclim1-YY.trr -c dmpclim1-YY.gro -e dmpclim1-YY.edr -g dmpclim1-YY.log >& dmpclim1-YY.out 

Using tpbconv in this way is also obsolete and introduces small (probably 
negligible) errors.  To get a binary-identical continuation, you need to make 
use of the checkpoint file.
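
A minimal sketch of a checkpoint-based continuation, assuming GROMACS 4.0-style 
tools (tpbconv, mdrun) on the PATH and a state.cpt written at the end of the 
previous segment in the current directory.  The helper name continue_segment 
and its arguments are hypothetical, and the MPI launcher is left out; this is 
an illustration, not the exact command from this thread:

```shell
# Hypothetical helper: continue segment "$1" as segment "$2",
# extending the run by "$3" picoseconds.
# Assumes state.cpt from segment "$1" is in the current directory.
continue_segment() {
    prev=$1
    next=$2
    extend_ps=$3

    # Extend the run input file. No trajectory or energy file is passed,
    # because the exact state is taken from the checkpoint instead.
    tpbconv -s "$prev.tpr" -extend "$extend_ps" -o "$next.tpr" || return 1

    # Restart from the checkpoint and write the new segment's output files.
    mdrun -v -s "$next.tpr" -cpi state.cpt \
        -o "$next.trr" -c "$next.gro" -e "$next.edr" -g "$next.log"
}
```

It would be called as, e.g., "continue_segment dmpclim1-XX dmpclim1-YY 200".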


> I guess that since a new checkpoint file is written every 15 minutes, it will overwrite the previous one. Is that correct? It seems unfortunate to me that it does not make new .cpt files for each small .trr file as is done for e.g. .log files (naming them something like "#state.cpt.1#" and so on). If I have understood it correctly, I'm not able to use my checkpoint file, because my simulation continued without errors, thus overwriting the needed .cpt files several times. To learn from my mistakes: next time I do simulations, will an option like "-cpo dmpclim1-YY.cpt" create a checkpoint file for each small .trr file?

That would just clog up disk space, really.  If your simulation has proceeded 
from the previous checkpoint with no problem, then really all that *should* be 
necessary in most cases is the most recent checkpoint.

I think your problem likely stems from a file system blip.  I've experienced 
similar behavior when our NFS server acts up, and an incomplete frame is 
written, so the Gromacs tools detect massive coordinates/velocities/forces or 
whatever when processing the output.
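
To find which of the small files holds such a frame, one option is to run 
gmxcheck on each file separately and look for the warning text quoted further 
down in this thread ("... are large").  A rough sketch, assuming gmxcheck is 
on the PATH; the helper name list_suspect_trr is hypothetical, and the grep 
pattern only matches that particular warning wording:

```shell
# Hypothetical helper: print every trajectory file for which gmxcheck
# reports implausibly large values.
list_suspect_trr() {
    for f in "$@"; do
        # Capture both output streams, since the report may not go to stdout.
        if gmxcheck -f "$f" 2>&1 | grep -q "are large"; then
            echo "$f"
        fi
    done
}
```

It would be called as, e.g., "list_suspect_trr dmpclim1-*.trr".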


> Best,
> Sarah
> -----Original Message-----
> From: gmx-users-bounces at gromacs.org on behalf of Justin A. Lemkul
> Sent: Sat 21-03-2009 13:20
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] One more broken .trr file
> Sarah Witzke wrote:
>> Dear gromacs users,
>> I would very much appreciate it if anyone could give me advice on the following situation:
>> I have run a simulation of a small molecule diffusing into a lipid membrane (gromacs version 4.0). The simulation was run for ~220 ns and stored in small individual .trr files each of ~0.2 ns (giving a total of 1098 small .trr files). There were no errors or otherwise "suspicious" behavior during the simulation.
>> After the simulation I concatenated all the small .trr files into one big .trr file (version 4.0.2 to correspond with other simulations):
>> trjcat -f *.trr -o dmpclim1-all.trr
>> trjcat gave no error message, the last line output to the screen was:
>> "last frame written was 219600.015625 ps"
>> After the concatenation I checked the big .trr file with gmxcheck:
>> gmxcheck -f dmpclim1-all.trr
>> The result was:
>> Checking file dmpclim1-all.trr
>> trn version: GMX_trn_file (single precision)
>> Reading frame       0 time    0.000
>> # Atoms  35508
>> Reading frame   17000 time 170000.016
>> Warning at frame 17379: coordinates for atom 10917 are large (-2.99061e+19)
>> Warning at frame 17379: coordinates for atom 10921 are large (1.42767e+31)
>> Warning at frame 17379: coordinates for atom 10925 are large (-1.29194e+13)
>> Warning at frame 17379: coordinates for atom 10925 are large (1.51714e+34)
>> Reading frame   21000 time 210000.016
>> Item        #frames Timestep (ps)
>> Step         21961    10
>> Time         21961    10
>> Lambda       21961    10
>> Coords       21961    10
>> Velocities   21961    10
>> Forces           0
>> Box          21961    10
>> Frame 17379 is located in the small .trr file number 870. .trr file 870 consists of 22 frames and the error is in frame 20.
>> Looking at dmpclim1-870.trr in VMD reveals that two water molecules are far, far away (as noted by gmxcheck) in frame 20. Both in frame 19 and in frame 21 the two waters are placed nicely in the box.
>> The dmpclim1-870.log and the screen output from 870 are both normal (i.e. they look similar to all the other steps), so my guess is that something happened during writing to file?
>> I remember a similar problem posted very recently:  http://www.gromacs.org/component/option,com_wrapper/Itemid,165/
>> Reading these emails I understand that there is no way to delete just a single frame - is that correct?
> When posting links, right-click the frame and open it in a new window/tab.  Then 
> you will have the link that actually points to the message you found.  This link 
> is just the search page :)
>> I have thought about two possible options for me now:
>> 1) Use the suggestion given by Justin Lemkul in the email mentioned:
>> trjconv -f dmpclim1-870.trr -b 0 -e 19 -o xxx.trr
>> This, I guess, would make me lose 3 frames, corresponding to (0.2 ns/22)*3 = 0.027 ns. It's not a problem to have 0.027 ns less of simulation, but will it have an effect later on when I concatenate the small .trr files, convert them to an .xtc file, and then use that to calculate e.g. area/lipid or membrane thickness? Will there be a time-mismatch?
> Yes, you will likely get complaints from all the Gromacs tools in such a case. 
> The other option is to uniformly cut out frames from all your .trr files 
> (trjconv -skip), such that the bad frame would never appear, and you would have 
> uniformly-spaced frames in all of your .trr files.  That may be sacrificing 
> quite a bit of data, however.
>> 2) Redo step 870. I'm able to redo step 870 quite easily, but what will then happen when I try to concatenate all the small .trr files? I fear that the "old" -870.trr wouldn't be exactly identical to the "new" -870.trr (due to round-off) and that this would make a mismatch with -871.trr?
> If you have a checkpoint file, you should get a binary identical continuation.
> -Justin
>> I'm very sorry to ask this kind of question again, but I hope you'll bear with me and have the patience to help me!
>> Best regards,
>> Sarah    


Justin A. Lemkul
Graduate Research Assistant
ICTAS Doctoral Scholar
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080

