[gmx-users] Failed to lock .log. Already running simulation?

chris.neale at utoronto.ca chris.neale at utoronto.ca
Fri Dec 2 04:07:04 CET 2011


Dear Users:

I have 50 simulations that are identical except for the random seeds
used to generate velocities. All were running fine for 24 hours. I
canceled the running jobs and resubmitted them as part of beta testing
a new cluster. All 50 started. I then canceled one of these jobs soon
after it started and restarted it almost immediately (possibly too
quickly). This restart gave me the error:

Fatal error:
Failed to lock: continue.log. Already running simulation?
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

I found this post suggesting that the error may be related to the
Lustre filesystem:
http://lists.gromacs.org/pipermail/gmx-users/2010-November/056173.html

But I am not sure how to tell whether Lustre is in use here. Here is
the output from mount:
[nealechr@ip13-mp2 50]$ mount
/dev/mapper/hddvg-root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw)
/dev/mapper/hddvg-home on /home type ext4 (rw,usrquota,grpquota)
/dev/md2 on /ltmp type ext4 (rw)
/dev/mapper/hddvg-opt on /opt type ext4 (rw)
none on /ramdisk type tmpfs (rw,nosuid,nodev)
none on /var/tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
none on /ipathfs type ipathfs (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
none on /tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
10.4.215.201@o2ib:/lustre01 on /mnt/scratch01 type lustre (rw,_netdev,flock)
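
One way to read a listing like this is to find the longest mount point
that is a prefix of the run directory and look at its "type" field. The
sketch below does that in Python. The function names and the example
path /mnt/scratch01/run/50 are hypothetical; where the simulations
actually live is not stated above.

```python
import os
import re

# One line of `mount` output: "<device> on <mountpoint> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<dev>.+) on (?P<mnt>\S+) type (?P<fs>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mount(text):
    """Return a list of (mountpoint, fstype, options) from `mount` output."""
    entries = []
    for line in text.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            entries.append((m["mnt"], m["fs"], m["opts"].split(",")))
    return entries


def filesystem_for(path, entries):
    """Return the entry whose mount point is the longest prefix of `path`."""
    path = os.path.normpath(path)
    best = None
    for mnt, fs, opts in entries:
        prefix = mnt.rstrip("/") + "/"  # "/" stays "/", "/home" becomes "/home/"
        if path == mnt or path.startswith(prefix):
            if best is None or len(mnt) > len(best[0]):
                best = (mnt, fs, opts)
    return best


if __name__ == "__main__":
    import subprocess

    listing = subprocess.run(
        ["mount"], capture_output=True, text=True, check=True
    ).stdout
    # Hypothetical run directory; replace with the real one.
    print(filesystem_for("/mnt/scratch01/run/50", parse_mount(listing)))
```

Applied to the listing above, any directory under /mnt/scratch01 would
resolve to the lustre entry, which is mounted with the flock option. On
Linux, running df -T in the run directory reports the filesystem type
directly.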

Also, it seems unlikely to be system-related because the other 49 runs
are going just fine. I ran ls -la to see whether some hidden file
indicated the lock, but I could not find one (I have no idea how such
a lock would work or be detected).
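
On how such a lock might work: a minimal sketch, assuming the lock is a
POSIX advisory lock taken with fcntl() on the open log file. That is an
assumption about GROMACS; the error text alone does not confirm it. A
lock of this kind is held by the kernel (or, on a cluster filesystem,
by its lock manager) on behalf of a process, not stored as a file,
which would explain why ls -la shows nothing. The function names below
are hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(f):
    """Try to take an exclusive, non-blocking advisory lock on open file `f`.

    Returns True on success, False if another process holds a
    conflicting lock. Any other error is re-raised.
    """
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise


def child_can_lock(path):
    """Fork a child that tries to lock `path`; return True if it succeeded.

    A separate process is needed because fcntl locks belong to a
    process: a second attempt from the same process would not conflict.
    """
    pid = os.fork()
    if pid == 0:
        # Child: open its own descriptor and report the result via exit code.
        with open(path, "a") as f:
            ok = try_lock(f)
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    with open(path, "a") as log:
        print("first process got lock:", try_lock(log))
        print("second process got lock:", child_can_lock(path))
    # The lock is released when the holder closes the file or exits.
    print("after release, second process got lock:", child_can_lock(path))
    os.remove(path)
```

Under this assumption, a restart would fail for as long as the lock
held by the canceled job is still registered, and nothing in the
directory listing would show it.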

I deleted the .log file, but then I got the error:

Fatal error:
File appending requested, but only 3 of the 4 output files are present

Moving everything to a new directory and then copying it back  
(including the original .log file) allowed me to run the simulation.

Did I do something incorrectly, or is this a bona fide problem?

Thank you,
Chris.