[gmx-users] Failed to lock .log. Already running simulation?
chris.neale at utoronto.ca
Fri Dec 2 04:07:04 CET 2011
Dear Users:
I have 50 simulations that are all the same, except with different
random seeds for velocities. All were running fine for 24 hours. I
canceled the running jobs and resubmitted them as part of beta testing
a new cluster. All 50 started. I then canceled one of these jobs soon
after starting it and then started it again pretty quickly (possibly
too quickly). This restart now gave me the error:
Fatal error:
Failed to lock: continue.log. Already running simulation?
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
I found this post suggesting that the problem may be related to the
Lustre filesystem:
http://lists.gromacs.org/pipermail/gmx-users/2010-November/056173.html
But I am not sure how to tell whether Lustre is being used here. Here is
the output from mount:
[nealechr@ip13-mp2 50]$ mount
/dev/mapper/hddvg-root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw)
/dev/mapper/hddvg-home on /home type ext4 (rw,usrquota,grpquota)
/dev/md2 on /ltmp type ext4 (rw)
/dev/mapper/hddvg-opt on /opt type ext4 (rw)
none on /ramdisk type tmpfs (rw,nosuid,nodev)
none on /var/tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
none on /ipathfs type ipathfs (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
none on /tmp type tmpfs (rw,noexec,nosuid,nodev,size=1000000000)
10.4.215.201@o2ib:/lustre01 on /mnt/scratch01 type lustre (rw,_netdev,flock)
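One way to answer the "is Lustre being used" question is to match the run directory against the mount points in this listing and take the longest match. The following is only a rough sketch in Python: the function names and the example path /mnt/scratch01/run/50 are hypothetical, and it assumes the "device on mount_point type fstype (options)" line format shown above.

```python
import re

# One line of `mount` output: "<device> on <mount_point> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<device>.+?) on (?P<mount_point>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)$"
)

# Sample lines in the format printed by mount (a subset of the listing above).
SAMPLE = """\
/dev/mapper/hddvg-root on / type ext4 (rw)
/dev/mapper/hddvg-home on /home type ext4 (rw,usrquota,grpquota)
10.4.215.201@o2ib:/lustre01 on /mnt/scratch01 type lustre (rw,_netdev,flock)
"""


def parse_mount_output(text):
    """Return a list of (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            entries.append(
                (match["mount_point"], match["fstype"], match["options"].split(","))
            )
    return entries


def filesystem_for(path, entries):
    """Return the entry whose mount point is the longest prefix of `path`.

    `path` is assumed to be absolute with symlinks already resolved.
    """
    best = None
    for mount_point, fstype, options in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype, options)
    return best


entries = parse_mount_output(SAMPLE)
# Hypothetical run directory, used only for illustration.
scratch = filesystem_for("/mnt/scratch01/run/50", entries)
```

In practice the path would be the resolved directory holding continue.log (for example from os.path.realpath) and the text would be the real output of mount. Anything under /mnt/scratch01 in this listing resolves to the lustre entry, which is mounted with the flock option.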
Also, it seems unlikely to be system related because the other 49 runs
are going just fine. I ran ls -la to see if there was some hidden
file indicating the lock but could not find any (I have no idea how
such a lock would work or be detected).
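On how such a lock could work without leaving a file behind: POSIX advisory locks taken with fcntl are held by the kernel (or by the file server on a network filesystem), so ls never shows them. The sketch below illustrates this in Python. It assumes the lock on the .log file is of this fcntl type, which the wording of the error suggests but which is not confirmed here. The helper names are hypothetical, and the file name continue.log is reused only for illustration.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def try_lock(fileobj):
    """Try to take an exclusive, non-blocking advisory lock on an open file."""
    try:
        fcntl.lockf(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def lock_from_other_process(path):
    """Ask a separate Python process to lock `path`; True if it succeeded.

    A second process is needed because fcntl locks never conflict
    within the process that already holds them.
    """
    code = (
        "import fcntl, sys\n"
        "f = open(sys.argv[1], 'a')\n"
        "try:\n"
        "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
        "except OSError:\n"
        "    sys.exit(1)\n"
    )
    return subprocess.run([sys.executable, "-c", code, path]).returncode == 0


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "continue.log")
        with open(path, "a") as log:
            held = try_lock(log)                         # first "run" takes the lock
            blocked = not lock_from_other_process(path)  # second "run" is refused
            listing = sorted(os.listdir(workdir))        # no extra lock file appears
        # Closing the file released the lock, so a new process can take it.
        free_again = lock_from_other_process(path)
    return held, blocked, listing, free_again
```

Under this assumption, the lock normally disappears when the process holding it exits or closes the file. A lock that outlives a cancelled job would then point to state kept outside the directory listing, for instance on the file server side.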
I deleted the .log file, but then I got the error:
Fatal error:
File appending requested, but only 3 of the 4 output files are present
Moving everything to a new directory and then copying it back
(including the original .log file) allowed me to run the simulation.
Did I do something incorrectly, or is this a bona-fide problem?
Thank you,
Chris.