[gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3): SOLVED

Roland Schulz roland at utk.edu
Sat Nov 27 00:22:49 CET 2010


Hi,

we use Lustre too and it doesn't cause any problems. I found this message on
the Lustre list:
http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007366.html

And according to your mount output, Lustre on your machine is not mounted
with the flock or localflock option. This seems to be the reason for the
problem. Thus, if you would like to run the simulation directly on Lustre, you
have to ask the sysadmin to mount it with flock or localflock (I don't
recommend localflock; it doesn't guarantee correct locking).

If you would like to have an option to disable the locking, then please file
a bug report on Bugzilla. The reason we lock the log file is that we want to
make sure that only one simulation is appending to the same files; otherwise
the files could get corrupted. This is why the locking is on by default and
currently can't be disabled.
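
For reference, the lock that fails here is an advisory file lock taken on the
log file before appending. Below is a minimal sketch in C of that idea (an
illustration assuming fcntl()-style locking; lock_logfile is a hypothetical
helper, not the actual checkpoint.c code). On a file system mounted without
lock support it fails with ENOSYS, which is exactly the "Function not
implemented" message in the error above:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to take an exclusive advisory lock on the log file so that only one
 * mdrun instance can append to it. Returns the open fd on success, -1 on
 * failure. Hypothetical helper, for illustration only. */
static int lock_logfile(const char *fn)
{
    struct flock fl;
    int fd = open(fn, O_WRONLY | O_APPEND);
    if (fd < 0)
    {
        return -1;
    }
    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_WRLCK;   /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;         /* 0 = lock the whole file */
    /* On Lustre mounted without flock/localflock this call typically fails
     * with errno == ENOSYS ("Function not implemented"). */
    if (fcntl(fd, F_SETLK, &fl) < 0)
    {
        fprintf(stderr, "Failed to lock: %s. %s.\n", fn, strerror(errno));
        close(fd);
        return -1;
    }
    return fd; /* the lock is held as long as the fd stays open */
}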

Roland


On Fri, Nov 26, 2010 at 3:17 PM, Baofu Qiao <qiaobf at gmail.com> wrote:

> Hi all,
>
> What Roland said is right! The Lustre file system causes the "lock" problem.
> I copied all the files to a folder under /tmp and then ran the continuation. It
> works!
>
> Thanks!
>
> regards,
>
>
> On 2010-11-26 22:53, Florian Dommert wrote:
>
>>
>> To make things short: the file system used is Lustre.
>>
>> /Flo
>>
>> On 11/26/2010 05:49 PM, Baofu Qiao wrote:
>>
>>> Hi Roland,
>>>
>>> The output of "mount" is:
>>> /dev/mapper/grid01-root on / type ext3 (rw)
>>> proc on /proc type proc (rw)
>>> sysfs on /sys type sysfs (rw)
>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
>>> /dev/md0 on /boot type ext3 (rw)
>>> tmpfs on /dev/shm type tmpfs (rw)
>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>>> 172.30.100.254:/home on /home type nfs (rw,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.254)
>>> 172.30.100.210:/opt on /opt type nfs (rw,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.210)
>>> 172.30.100.210:/var/spool/torque/server_logs on /var/spool/pbs/server_logs type nfs (ro,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.210)
>>> none on /ipathfs type ipathfs (rw)
>>> 172.31.100.222@o2ib,172.30.100.222@tcp:172.31.100.221@o2ib,172.30.100.221@tcp:/lprod on /lustre/ws1 type lustre (rw,noatime,nodiratime)
>>> 172.31.100.222@o2ib,172.30.100.222@tcp:172.31.100.221@o2ib,172.30.100.221@tcp:/lbm on /lustre/lbm type lustre (rw,noatime,nodiratime)
>>> 172.30.100.219:/export/necbm on /nfs/nec type nfs (ro,bg,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.219)
>>> 172.30.100.219:/export/necbm-home on /nfs/nec/home type nfs (rw,bg,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.219)
>>>
>>>
>>> On 11/26/2010 05:41 PM, Roland Schulz wrote:
>>>
>>>> Hi Baofu,
>>>>
>>>> could you provide more information about the file system?
>>>> The command "mount" shows the file system used. If it is a
>>>> network file system, then the operating system and file system used on
>>>> the file server are also of interest.
>>>>
>>>> Roland
>>>>
>>>> On Fri, Nov 26, 2010 at 11:00 AM, Baofu Qiao<qiaobf at gmail.com>  wrote:
>>>>
>>>>
>>>>> Hi Roland,
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> OS: Scientific Linux 5.5. But the system used to store the data is called
>>>>> WORKSPACE, different from the regular hardware system. Maybe this is the
>>>>> reason.
>>>>>
>>>>> I'll try what you suggest!
>>>>>
>>>>> regards,
>>>>> Baofu Qiao
>>>>>
>>>>> On 11/26/2010 04:07 PM, Roland Schulz wrote:
>>>>>
>>>>>> Baofu,
>>>>>>
>>>>>> what operating system are you using? On what file system do you try to
>>>>>> store the log file? The error (should) mean that the file system you use
>>>>>> doesn't support locking of files.
>>>>>> Try to store the log file on some other file system. If you want you can
>>>>>> still store the (large) trajectory files on the same file system.
>>>>>>
>>>>>> Roland
>>>>>>
>>>>>> On Fri, Nov 26, 2010 at 4:55 AM, Baofu Qiao<qiaobf at gmail.com>  wrote:
>>>>>>
>>>>>>> Hi Carsten,
>>>>>>>
>>>>>>> Thanks for your suggestion! But my simulation will be run for about
>>>>>>> 200 ns at 10 ns per day (24 hours is the maximum duration for a single
>>>>>>> job on the cluster I am using), which would generate about 20
>>>>>>> trajectories!
>>>>>>>
>>>>>>> Can anyone find the reason for such an error?
>>>>>>>
>>>>>>> regards,
>>>>>>> Baofu Qiao
>>>>>>>
>>>>>>>
>>>>>>> On 11/26/2010 09:07 AM, Carsten Kutzner wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> as a workaround you could run with -noappend and later
>>>>>>>> concatenate the output files. Then you should have no
>>>>>>>> problems with locking.
>>>>>>>>
>>>>>>>> Carsten
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 25, 2010, at 9:43 PM, Baofu Qiao wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I just recompiled GMX 4.0.7. Such an error doesn't occur there, but
>>>>>>>>> 4.0.7 is about 30% slower than 4.5.3, so I would really appreciate it
>>>>>>>>> if anyone can help me with this!
>>>>>>>>>
>>>>>>>>> best regards,
>>>>>>>>> Baofu Qiao
>>>>>>>>>
>>>>>>>>> On 2010-11-25 20:17, Baofu Qiao wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I got the error message when I am extending the simulation using the
>>>>>>>>>> following command:
>>>>>>>>>>
>>>>>>>>>> mpiexec -np 64 mdrun -deffnm pre -npme 32 -maxh 2 -table table -cpi pre.cpt -append
>>>>>>>>>>
>>>>>>>>>> The previous simulation succeeded. I wonder why pre.log is locked,
>>>>>>>>>> and what the strange warning of "Function not implemented" means.
>>>>>>>>>>
>>>>>>>>>> Any suggestion is appreciated!
>>>>>>>>>>
>>>>>>>>>> *********************************************************************
>>>>>>>>>> Getting Loaded...
>>>>>>>>>> Reading file pre.tpr, VERSION 4.5.3 (single precision)
>>>>>>>>>>
>>>>>>>>>> Reading checkpoint file pre.cpt generated: Thu Nov 25 19:43:25 2010
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------
>>>>>>>>>> Program mdrun, VERSION 4.5.3
>>>>>>>>>> Source code file: checkpoint.c, line: 1750
>>>>>>>>>>
>>>>>>>>>> Fatal error:
>>>>>>>>>> Failed to lock: pre.log. Function not implemented.
>>>>>>>>>> For more information and tips for troubleshooting, please check the GROMACS
>>>>>>>>>> website at http://www.gromacs.org/Documentation/Errors
>>>>>>>>>> -------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> "It Doesn't Have to Be Tip Top" (Pulp Fiction)
>>>>>>>>>>
>>>>>>>>>> Error on node 0, will try to stop all the nodes
>>>>>>>>>> Halting parallel program mdrun on CPU 0 out of 64
>>>>>>>>>>
>>>>>>>>>> gcq#147: "It Doesn't Have to Be Tip Top" (Pulp Fiction)
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>>>> with errorcode -1.
>>>>>>>>>>
>>>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpiexec has exited due to process rank 0 with PID 32758 on
>>>>
>>>
>> --
>> Florian Dommert
>> Dipl.-Phys.
>>
>> Institute for Computational Physics
>>
>> University Stuttgart
>>
>> Pfaffenwaldring 27
>> 70569 Stuttgart
>>
>> Phone: +49(0)711/685-6-3613
>> Fax:   +49-(0)711/685-6-3658
>>
>> EMail: dommert at icp.uni-stuttgart.de
>> Home: http://www.icp.uni-stuttgart.de/~icp/Florian_Dommert
>>
>
> --
> gmx-users mailing list    gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> Please don't post (un)subscribe requests to the list. Use the www interface
> or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
>
>


-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309