gnu-arch-users

Re: [Gnu-arch-users] Re: [BUG] FEATURE PLANS: revlib locking


From: Andrew Suffield
Subject: Re: [Gnu-arch-users] Re: [BUG] FEATURE PLANS: revlib locking
Date: Fri, 4 Jun 2004 15:41:13 +0100
User-agent: Mutt/1.5.6i

On Fri, Jun 04, 2004 at 02:38:50PM +0200, Jan Hudec wrote:
> On Fri, Jun 04, 2004 at 13:07:50 +0100, Andrew Suffield wrote:
> > On Fri, Jun 04, 2004 at 12:47:23PM +0200, Jan Hudec wrote:
> > > The only atomicity I want is atomicity of the underlying syscall!
> > > 
> > > Now the only way to break atomicity is for the kernel to
> > > temporarily create the entry even if it's going to return failure and
> > > remove it again. I don't think any kernel would do such useless work.
> > > I also think that atomicity of link(2) is part of POSIX.
> > 
> > NFS does not attempt to support POSIX filesystem semantics. It's just
> > a convincing facsimile. Deal with it. POSIX filesystem semantics are
> > not convenient for a network filesystem; I'm not actually aware of any
> > that attempt to implement them.
> 
> No. But it uses the underlying syscalls, which have those properties, and
> the server has no reason to break them.

It's not <syscall is made on client> <syscall completes on
server>. It's <syscall is made on client> <...network...> <syscall is
performed on server> <...network...> <syscall completes on client>.

They don't behave anything like the same way, because the
<...network...> stages aren't reliable.
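A toy model of those stages, in case it helps. Nothing here is NFS code: FakeServer, remote_link and the reply_lost flag are all made up for illustration, and the error names are just strings. It only shows that the operation can be performed on the server while the client is told it failed.

```python
class FakeServer:
    """Stands in for the server side; holds the names that exist."""

    def __init__(self):
        self.names = set()

    def link(self, name):
        # This part is atomic, exactly as on a local filesystem.
        if name in self.names:
            return "EEXIST"
        self.names.add(name)
        return "OK"


def remote_link(server, name, reply_lost):
    """What one client sees for one link request (hypothetical helper)."""
    result = server.link(name)      # <syscall is performed on server>
    if reply_lost:
        return "EHOSTUNREACH"       # <...network...> ate the reply
    return result                   # <syscall completes on client>


server = FakeServer()
first = remote_link(server, "lock", reply_lost=True)    # client A
second = remote_link(server, "lock", reply_lost=False)  # client B
```

After this runs, the name exists on the server, client A was told the request failed, and client B was told the name is already taken.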

> > > Now the only way for the NFS server to break this would be to do the
> > > same useless work -- create the link, then check if it should do it and
> > > delete it again if it shouldn't. In addition to being a bug, it's SO much
> > > extra work (when you can just do the syscall and see), that I don't
> > > think any NFS server breaks this.
> > 
> > You think wrong. Here's a specific counterexample along the lines you
> > describe:
> > 
> > The server creates the link, sends the RPC reply saying that the link
> > was created, and the reply fails with EHOSTUNREACH. Just to make
> 
> But I do not care for the return value of the link request! I just drop
> it on the floor!!!

Then your locking protocol is unsafe; two clients can think they've
got the lock at the same time. Really, you're doing the "naive
student" thing; this discussion is older than *me*. You're making all
the classic mistakes.

> > things even more fun, you now have a link and two clients which think
> > it's not their lock. You can spend weeks chasing the corner cases, but
> 
> No. The clients check the link count on their respective temporary files.
> Since it will be a link to one of them, one of the clients will
> eventually get a reply saying 2. That client knows it has the lock and
> goes ahead. The other equally well knows the resource is not available.

Now you've assumed that the temporary files were created
atomically. You might as well have just used that as your lock state;
it would be just as safe (ie, not at all, since file creation isn't
atomic on NFS).

The safe protocol for hardlink-based locking is *this*:

Create a randomly named temporary file.

Hardlink the lock file to the temporary file.

If the link succeeds and the link count on the lock file is 2, you
have the lock. Otherwise you missed: delete the lock file and your
temporary file, wait a random interval, and try again.

Which performs like a dog. And that's the only one which works. No
amount of trying to vary it will discover anything new about NFS.
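For concreteness, a minimal sketch of those three steps in Python. The function names, the temporary-file naming scheme and the retry interval are assumptions of mine, not anything arch does. The sketch reads the link count from the temporary file (the same inode as the lock file once the link exists), and it only removes the lock file when that count shows the link is its own, which is one reading of "delete the lock file" above.

```python
import os
import random
import time
import uuid


def try_lock(lock_path):
    """One attempt at the hardlink protocol; True means we hold the lock."""
    directory = os.path.dirname(lock_path) or "."
    # Step 1: randomly named temporary file (naming scheme is an assumption).
    tmp_path = os.path.join(directory, ".tmp-" + uuid.uuid4().hex)
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        # Step 2: hardlink the lock name to the temporary file.
        try:
            os.link(tmp_path, lock_path)
            link_ok = True
        except OSError:
            link_ok = False
        # Step 3: the return value and the link count must both agree.
        nlink = os.stat(tmp_path).st_nlink
        if link_ok and nlink == 2:
            return True
        if nlink == 2:
            # The link exists but the call reported failure. It is our own
            # link, so remove it and count this attempt as a miss.
            os.unlink(lock_path)
        return False
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)


def acquire(lock_path, max_tries=50):
    """Retry with a random wait between misses (bounds are arbitrary)."""
    for _ in range(max_tries):
        if try_lock(lock_path):
            return True
        time.sleep(random.uniform(0.01, 0.1))
    return False


def release(lock_path):
    os.unlink(lock_path)
```

Every attempt costs a create, a link, a stat and at least one unlink, each a round trip to the server, plus the random sleep on a miss.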

> Btw: Connections don't win you anything.

Go away and do not bother me again until you have tried to implement a
network file system.

> > > > > And no resolution is perfect.
> > > > > It simply can't be, because you never know for sure whether the client
> > > > > crashed, or just its connection died.
> > > > 
> > > > ...actually, you do. That's what NSM gives you, as supported by
> > > > rpc.statd, which is a required component for fcntl() locks over
> > > > NFS. It's based on the assumption that clients will get rebooted
> > > > reasonably quickly.
> > > 
> > > ... which may work in a few more cases, but does not work generally.
> > 
> > Actually it does work in the general case.
> 
> But not universally.

Right, only when both server and client are running NLM and NSM;
otherwise you'll get an error when you try to acquire a lock. That is
quite adequate if your boxes run the relevant services; otherwise
it's useless.
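A rough sketch of what that looks like from the application side, using Python's fcntl module. The helper name and the reason strings are mine. Which errno comes back when the lock services are missing depends on the client; I would expect ENOLCK on Linux, but treat that as an assumption and check your own system.

```python
import errno
import fcntl
import os


def try_fcntl_lock(path):
    """Return (fd, None) on success or (None, reason) on failure."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None, "held by someone else"
        # Anything else means locking itself is not working here.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return None, "locking unavailable: " + name
    return fd, None
```

The lock is dropped when the descriptor is closed, so the caller keeps fd open for as long as it needs the lock.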

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ |
 `. `'                          |
   `-             -><-          |
