Re: Reliability of RPC services

l4-hurd

From:	Bas Wijnen
Subject:	Re: Reliability of RPC services
Date:	Sun, 23 Apr 2006 10:37:12 +0200
User-agent:	Mutt/1.5.11+cvs20060403

Shapiro: I suppose you are still reading this, but in case you're only
scanning, there's a note for you below to which I'd like your reaction.

On Sun, Apr 23, 2006 at 04:17:46AM +0200, Pierre THIERRY wrote:
> Scribit Jonathan S. Shapiro dies 22/04/2006 hora 13:53:
> > > > couldn't S drop its capability in a way that won't trigger the
> > > > send that the "invoke on delete" bit asks for?
> > > IIUC, that would enable a malicious S to let C be waiting
> > > indefinitely for the answer
> > It is not possible in principle to prevent this. Do the exercise:
> > design an S that achieves this denial even if "send on overwrite" is
> > present.
> 
> OK: S keeps the capability and just doesn't bother giving an answer to the
> request... I was too focused on the case where we want to kill S when we
> find it is malicious (or buggy, FWIW).

That's good, because that was the case Marcus asked us to look at.  And in
that case, AFAICS, it _is_ possible to prevent the problem, if only the party
responsible for destroying the object (in this case the kernel) would notify
C about the destruction.  Luckily the kernel is part of the TCB anyway, so
trusting it does not add any problems.

> So what should C do to avoid memory leaks (he asks for something through
> a capability, providing the space for the answer, and never gets
> anything, probably several times)

No, just once.  Retrying can only be done after a timeout, and timeouts must
be avoided whenever possible.

> or freeze (when waiting for an answer to continue its work).

If there is no notification, then whoever destroys the task must also notify
everyone who was waiting for it.  However, considering that S has gone crazy
(or was compromised and has become malicious), then even if that person is
allowed to inspect S' memory, she still doesn't know who is waiting, because
that data may have been overwritten!

A possible solution to this is to have a central registration of who's waiting
for whom, but that's a shared mutable resource, and Shapiro's been brainwashing
us that that's a very bad idea. ;-)
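
A minimal sketch of such a central registration, in Python and purely
illustrative: the WaiterRegistry class and its method names are invented here
and are not part of any Hurd or L4 interface.  The point is only that the
table lives outside S, so it survives whatever S does to its own memory.

```python
from collections import defaultdict


class WaiterRegistry:
    """Hypothetical central table of who is waiting for whom.

    It is kept outside every server's address space, so a crazy or
    compromised server cannot overwrite it.
    """

    def __init__(self):
        self._waiters = defaultdict(set)  # server -> clients blocked on it

    def begin_wait(self, client, server):
        self._waiters[server].add(client)

    def end_wait(self, client, server):
        self._waiters[server].discard(client)

    def destroy(self, server):
        """Forget the server and return the clients that must be notified."""
        return sorted(self._waiters.pop(server, set()))


registry = WaiterRegistry()
registry.begin_wait("C1", "S")
registry.begin_wait("C2", "S")
registry.end_wait("C1", "S")       # C1 got its answer in time
to_notify = registry.destroy("S")  # S is killed; only C2 is still waiting
print(to_notify)
```

Every request and every reply has to update this one table, which is exactly
the shared mutable resource objected to above.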

> In some cases, I suppose human intervention is possible, like when a
> process waits for a device driver to perform (typically disk write) that
> cannot, but that's not suitable everywhere.

Human intervention is (almost) always possible, but it's hardly ever desired.
The system can have the information available to send a death notification.
If it does send it, the problem is solved.

> There are many cases where we could expect the system to recover nicely on
> its own.

If the applications are designed for it, this should always be possible.
However, death notifications are essential to trigger the recovery process.

> Are timeouts to be avoided most or all of the time?

Yes.  Timeouts cannot have a reasonable value.  If you choose a value for the
timeout, I can choose a system load which is high enough to trigger it in
situations where it shouldn't fire (that is, where the server is responding,
but is just slow).  This means that you cannot choose a timeout which will
work under any load.  Well, you can: it is "wait forever".  But that one
doesn't actually trigger if the server _does_ fail.

In other words, timeouts other than "0" and "forever" make sure that the
system breaks under load.  This should be avoided. :-)
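
The argument can be made concrete with a few lines of Python.  This is a toy
model, not a measurement: it assumes that a healthy server's response time
grows linearly with load (base_time * load), and the function names and the
BASE_TIME value are invented for illustration.

```python
import math


def response_time(base_time, load):
    """Toy model: a healthy server answers in base_time * load seconds."""
    return base_time * load


def breaking_load(base_time, timeout):
    """Smallest whole-number load at which a healthy server exceeds timeout."""
    return math.floor(timeout / base_time) + 1


BASE_TIME = 0.5  # seconds per request on an idle system (assumed)

for timeout in (1.0, 10.0, 60.0):
    load = breaking_load(BASE_TIME, timeout)
    # The server is alive and answering, yet the client gives up on it.
    assert response_time(BASE_TIME, load) > timeout
    print(f"timeout {timeout:>5}s is falsely triggered at load {load}")

# "forever" is the only timeout no finite load can beat, and it never
# fires for a server that is really dead either.
assert response_time(BASE_TIME, 10**9) < math.inf
```

Whatever finite timeout goes into the loop, breaking_load finds a load that
defeats it; making the timeout longer only moves the breaking point.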

An interesting thing for Shapiro to note: If you want a reasonable (reliable)
network connection on top of IP (which is close to what you propose to use for
IPC), TCP is the protocol of choice.  TCP needs timeouts, because the computer
may not be notified about broken connections or computers on the other side
being shut down.  This means that it does indeed fail under heavy load (albeit
only very heavy load, since the timeouts are pretty long).  All this is only needed
because there are no death notifications.  With a trusted kernel and all
parties being on the same computer, under control of that kernel, death
notifications are possible.  They should happen to prevent these problems.

> What are the alternatives?

Death notifications.  They can be implemented as task server capabilities, or
via the send-once capabilities Marcus suggested.  There are probably other
ways.  But
notifications there must be.  (In fact, a timeout is also a death
notification, but it's one which gives false positives under heavy load.)
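
To illustrate the send-once variant, here is a rough Python sketch.  It is
not Marcus's actual design: ReplyCap, Kernel and their methods are
hypothetical names, and the "kernel" is just an object that remembers which
reply capabilities each task holds.  The idea it shows is that a reply
capability delivers exactly one message, and if it is destroyed unused
(because its holder was killed), that one message is a death notice.

```python
import queue


class ReplyCap:
    """Hypothetical send-once reply capability.

    It delivers exactly one message to the client's mailbox: either the
    server's reply, or a death notice sent by the kernel when the
    capability is destroyed unused.
    """

    def __init__(self, mailbox):
        self._mailbox = mailbox
        self._used = False

    def reply(self, value):
        if not self._used:
            self._used = True
            self._mailbox.put(("reply", value))

    def destroy(self):
        # Called by the (trusted) kernel, never by the server itself.
        if not self._used:
            self._used = True
            self._mailbox.put(("dead", None))


class Kernel:
    """Toy kernel that tracks which capabilities each task holds."""

    def __init__(self):
        self._caps = {}  # task name -> reply capabilities it holds

    def send_request(self, server, mailbox):
        cap = ReplyCap(mailbox)
        self._caps.setdefault(server, []).append(cap)
        return cap

    def destroy_task(self, task):
        # Destroying a task destroys its capabilities, which notifies
        # every client still waiting on one of them.
        for cap in self._caps.pop(task, []):
            cap.destroy()


kernel = Kernel()
mailbox = queue.Queue()
kernel.send_request("S", mailbox)  # C asks S for something
kernel.destroy_task("S")           # S misbehaves and is killed
outcome = mailbox.get()            # blocks with no timeout, yet returns
print(outcome)
```

C waits without any timeout and still cannot hang on a destroyed S: it
receives either the reply or the death notice, and never both.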

Thanks,
Bas

-- 
I encourage people to send encrypted e-mail (see http://www.gnupg.org).
If you have problems reading my e-mail, use a better reader.
Please send the central message of e-mails as plain text
   in the message body, not as HTML and definitely not as MS Word.
Please do not use the MS Word format for attachments either.
For more information, see http://129.125.47.90/e-mail.html
