Re: Reliability of RPC services

From:	Bas Wijnen
Subject:	Re: Reliability of RPC services
Date:	Sun, 23 Apr 2006 00:31:14 +0200
User-agent:	Mutt/1.5.11+cvs20060403
On Sat, Apr 22, 2006 at 02:07:00PM -0400, Jonathan S. Shapiro wrote:
> > > > I think the following condition should be sufficient: The kernel
> > > > guarantees that a reply message is sent _at the latest_ when the
> > > > callee process is destroyed.  This should hold true independent of
> > > > what the callee does between being invoked and exiting.  In
> > > > particular, simply dropping the reply capability should not change
> > > > this guarantee (which in effect means that the kernel has to invoke
> > > > the reply capability when it is dropped).
> > > 
> > > Several problems:
> > > 
> > > 1. This requires dynamic storage allocation in the kernel. Dynamic
> > > storage allocation in the kernel implies denial of resource
> > > vulnerabilities and makes any statement of kernel robustness impossible.
> > 
> > Can you elaborate why you think that dynamic storage allocation is
> > required?
> 
> You stated that "simply dropping the capability [does not remove the
> obligation]". In order to satisfy this requirement, the kernel must keep
> track of every reply capability that a service ever receives.

This can be done in the capability structure itself, which is paid for by the
owner, not by the object or the kernel.

> It can shrink the list when it notices that some of these capabilities have
> become invalid, but in abstract this could require an arbitrarily large
> amount of storage.

No, at most one notification slot per capability.

> > If you combine this description with the idea of send-once
> > capabilities, which are moved, not copied, you get a description which
> > works with forwarding.
> 
> Given my expansion of the storage allocation problem, do you still think
> so?

I, at least, do. :-)  The capability holds some data which is used for death
notification.  No kernel storage is required.

> > I want an RPC mechanism that's "better" than UDP.
> 
> I want a pony for Christmas. I predict that I will get my wish
> first. :-)

I don't think so.  "Better than UDP" is very reasonable IMO.  If we can't have
it, then every untrusted server (which in practice means nearly every server)
needs a trusted wrapper that is taken down whenever the server is killed.
This seems very undesirable to me.

> > > So this implies that every IPC must check the destination slots to see
> > > if they cause such an overwrite, and must issue death notice calls on
> > > those capabilities. If the IPC payload contains up to N capabilities,
> > > and we assume that the death notice itself does not transfer a
> > > capability, then every IPC has just been multiplied by up to N IPCs.
> > 
> > The common case is that no capability that is overwritten is a valid
> > reply capability.  The common case is thus N checks on memory that is
> > going to become hot anyway.
> 
> Dynamically, 50% of all IPCs will overwrite a reply capability. In many
> cases the overwritten capability will be dead, but the FCRB that it
> names will still need to be paged in so that this can be checked.

Not at all.  It's not the FCRB that's dead, but the capability itself.  On
copy (move, really), the capability was marked as invalid.  This must also
happen on use, if send-once is desired.  That means that by looking only at
the capability itself, the kernel will know that no death notification is
needed:
- A capability can be given a "send-once" flag on creation.
- When this capability is copied, the sender's capability is invalidated.
- When the capability is used, it is invalidated.
- When any capability is destroyed or overwritten, AND it has the send-once
  flag set, AND it is still valid, a death notification will be attempted.
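
To make the four rules concrete, here is a minimal sketch in Python.  It is a
toy model, not kernel code: the class name `Capability`, its methods, and the
`on_death` callback (standing in for the death notice) are all names I made up
for illustration.

```python
class Capability:
    """Toy model of a capability slot following the four rules above."""

    def __init__(self, on_death, send_once=False):
        self.on_death = on_death    # hypothetical stand-in for the death notice
        self.send_once = send_once  # rule 1: flag is given on creation
        self.valid = True

    def move(self):
        """Hand the capability to a new holder."""
        if not self.valid:
            raise ValueError("capability is invalid")
        if self.send_once:
            self.valid = False      # rule 2: the sender's copy is invalidated
        return Capability(self.on_death, self.send_once)

    def invoke(self, deliver):
        """Use the capability; `deliver` models sending the reply."""
        if not self.valid:
            raise ValueError("capability is invalid")
        if self.send_once:
            self.valid = False      # rule 3: use invalidates it
        deliver()

    def drop(self):
        """Destroy or overwrite the slot."""
        if self.send_once and self.valid:
            self.valid = False
            self.on_death()         # rule 4: only a still-valid send-once cap notifies
```

Note that `drop` decides by looking at the two flags in the capability only,
which is the point I was making about not having to page in the FCRB.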

AFAICS this will work fine.  The cost will not be as high as you seem to think
either.  It may still be too high; I cannot judge that.  However, I do have
another solution to the problem, which will be familiar to Marcus (it has been
rejected before, but I don't remember why, so the objection may not apply
here).

On L4.X2, we planned to have a task server doing several things.  One of the
things it had to do was managing death notifications.  For this, a task server
capability had to be retrieved.  The only way to destroy a task was via the
task server, so there was no way that a destruction could be missed.  Whenever
the task server would destroy a task, it would send notifications to every
thread which held a task server capability to it.  Also, it would not reuse
the global thread ID until these notifications had successfully been
delivered, but that is irrelevant here.
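
As a rough illustration of that plan (a sketch from memory, not the actual
L4.X2 design), the task server can be modelled as the single place where
tasks are destroyed and where watchers are recorded.  The names `TaskServer`,
`get_capability` and the callback argument are hypothetical.

```python
class TaskServer:
    """Toy task server: the only path by which a task can be destroyed."""

    def __init__(self):
        # task id -> callbacks of threads holding a task server capability to it
        self._watchers = {}

    def create_task(self, task_id):
        if task_id in self._watchers:
            raise ValueError("task id already in use")
        self._watchers[task_id] = []

    def get_capability(self, task_id, notify):
        """Model a thread retrieving a capability to `task_id`."""
        self._watchers[task_id].append(notify)

    def destroy_task(self, task_id):
        """Destroy the task and notify every holder of a capability to it."""
        for notify in self._watchers.pop(task_id):
            notify(task_id)
```

Because destruction only ever goes through `destroy_task`, no holder can miss
it.  The delayed reuse of thread IDs is left out of the sketch.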

I think a similar system could be used to solve the problem in this case.
A new object type, called death-notification, is needed.  This object is owned
by the thread which wants to receive the notification, and is linked to a
thread (or perhaps an object in general) which is monitored for death.

At the start of the RPC protocol, the initiator of the RPC must link a
death-notification to the target thread.  After that, it can start talking to
the target.  If the target gets killed, the initiator is notified and can
abort the RPC.

When an object is destroyed, it will be useful to forward the
death-notifications to some other object.  That is: instead of being invoked,
they are transferred to some other object.  This makes forwarding of
reply-responsibility possible.  However, the receiving side must explicitly
agree to receive them, otherwise a malicious server could forward everything
to a system-critical server (which cannot be killed without breaking the whole
system).
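
A small sketch of the linking and forwarding rules, again as a toy model in
Python with invented names (`Monitored`, `DeathNotification`,
`accept_forward_from`); how consent would really be expressed is an open
question, so a set of agreed senders is only an assumption here.

```python
class DeathNotification:
    """Owned by the thread that wants to hear about a death."""

    def __init__(self, owner_callback):
        self.owner_callback = owner_callback

    def fire(self, dead):
        self.owner_callback(dead.name)


class Monitored:
    """A thread (or object in general) that can be watched for death."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.notifications = []
        self.accepts_from = set()  # objects whose notifications we agreed to take

    def link(self, notification):
        if not self.alive:
            notification.fire(self)  # target already gone: tell the owner at once
            return
        self.notifications.append(notification)

    def accept_forward_from(self, other):
        """Explicit consent to inherit `other`'s death-notifications."""
        self.accepts_from.add(other)

    def destroy(self, forward_to=None):
        self.alive = False
        pending, self.notifications = self.notifications, []
        consented = (forward_to is not None and forward_to.alive
                     and self in forward_to.accepts_from)
        for n in pending:
            if consented:
                forward_to.link(n)   # transferred, not invoked
            else:
                n.fire(self)         # no consent: notify the owners instead
```

In this model a server that tries to dump its notifications on an object that
never agreed simply causes them to fire, so the critical server is not
burdened with them.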

I see some problems already, but there may be more:
- Linking the death-notification to a task must be a kernel operation,
  otherwise the malicious (compromised) server can mount its attack on that
  operation instead of on the RPC.  It may be that the linked object must
  agree to be linked, but I'm not sure if that is useful.  In any case, if
  the to-be-linked object is destroyed during the operation, the linking
  thread must be notified of that (so the operation is aborted).  Therefore,
  a simple "request and block for a reply" using capabilities cannot be used.
- Object destroy will somehow need to traverse the list of linked
  death-notifications.  Since this must be done in constant storage for the
  object, the links must be stored in the death-notifications themselves.
  This means that destroying or overwriting a death-notification is a
  potentially expensive operation (because the linked list must be updated
  before it can be completed).
- Object destroy becomes an unbounded operation, as there is no bound on the
  number of linked death-notifications.  However, it will certainly finish,
  since the object can be locked during the operation (meaning no new
  death-notifications can be linked when destroy has started).  Theoretically
  there is a bound of course, which is determined by the total amount of
  resource which can be used to store death-notifications on the system, in
  other words, hard disk space.
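
The storage argument in the last two points can be sketched as an intrusive
doubly-linked list: the monitored object holds a single head pointer, and the
links live in the death-notifications.  This is only an illustration in
Python; `Target`, `Notification` and the `dying` flag (standing in for the
lock) are assumed names, not an existing interface.

```python
class Notification:
    """Death-notification carrying its own list links (paid for by its owner)."""

    def __init__(self, callback):
        self.callback = callback
        self.prev = None
        self.next = None
        self.target = None


class Target:
    """Monitored object; its storage for the list is one pointer."""

    def __init__(self):
        self.head = None
        self.dying = False  # models the lock taken when destroy starts

    def link(self, n):
        if self.dying:
            raise RuntimeError("destroy has started; no new links")
        n.target, n.prev, n.next = self, None, self.head
        if self.head is not None:
            self.head.prev = n
        self.head = n

    def unlink(self, n):
        """Needed when a death-notification is destroyed or overwritten."""
        if n.target is not self:
            return
        if n.prev is not None:
            n.prev.next = n.next
        else:
            self.head = n.next
        if n.next is not None:
            n.next.prev = n.prev
        n.prev = n.next = n.target = None

    def destroy(self):
        """Walk the whole list; the work is proportional to its length."""
        self.dying = True
        fired = 0
        while self.head is not None:
            n = self.head
            self.unlink(n)
            n.callback()
            fired += 1
        return fired
```

The list update in `unlink` touches the neighbouring notifications, which in
a real system may belong to other owners and may need to be brought into
memory first; that is where the potential expense comes from.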

Again, I'm not sure if this is an improvement compared to the send-once idea.
I do agree with Marcus that UDP-style RPC operations suck, and we want
something better.  To make clear what I (and I think Marcus) want:
        It should be possible to design an application in such a way that it
        can handle potentially malicious servers, other than by not talking to
        them at all.  When the server is found to be malicious, it is the
        user's responsibility to shoot it down.  When that happens, the
        application should be able to recover.  A condition for that is that
        it gets notified about the situation.

If there is no other (feasible) solution, this notification may need to be
done manually by the user.  However, that would be very undesirable IMO,
because it complicates things for the user and for the application code, which
must somehow let the user know that it's waiting for the server that just got
killed.  And in some cases, the request may actually have been forwarded
before the server was compromised, so there isn't actually anything wrong.

Thanks,
Bas

-- 
I encourage people to send encrypted e-mail (see http://www.gnupg.org).
If you have problems reading my e-mail, use a better reader.
Please send the central message of e-mails as plain text
   in the message body, not as HTML and definitely not as MS Word.
Please do not use the MS Word format for attachments either.
For more information, see http://129.125.47.90/e-mail.html