l4-hurd

Re: Comparing "copy" and "map/unmap"


From: Matthieu Lemerre
Subject: Re: Comparing "copy" and "map/unmap"
Date: Sun, 09 Oct 2005 15:45:09 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

"Jonathan S. Shapiro" <address@hidden> writes:

> I apologize for the length of this note. It is not a simple issue.

Please don't apologize; on the contrary, I find these issues really
fascinating, and the quality of your mails is really helpful.

>
> We seem to have stumbled into a discussion of capability semantics:
> what does it mean when a capability is transferred from one process
> to another, and what impact does this have on overall system
> design. I want to *try* to make some comments about the trade-offs
> between the "copy" style of capability transfer and the "map" style
> of capability transfer.
>
> I will attempt to be balanced. However: I must warn everyone that I have
> a very strong view on this issue. In defense of this view, I note that
> we have 30 years of successful experience with the "copy" approach, and
> zero years of experience with the "map/unmap" approach as applied to
> capabilities. This does not mean that I am right, and it does not mean
> that my views on deficiencies of map/unmap are correct. There may be
> clever ways to avoid the issues that concern me.
>
> Also, I must emphasize that I am describing one snapshot of L4sec that
> existed at one time in their design discussions. The L4sec team is still
> learning about capabilities, and their design is still being revised.
> The Coyotos design is a minimal delta from EROS. While Coyotos is still
> evolving, we are debating issues of mechanism and implementation rather
> than issues of semantics.
>
>
>
>
> SIMILARITIES:
>
> In both designs, capabilities are kernel protected. An application
> references its capabilities by specifying an address relative to some
> kind of capability address space.
>
> In both L4sec (the L4 successor) and Coyotos (the EROS successor),
> capabilities are stored in capability address spaces. The mechanisms for
> managing traditional data address spaces in each system have been
> extended in the obvious ways so that they can be used on capability
> spaces as well.
>
> In both systems, the way you invoke a service is by invoking a
> capability to that service.
>
> COYOTOS CAP TRANSFER SEMANTICS
>
> In EROS/Coyotos, the fundamental operation on capabilities is COPY. When
> process A sends a capability to process B, the capability received in
> process B is co-equal in status with the original in process A. This is
> directly parallel to what happens for data. When process A sends the
> value 5 to process B, the copy of 5 that arrives in B has the same
> status as the 5 that was transmitted.
>
> L4SEC CAP TRANSFER SEMANTICS
>
> In L4sec, the fundamental operations on capabilities are MAP/UNMAP. When
> process A sends a capability to process B, the copy received by B is
> *subordinate*. This is true because A can later unmap the capability. It
> means that any use of the capability by B is dependent on the continuing
> good behavior of A. More generally, the use of any capability relies on
> the continuing good behavior of all processes that were involved in the
> transfer of the capability from its origin to its place of use.
>
> Good behavior means:
>
>   The parties who transferred the capabilities do not maliciously
>   revoke it.
>
>   The parties who transferred the capabilities do not exit until
>   all receivers are done with those capabilities.
>
> TYPES OF REVOCATION
>
> There are two types of revocation that we need to distinguish. People
> often fail to be clear in these discussions about what they mean:
>
>   REVOCATION: destroying *all* capabilities to an object (everywhere).
>
>   SELECTIVE REVOCATION: destroying *some* capabilities to an object,
>     according to some test criteria.
>
> General revocation is very easy in both systems. Selective revocation is
> a pain in the ass in every system, no matter how it is done.
>
> SOME COMMENTS
>
> The *good* part about the COPY model is that it makes object creation
> and capability copy cheap. The *bad* part is that selective revocation
> requires some application-level planning. In Coyotos, the way you do
> selective revocation is that you insert a transparent forwarding object
> (this is a kernel-implemented object) in front of the real capability,
> and then pass the forwarding capability to the receiver instead of the
> real capability. Later, you can destroy the forwarding object, which
> revokes their capability.

OK.  So you have to create a new forwarding object for each new
client, but I assume this is a relatively cheap operation.
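
To make the forwarder idea concrete, here is a minimal Python sketch.
It is only a toy model under my own assumptions: the names `Service`,
`Forwarder` and `Revoked` are invented for illustration and are not the
Coyotos interface, and a real forwarder is a kernel object rather than
a Python class.

```python
class Revoked(Exception):
    """Raised when an invocation goes through a destroyed forwarder."""


class Service:
    """Stand-in for the real object behind a capability."""

    def invoke(self, request):
        return f"handled {request}"


class Forwarder:
    """Hypothetical transparent forwarding object placed in front of a target."""

    def __init__(self, target):
        self._target = target

    def invoke(self, request):
        if self._target is None:
            raise Revoked("forwarder destroyed")
        return self._target.invoke(request)

    def destroy(self):
        # Every capability that points at this forwarder becomes useless.
        self._target = None


service = Service()
fwd_for_b = Forwarder(service)   # one forwarder per client we may want to revoke
fwd_for_c = Forwarder(service)

b_cap = fwd_for_b                # what client B receives
b_copy = b_cap                   # B copies it onward; it still goes via the forwarder

fwd_for_b.destroy()              # selective revocation of B's branch only
```

After `fwd_for_b.destroy()`, both `b_cap` and `b_copy` fail with
`Revoked`, while `fwd_for_c` and the service itself keep working.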

>
> The *good* part about the MAP/UNMAP model is that it makes selective
> revocation easy -- just unmap. The *bad* part about the MAP/UNMAP
> approach is that it makes capability copy and object creation very
> expensive. Implementing copy requires a "capability exchange"
> protocol with a capability server. This means that a copy involves a
> minimum of three IPC operations:
>
>   1. A copies to B
>   2. B invokes CapServer to request an exchange
>   3. CapServer returns an equivalent cap that cannot be revoked by A.
>
> However, this description of the workaround is incomplete. It ignores
> other complications that CapServer introduces:
>
>   1. All newly created objects must be registered with the CapServer.
>      Object creation is therefore expensive.
>
>   2. The CapServer must have sufficient storage to store one copy of
>      every capability. This is a surprisingly big problem.
>
>   3. Authority to create an object must imply authority to allocate
>      storage in the CapServer. Since the CapServer is a globally shared
>      resource, this introduces the possibility of denial of resource by
>      exhausting CapServer storage. Fixing this requires an object
>      quota of some sort. It is very difficult to select good limits
>      for such a quota.
>

What we originally planned was to have the cap server also be a
reference counter, and thus the place where resource accounting would
be done.  The cap server could then easily allocate storage for
its client and charge it to the amount of resources allocated by that
client. (I guess I should use the word "principal" here instead of
client.)  Maybe my view of the problem is too simplistic,
though.
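
As a rough illustration of what I mean by charging storage to the
principal, here is a small Python sketch.  Everything in it is assumed
for the example (the `CapServer` class, its methods, the slot-based
quota); it is not a design we have implemented.

```python
class QuotaExceeded(Exception):
    """Raised when a principal has no storage left in the cap server."""


class CapServer:
    """Toy cap server that charges each stored capability to a principal."""

    def __init__(self):
        self._quota = {}   # principal -> remaining slots
        self._slots = {}   # principal -> names of stored capabilities

    def grant_quota(self, principal, slots):
        self._quota[principal] = self._quota.get(principal, 0) + slots
        self._slots.setdefault(principal, [])

    def register(self, principal, cap_name):
        # The principal pays for the slot out of its own quota.
        if self._quota.get(principal, 0) <= 0:
            raise QuotaExceeded(principal)
        self._quota[principal] -= 1
        self._slots[principal].append(cap_name)

    def release(self, principal, cap_name):
        self._slots[principal].remove(cap_name)
        self._quota[principal] += 1


server = CapServer()
server.grant_quota("alice", 2)
server.grant_quota("mallory", 1)

server.register("mallory", "m1")
try:
    server.register("mallory", "m2")   # mallory has used up her own quota
    mallory_blocked = False
except QuotaExceeded:
    mallory_blocked = True

server.register("alice", "a1")         # alice is unaffected
```

The point of the sketch is only that a greedy principal runs out of its
own slots; it says nothing about how to pick good quota values, which
is the hard part you mention.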

The other possibility is that the storage necessary for storing the
capabilities can be given in what we call "memory containers" by the
client.  But this imposes some restrictions on the cap server
implementation (you can't create linked lists between objects in
different containers because they could be revoked and you would lose
a part of the list).

So, I think that in the general case, problems 2. and 3. are real,
but they seem to be solved when the cap server is also the
resource accountant, which is what we planned to do.

Still, this does not solve problem 1., which is the overhead of one RPC.

Your scheme is approximately what I have written in
http://lists.gnu.org/archive/html/l4-hurd/2005-09/msg00056.html (based
on Marcus' work).

>
> Four thought experiments may be useful to get some intuition for the
> impact of the CapServer and the UNMAP operation on the overall design:
>
> 1. Imagine trying to build robust programs in a Java system where
>    you needed to say in advance how many total objects you planned
>    to allocate.

This is a consequence of restricted memory in the cap server, which (I
think) is no longer true in our very particular case with our
work-around.

>
> 2. Imagine trying to build a robust program in Java where you always
>    had to be prepared for the possibility that some other thread (one
>    that you never heard of) might destroy your objects, invalidating a
>    pointer in the middle of one of your operations.

Right.  This is the problem with a server that can unmap flex pages at
any time.  Here again, we have a trusted third party, the physmem
server, which makes it possible to share "containers" between clients.

Same goes for threads.

But what are the other mappable objects in the new L4s' design?  They
are communication end points, and for Karlsruhe, there are also
capability IDs (which are close, I think, to number capabilities).

These two mappable objects are used to implement user capabilities
(for instance, for a filesystem).

Unmapping these objects is not that harmful: upon an RPC attempt, the
client only has to be prepared for an error.

Also, I don't see how capabilities with copy, which can be selectively
(or collectively) revoked, solve this problem.  Does the client
receive a notification upon revocation?
>
> 3. Now imagine that the revoking thread doesn't even need to be hostile.
>    Imagine that exit of a thread revokes any capability it holds
>    (the analogy is that L4sec task exit reclaims its address space).
>    Now try to answer the question: "Given that the exiting thread
>    and the using thread are supposed to be isolated from each other,
>    how do I design a protocol that allows the exiting thread to know
>    when it is safe to exit?"

I assume here that you don't want to take "emulating capability copy
by a trusted third party" into account, because this would be a bit of
a cheat.

So for instance, all the intermediary threads in the chain of
transmission of a capability have to stay alive.  I agree, this is a
big problem.  But the problem remains with COPY if the thread providing
the resource wants to exit.
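
The dependency on intermediaries can be pictured with a toy mapping
tree.  This Python sketch is a simplified model of my own (the
`Mapping` class and its `exit` method are hypothetical, and real L4
mapping databases are more involved): unmapping or exiting anywhere in
the chain invalidates everything derived from that point.

```python
class Mapping:
    """One node in a hypothetical mapping tree: a holder and what it mapped on."""

    def __init__(self, holder, parent=None):
        self.holder = holder
        self.children = []
        self.valid = True
        if parent is not None:
            parent.children.append(self)

    def map_to(self, receiver):
        if not self.valid:
            raise RuntimeError(f"{self.holder} no longer holds the capability")
        return Mapping(receiver, parent=self)

    def unmap(self):
        """Revoke everything derived from this mapping, but not the mapping itself."""
        for child in self.children:
            child.unmap()
            child.valid = False
        self.children.clear()

    def exit(self):
        """Model task exit: the holder's own mapping and all derived ones vanish."""
        self.unmap()
        self.valid = False


origin = Mapping("server")
a = origin.map_to("A")
b = a.map_to("B")
c = b.map_to("C")

a.exit()   # an intermediary leaves; B and C lose the capability too
```

Here C never dealt with A directly, yet C's capability dies when A
exits, which is exactly the "continuing good behavior" requirement.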

>
> 4. Once you have designed that protocol, now imagine that the thread
>    *using* the capability is hostile. One way to be hostile would be
>    never to admit to the allocating thread that you are done with a
>    capability. This prevents a well-behaved allocating thread from
>    exiting, which is a direct denial of resource attack.
>
Same as for 3.: emulating capability copy would be a solution.

Well, it is true that if we use map/unmap only to emulate copy all the
time, then map/unmap isn't the right set of primitives.

What is the allocating thread?  Is it the thread that responds to calls
on the capability?  If so, the problem is the same with COPY: if this
thread exits, no client will have access to the capability anymore.

>
> Step back and ask *why* everyone is talking about implementing a
> capability server in L4sec-based designs. Fundamentally, the reason
> for the capability server is that it can be trusted to satisfy "good
> behavior" as I have defined it above. In fact, the purpose of the
> capability server is to re-establish capability copy semantics on a
> system that does not provide them.
>

Well, the cap server we planned to write had 3 distinct roles:

1/To be a trusted third party for re-establishing capability copy
 semantics

2/To be the central point for accounting resources (since it has
 knowledge of every capability mapped everywhere)

3/To be the reference counter for servers who need this (and are
 allowed to know how many clients they are serving).

If we have direct access to capability copy semantics, we would still
have to get something for the two other functionalities.

* I know that deallocating resources when nobody is using them can
lead to denial of resource attacks.  But the problem is minimized by
these two facts:

-Resources are allocated on behalf of the client. So the client can
 only exhaust its own resources.

-Even if we deallocate a resource when the reference counter reaches 0,
 we can still force revocation.

So the reference counter is much more a convenience, but I think we
still need it.  What we need in fact is a notification to the server
when nobody is using the resource (if the server is allowed to
receive this type of notification).  I see two ways to achieve this:

-Upon each copy, a reference counter server automatically receives a
 notification by the kernel that a capability has been copied.  There
 is, in this case, no "win" against the cap server solution.

-The notification is done by the kernel.  I don't know if this is a
 big win.
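
Whichever party does the counting, the behaviour I have in mind looks
like the following Python sketch.  The `RefCounter` class and the
`copied`/`dropped` calls are assumptions made up for this example; they
stand for whatever events the kernel or a counting server would
actually see.

```python
class RefCounter:
    """Toy reference counter that notifies a server when an object is unused."""

    def __init__(self, on_unused):
        self._counts = {}
        self._on_unused = on_unused   # callback standing in for the notification

    def copied(self, obj):
        # Called each time a capability to `obj` is copied to a new holder.
        self._counts[obj] = self._counts.get(obj, 0) + 1

    def dropped(self, obj):
        # Called each time a holder gives up its capability to `obj`.
        self._counts[obj] -= 1
        if self._counts[obj] == 0:
            del self._counts[obj]
            self._on_unused(obj)


notifications = []
counter = RefCounter(on_unused=notifications.append)

counter.copied("file-42")
counter.copied("file-42")
counter.dropped("file-42")    # one holder left, no notification yet
counter.dropped("file-42")    # last holder gone, the server is notified
```

A hostile client can simply never drop its capability, so the
notification never arrives; that is why forced revocation has to stay
available next to the counter.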

How do you do such a thing in EROS?

* For accounting resources, I think we could have a "policy server"
  or "resource server" which could ask every spare resource server
  (that is, the ones that provide IO, memory and CPU time/threads) how
  many resources each client has consumed.  Or maybe this information
  could be placed in a shared memory region (these 4 servers would
  trust each other).  So I think there are possible solutions.  I
  would also be pleased to know how you solve this issue in EROS.
>
> Finally, note that there is a race condition in any capability
> exchange protocol. As far as I know the L4sec team has not defined an
> efficient, race-free protocol to accomplish this.

Are you speaking about capability exchange protocol between a server
and a client, or between 2 clients?

I would be interested to know more about this race.  What could a
malicious client do?

The protocol we would have used, I think, would have been the
following: client A maps the capability to B, B makes a call on this
capability to the cap server C, which recognizes the capability and
maps it back to B.

The only race I see is if A unmaps the capability before B calls C,
but B would be able to detect that when the IPC fails.
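
The protocol and the one race I can see can be sketched as follows.
This is a toy Python model under my own assumptions: the `Kernel`
class, the `exchange` function and the task names are invented for the
example, and the model ignores everything except who granted which
mapping.

```python
class IpcError(Exception):
    """Raised when a task invokes a capability it no longer holds."""


class Kernel:
    """Toy kernel: remembers who mapped each capability to each task."""

    def __init__(self):
        self._granted_by = {}  # (holder, cap) -> granter

    def map(self, granter, receiver, cap):
        self._granted_by[(receiver, cap)] = granter

    def unmap(self, granter, cap):
        # Remove every mapping of `cap` that `granter` handed out.
        for (holder, held), who in list(self._granted_by.items()):
            if who == granter and held == cap:
                del self._granted_by[(holder, held)]

    def holds(self, task, cap):
        return (task, cap) in self._granted_by


def exchange(kernel, client, cap, cap_server="C"):
    """Client calls the cap server, which maps the capability back to it."""
    if not kernel.holds(client, cap):
        raise IpcError(f"{client} lost {cap} before the exchange")
    kernel.map(cap_server, client, cap)  # from now on only C can unmap it


kernel = Kernel()

# Normal case: A maps to B, B exchanges, A's later unmap has no effect on B.
kernel.map("A", "B", "cap1")
exchange(kernel, "B", "cap1")
kernel.unmap("A", "cap1")

# Race: A unmaps before B reaches the cap server; B sees the failure.
kernel.map("A", "B", "cap2")
kernel.unmap("A", "cap2")
try:
    exchange(kernel, "B", "cap2")
    race_detected = False
except IpcError:
    race_detected = True
```

In this model the race is at least visible to B as a failed call; it
does not cover whatever other race you may have in mind.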

>
> Unfair summary: capability exchange is like needle exchange. It's
> better than using unclean capabilities, but what you really want is
> to have clean capabilities in the first place. :-)
>
> A MORE BALANCED VIEW
>
> The two operations (COPY and UNMAP) are very different. Either can be
> used to functionally emulate the other up to a point (more on this
> below!), but whichever operation is emulated is going to be more
> expensive.
>
> It is probably unfair to blame MAP/UNMAP for the problems I have
> identified above. The problem is actually more fundamental:
>
>   Regardless of mechanism, selective revocation implies the need
>   to program defensively. This is true whether the operation
>   that does the revocation is UNMAP or DESTROY(forwarder).
>
> Note also that the real cost in this is not in the "revoke" step.
> Provided that the same number of capabilities are getting revoked, both
> systems are going to be equally fast at the "revoke" step. The cost lies
> in the "copy in such a way that I can later revoke" step. Because of
> this, I am now going to use the terms "COPY" and "REVOCABLE COPY".
>
> THE KEY QUESTIONS
>
> There are two questions to ask:
>
>   1. Which operation (COPY or REVOCABLE COPY) is used more often?
>   2. If one of these operations is used as a foundational operation
>      for emulating the other, is there a choice that makes robust
>      programming easier?
>
> ** Frequency of Use
>
> A lot of experience in KeyKOS/EROS/Coyotos suggests that three types of
> capability transfers account for the overwhelming majority of transfers
> in real systems:
>
>   1. Transfers between an allocating server and a client, where the
>      client is not going to be the exclusive user of the object.
>   2. Transfers between mutually trusting components where neither
>      is going to revoke the other.
>   3. Transfers between a client and a server, where the server will
>      hold the capability temporarily, but the client trusts the server
>      to handle that capability correctly.
>
> All of these are cases where the operation you want is COPY, not
> REVOCABLE COPY.
>
> This may or may not be true for Hurd: my intuition is that as object
> servers become larger they tend to become more monolithic. As they
> become more monolithic, the trust relationships in the system
> architecture become progressively more unequal, and the use of revocable
> transfers becomes more common. I still suspect that REVOCABLE COPY is
> the less frequent operation, but this is something that needs to be
> measured.

In the Hurd, I think that most often, when a server serves an object
to a client, we fall into case 1. (for instance, for a filesystem).

2. is far less used I think.

For 3., in the Hurd the client most often does not fully trust its
server.  So it wouldn't want to give it resources that it couldn't
revoke later.  So I think the operation we would want here is rather
REVOCABLE COPY.  But, for memory, we don't want the client to revoke
memory at any moment, because recovering from a pagefault is not
something easy IMHO (or maybe we could design the servers to be
prepared for it.  I don't know how hard that would be.)

What is sure is that we don't want the server to be able to revoke the
capability provided by the client.  So it would be better to provide
at least an "unrevocable" capability.

Marcus and Neal could tell more about this: they're far more familiar
with the higher-level servers of the Hurd than I am.

>
> In any case, if COPY accounts for the majority of capability
> transfers, and a smaller number are REVOCABLE COPY, then it seems
> clear that the right primitive operation is COPY.
>
> ** Program Robustness
>
> We believe very strongly that program robustness is easy for
> capabilities that are transferred by COPY, and very difficult for
> capabilities that are transferred by REVOCABLE COPY. There are, in
> practice, five cases of selective revocation:
>
>   1. I'm killing the program anyway, and it isn't going to matter
>      that its capabilities become invalid.
>
>   2. I'm revoking a capability according to a previous agreement.
>      The victim ought to be prepared for this (though actually
>      *implementing* this expectation is difficult).
>
>   3. I'm revoking a capability because a process is misbehaving.
>      In this case we don't really worry about negative impact
>      on the process.
>
>   4. I'm revoking a capability because I screwed up. This issue
>      doesn't really have anything to do with the choice of capability
>      transfer mechanism.
>
>   5. I'm revoking a capability because I need to do something
>      reasonable (e.g.: exit) and the capability transfer architecture
>      didn't give me a choice.
>
> The last one is particularly unfortunate, and the MAP/UNMAP design
> introduces a lot of this.
>

The last one applies to the intermediary processes, but how does it
apply to the creating process?

For case 2., I think the majority of cases are when a client gives a
capability to a server, because the server has to use it to serve the
client.  The client would then revoke the capability it gave to the
server when it does not need the server's services anymore (and the
server shouldn't have to use the capability provided by the client
anymore)

>
> TRANSPORT ISOLATION
>
> For any capability, there is a chain of processes between the creating
> process and the using process. This chain can be viewed as a "transport"
> that copies the capability.
>
> As I wrote last night, the impact of capability authentication is not
> (yet) fully understood by the L4sec group. In the absence of capability
> authentication, your ability to rely on *any* capability depends
> entirely on the channel by which you receive it. If this is the case,
> then you necessarily depend on every process in the transport between
> you and the origin. When you take that perspective, the issue of process
> exit is still really irritating, but the fact that you *continue* to
> depend on the transport after receiving a capability doesn't really seem
> like much of a problem.
>
> However, once it is possible to authenticate capabilities, you would
> very much like NOT to rely on the transport after receipt. After
> authentication, the *source* of a capability ceases to be relevant,
> because you know its implementation. If the implementation is
> trustworthy, the sender of the capability cannot alter its behavior, and
> there is no further dependency on the sender or the transport. At THAT
> point you really want a transport that gets out of the way. You can't
> get that cheaply if the foundational operation is REVOCABLE COPY.

What if there were a means to turn a revocable capability into a normal
capability? This is a task that the cap server could accomplish.  You
could then have a transport which does REVOCABLE COPIES from
process to process, and the final receiver would acquire a normal copy
from the cap server.

But I agree that in this case, it would be better to use COPY in the
first place.

>
>
> PROBLEMS WITH HIERARCHY
>
> Ignoring problems of performance and storage allocation, all of the
> problems with REVOCABLE COPY that I have identified can be resolved
> using a CapServer. In effect, we are saying that the only *correct*
> systems that can be built on top of a REVOCABLE COPY primitive are those
> that have a hierarchical system structure. Further, we are saying that
> several elements of this trusted hierarchy are complex, and are
> therefore prone to both functional and security errors. The CapServer is
> certainly an example of this.
>
> In fact, there is a hierarchy problem in L4.x2 today in the memory
> manager. Consider two processes A, B with respective pagers A', B'. Now:
>
>       A' maps to A
>       A maps to B
>       A' revokes
>       B' knows nothing and cannot reconstruct the mapping.
>
> This problem is now well-known by the L4 designers, and it is a direct
> consequence of using REVOCABLE COPY as the primitive operation. In every
> real system that has been constructed on top of L4.x2, the solution has
> been to require that either
>
>       A' and B' are identical, or
>       A' and B' have a commonly trusted parent who knows how to
>         recover, or
>       The design is broken, so unmaps are not performed.
>
> The current L4sec design will require that every capability interaction
> must use the same kinds of solutions.
>

I have to object here:

-First, why would B' reconstruct the mapping?  The best would be to
 cancel the current operation.  Maybe B' could then restart B at a
 certain state, where it is waiting for requests (B being a server).
 I don't know the difficulty of writing this, though.

-Second, are there pagers for mappable objects other than fpages?  Is
 there a need for them?  And, as I wrote above, it's much easier to
 recover from an unmap for these objects than for memory pages.
 
OTOH, I agree that the fear of an unmap at any time is a problem.  I
don't know if Neal finds his solution satisfactory, and I don't know it
very well.

>
> I confess that this continues to surprise me. It appears to me that
> there is a known, fundamental, architectural flaw here whose only
> solutions involve the imposition of policy on the OS designer. Either
> L4 is intended to be a general platform for OS construction, and this
> policy imposition should be eliminated, or the claim that L4 is a
> general purpose nucleus should be abandoned.
>
> For a separate reason, I find this *particular* architectural policy
> very worrisome. It has the effect of forcing very sensitive and
> complicated software to be gathered together in one place, and to be
> implemented in such a way that every process must depend on the
> correctness of this centralized code. This is such a fundamental
> violation of basic secure programming practice that I cannot comprehend
> why it should be an acceptable constraint for a microkernel design to
> impose on a system.
>
> Hierarchies in microkernel systems cannot entirely be avoided. You want
> them to be as shallow as you can make them, and you want the programs
> involved to be as simple as possible.
>

It seems that the new L4s will be more hierarchical than they were.
It is true that these "root servers" are becoming complex, and more
prone to security problems.

>
> THE EMULATION FALLACY
>
> It is being said that either system can be emulated on top of the other.
> This is true only in the very narrow sense that it is possible to build
> a library API with a functional interface that can be implemented by
> both systems. Unfortunately, it is NOT true in two important regards:
>
>   There is a fundamental difference in performance, which
>   impacts the set of feasible system architectures.
>
>   COPY cannot be emulated on top of REVOCABLE COPY without
>   a centralized CapServer. The CapServer must allocate storage
>   for every capability that is created, and can therefore be
>   subjected to denial of resource attacks.
>
> The second issue is critical. One of the most basic design principles in
> EROS/Coyotos is:
>
>   No free rides! The party who allocates must pay!
>
> This is a design principle because in EVERY case where it has failed we
> have been able to identify successful attacks on the overall system
> design. I have not seen (and I have been unable to design) a CapServer
> that satisfies this design principle.
>

What is so special about the cap server?  The problem I have seen is
that if the client provides revocable memory to the cap server for its
storage, then you can't have pointers between the different memory
regions.  But it seems to be solved if the cap server can allocate
memory for itself and attribute it back to the client (or at least,
if it can use trusted memory accounted to the client, which is supposed
to have a limited amount of memory).

Is there any problem with this approach?




I'll try to make a little summary of this discussion:

* There are two sets of primitive operations, MAP/UNMAP and COPY.
  Each can be emulated by the other (with problems), so it is better
  to name them COPY and REVOCABLE COPY.

* Emulating COPY requires a cap server, which is problematic because
  of performance issues and possible denial of resource attacks.  I
  tried to propose solutions to avoid these denial of resource
  attacks.  There is also a race problem when copying a capability
  between two clients.

* The fact that memory can be unmapped at any time is a real problem,
  but I think this isn't the case for other capabilities.

* When the only operation is MAP, all processes in the chain of
  transport must still be alive when the client is using the
  capability.  The problem becomes more evident when capabilities can
  be authenticated.

* The cap server was planned to have three roles: a trusted third
  party used to re-establish capability copy semantics, a point for
  accounting resources, and a reference counter.  If we have copy
  semantics, how can we get the other two functionalities?

* Depending on the type of capability transfer, it is better to have
  COPY or REVOCABLE COPY.  We don't know yet which kind makes up the
  majority of transfers in the Hurd.

* Programs written with copy seem more robust, but I don't know
  whether we often encounter the problems you described in the Hurd.

* MAP/UNMAP leads to a hierarchical organisation, and this is a
  security issue.

I hope I haven't forgotten anything.


I'd like to add something to our discussion.  Why wouldn't we want to
have both operations?  Revocable copy seems to be useful in the
following cases:

1. Transfers between an allocating server and a client, where the client
   is going to be the exclusive user of the object.

We could extend this definition so that my notification server example
fits in:

1. Transfers between the "access provider to an object" and a client,
   where the client has to be the unique user of the object.

In my example, the "access provider to an object" is a client of the
notification server, which gives a capability to a server to send
notifications to it through the notification server.

2. Transfers between a client and an untrusted server. This is often
   the case in the Hurd.

We can note that although a server copies a capability to its client
only once, a client may want to map its capability to a server, then
unmap it, then map it to another server, unmap it, etc.

So the use cases of both transfers seem to be roughly equally common
(maybe more for the copy, but what is sure is that revocable copy is
a non-negligible part).

So we want to have both operations available: the servers would COPY
their capabilities to clients, and clients would REVOCABLE COPY their
capabilities to servers.

* What is important is that upon revocation of a REVOCABLE COPY, all
  capabilities copied from the revocably copied capability are also
  revoked.

  This is the case with EROS' forwarding object.

  If we extended L4 to support copy operations, we would certainly
  extend the semantics as follows: if A copies a capability to an
  object in C to B, provided that C has mapped the capability to A, then
  the result would be similar to what happens if C had mapped the
  capability to B.

  But in this scheme, you don't know how to revoke all the capabilities.
  Thus a flag would be needed to tell if the capability could be copied
  or not: then it would be OK.

  Thus, both operations have to be cheap, and we shouldn't privilege
  one over the other.

* I also have a question on the implementation of EROS' forwarding
  objects: if we have to establish a hierarchy of REVOCABLE COPIES, we
  thus have to create a new forwarding object each time.  How does the
  lookup of the real object occur?  Is it constant in time, or
  dependent on the depth of the mapping tree?


> This is probably far too much for one note, but I wanted to try to lay
> out as coherent a picture as I could. Let us see what discussion
> emerges.

Thanks for all your commentary.  It helps me a lot to clarify things
in my head, and is really interesting.

Thanks,
Matthieu



