Re: Reliability of RPC services

l4-hurd

From:	Marcus Brinkmann
Subject:	Re: Reliability of RPC services
Date:	Tue, 25 Apr 2006 20:07:19 +0200
User-agent:	Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)

At Tue, 25 Apr 2006 11:45:18 -0600,
"Christopher Nelson" <address@hidden> wrote:
> I like hard real-time systems.  I have thought a lot about the recovery
> aspect of system design.  To me it seems like you have two situations:

Can you give us some references to prior work on this topic that is
most relevant here?  Papers, theses, etc.
 
> This might be extended to IPC by doing something similar.  It may not
> ever be necessary to know "when" to stop retrying.  It may be possible
> to indicate to a user that the requested operation is taking longer than
> expected, and to give the user the opportunity to cancel the request.
> Other servers (such as a mail server) may have a settings file which
> dictates how "long" it should keep retrying an operation.  
> 
> In these situations, the metric for timing out may not be some
> compile-time constant, but can be dependent on what the user has said
> should happen.  (In the case of a settings file, it is probably a
> "knowledgeable" user, since all servers should come set with reasonable
> defaults.)
> 
> One other idea that may not be feasible is in regards to timeouts being
> flaky in the case of heavy load.  Perhaps it would be better to
> stipulate that the watchdog should keep track of how many requests have
> been processed, and how many are pending.  Over time this indicates an
> "average load".  If this number starts to rise sharply, the watchdog may
> assume that it is now under a heavier load, and can use some metric to
> back off on its abort policy.  Think about how Ethernet cards use
> binary exponential backoff to make sure only one system is transmitting
> at once, without any explicit session policy.
> 
> Essentially, apps and servers need to be smarter and need to expect
> things to go wrong.  

My concern is that in a system with such complex dynamics, there may
be emergent behaviour that is totally different from what you actually
want.  Your binary exponential backoff is a very good example: as
originally designed, it led to starvation (the Ethernet capture
effect).  Jeff Mogul calls this "emergent misbehaviour", see:

http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p293-mogul.pdf
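
To make the capture effect concrete, here is a minimal sketch of
truncated binary exponential backoff between two stations.  It is a
simplified model, not code from any real driver: the function names
are hypothetical, slots are abstract time units, and the cap of 10
doublings is the usual Ethernet value.

```python
import random

MAX_EXPONENT = 10  # the backoff window stops doubling after 10 collisions


def backoff_window(collisions: int) -> int:
    """Number of slots to choose from after `collisions` consecutive collisions."""
    return 2 ** min(collisions, MAX_EXPONENT)


def pick_slot(collisions: int, rng: random.Random) -> int:
    """Pick a uniformly random wait, in slots, from [0, window)."""
    return rng.randrange(backoff_window(collisions))


def win_probability(first_collisions: int, second_collisions: int) -> float:
    """Probability that the first station draws a strictly smaller slot
    than the second, computed by enumerating every pair of draws."""
    a = backoff_window(first_collisions)
    b = backoff_window(second_collisions)
    wins = sum(1 for i in range(a) for j in range(b) if i < j)
    return wins / (a * b)


if __name__ == "__main__":
    # The station that just transmitted restarts with a fresh collision
    # counter, while the station that lost keeps its accumulated count.
    for loser_count in (1, 2, 3):
        print(loser_count, win_probability(1, loser_count))
```

With equal counters each station wins a round outright a quarter of
the time.  Once one station has won, its counter resets while the
loser's keeps growing, so win_probability(1, 2) is 0.625 and
win_probability(1, 3) is 0.8125: the winner becomes ever more likely
to win again, which is the starvation described above.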

I really hope that we find a simpler solution, potentially by reducing
the requirements.

Thanks,
Marcus
