l4-hurd

Re: Reliability of RPC services


From: Marcus Brinkmann
Subject: Re: Reliability of RPC services
Date: Tue, 25 Apr 2006 20:07:19 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)

At Tue, 25 Apr 2006 11:45:18 -0600,
"Christopher Nelson" <address@hidden> wrote:
> I like hard real-time systems.  I have thought a lot about the recovery
> aspect of system design.  To me it seems like you have two situations:

Can you give us some references to prior work on this topic that is
most relevant here?  Papers, theses, etc.
 
> This might be extended to IPC by doing something similar.  It may not
> ever be necessary to know "when" to stop retrying.  It may be possible
> to indicate to a user that the requested operation is taking longer than
> expected, and to give the user the opportunity to cancel the request.
> Other servers (such as a mail server) may have a settings file which
> dictates how "long" it should keep retrying an operation.  
> 
> In these situations, the metric for timing out may not be some
> compile-time constant, but can be dependent on what the user has said
> should happen.  (In the case of a settings file, it is probably a
> "knowledgeable" user, since all servers should come set with reasonable
> defaults.)
> 
> One other idea that may not be feasible concerns timeouts being
> flaky under heavy load.  Perhaps it would be better to
> stipulate that the watchdog should keep track of how many requests have
> been processed, and how many are pending.  Over time this indicates an
> "average load".  If this number starts to rise sharply, the watchdog may
> assume that it is now under a heavier load, and can use some metric to
> back off on its abort policy.  Think about how Ethernet cards use
> binary exponential backoff to make sure only one system is transmitting
> at once, without any explicit session policy.
> 
> Essentially, apps and servers need to be smarter and need to expect
> things to go wrong.  

My concern is that in a system with such complex dynamics, there may
be emergent behaviour that is totally different from what you actually
want.  Your binary exponential backoff is a very good example: as
originally designed, it led to starvation (the Ethernet capture effect).
Jeff Mogul calls this "emergent misbehaviour", see:

http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p293-mogul.pdf

I really hope that we find a simpler solution, potentially by reducing
the requirements.
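
To make the quoted proposal concrete, here is a minimal sketch of a
watchdog that tracks pending requests and doubles its abort timeout
when the load rises sharply.  Everything in it is an assumption made
for illustration, not something specified in this thread: the class
name LoadAwareWatchdog, the moving-average smoothing, the "more than
twice the average" spike test, and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadAwareWatchdog:
    """Hypothetical watchdog that relaxes its abort timeout under load."""

    base_timeout: float = 1.0   # seconds; assumed default
    max_timeout: float = 64.0   # cap so the timeout cannot grow without bound
    smoothing: float = 0.2      # weight of the newest sample in the average
    spike_factor: float = 2.0   # assumed meaning of "rises sharply"

    def __post_init__(self) -> None:
        self.timeout = self.base_timeout
        self.avg_pending = None  # seeded by the first observation

    def observe(self, pending: int) -> float:
        """Record the number of pending requests; return the abort timeout."""
        if self.avg_pending is None:
            self.avg_pending = float(pending)
        # A spike is a pending count well above the running average.
        spiking = pending > self.spike_factor * max(self.avg_pending, 1.0)
        if spiking:
            # Binary exponential backoff: double the timeout, up to the cap.
            self.timeout = min(self.timeout * 2, self.max_timeout)
        else:
            # Load looks ordinary again: return to the base timeout.
            self.timeout = self.base_timeout
        # Exponential moving average of the pending-request count.
        self.avg_pending += self.smoothing * (pending - self.avg_pending)
        return self.timeout
```

Even in this toy version the behaviour is not obvious from the code:
under sustained load the average catches up with the pending count, the
spike test stops firing, and the timeout snaps back to its base value
while the server is still busy.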

Thanks,
Marcus




