Re: [Qemu-devel] chardev's and fd's in monitors


From: Markus Armbruster
Subject: Re: [Qemu-devel] chardev's and fd's in monitors
Date: Thu, 20 Oct 2016 12:42:01 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

"Dr. David Alan Gilbert" <address@hidden> writes:

> * Daniel P. Berrange (address@hidden) wrote:
>> On Thu, Oct 20, 2016 at 10:55:52AM +0200, Markus Armbruster wrote:
>> > "Dr. David Alan Gilbert" <address@hidden> writes:
>> > 
>> > > * Daniel P. Berrange (address@hidden) wrote:
>> > >> On Wed, Oct 19, 2016 at 02:16:05PM +0200, Markus Armbruster wrote:
>> > >> > "Daniel P. Berrange" <address@hidden> writes:
>> > >> > 
>> > >> > > On Wed, Oct 19, 2016 at 11:05:53AM +0100, Dr. David Alan Gilbert wrote:
>> > >> > >> 
>> > >> > >> We need a way to be able to report an error without plumbing
>> > >> > >> error_setg up the stack; if you're saying error_report isn't
>> > >> > >> suitable then we should just recommend we switch everything in
>> > >> > >> migration back to
>> > >> > >> fprintf(stderr,
>> > >> > 
>> > >> > In the cases where error_report() isn't suitable, fprintf() is just as
>> > >> > unsuitable for the exact same reasons.
>> > >> > 
>> > >> > > Well both error_report() + fprintf  are broken from POV of anything
>> > >> > > using QMP. error_report() is slightly less broken for HMP,
>> > >> > 
>> > >> > error_report() is not broken at all for HMP code.  The trouble is code
>> > >> > that can't know whether it's running in a context where error_report()
>> > >> > is suitable.
>> > >> > 
>> > >> > >                                                            but doesn't
>> > >> > > help QMP.
>> > >> > 
>> > >> > Correct.
>> > >> > 
>> > >> > > In the short term we should just make error_report be  threadsafe in
>> > >> > > its usage of the monitor.
>> > >> > 
>> > >> > Any problems left once cur_mon is thread-local (which it should be
>> > >> > anyway)?
>> > >> 
>> > >> If we make cur_mon a thread-local, then error_report() is equivalent
>> > >> to fprintf(stderr) for the migration code, since the migration
>> > >> code runs in a different thread, and so would always see
>> > >> cur_mon == NULL.
>> > >
>> > > Yes, that would become safe; it does sound the best fix for the current
>> > > worry.
>> > >
>> > > If we had that, then why not wire up error_report to pass errors back to
>> > > QMP as well?
>> > 
>> > Well, that would be similar to how QMP used to work.
>> > 
>> > Back when the design of the QMP monitor was hammered out, we discussed
>> > how to do errors.
>> > 
>> > Anthony argued for passing around error objects.  I pointed out the
>> > enormous amount of work this would require: every call chain from the
>> > monitor to an error needs to be modified, with ripple effects throughout
>> > QEMU.
>> > 
>> > So I proposed a shortcut: have a function that reports the error, except
>> > when in QMP context store it in the monitor instead.  That way, you need
>> > to touch only places reporting errors, not every call chain leading to
>> > one.
>> > 
>> > Sadly, that function couldn't be error_report() back then, because
>> > Anthony insisted on rich error objects, against my opposition.  To
>> > support them, we invented a new function, in commit 8204a91.  Code still
>> > had to be converted to this new function.  But it was the least
>> > laborious solution given the rich error object requirement.
>> > 
>> > Anthony reluctantly accepted "store errors in monitor" as a transitional
>> > interface, mostly because we needed to get QMP off the ground fast, and
>> > passing around error objects would have slowed command conversion to a
>> > crawl.  I hoped the transitional interface would turn out to be quite
>> > practical, and remain.
>> > 
>> > Rich errors turned out to be a dead end, and we abandoned them after a
>> > bit over two years (commit de253f1).
>> > 
>> > The "store error in the monitor" interface turned out to be a dead end,
>> > too.  It lingered in the tree for a long time, until commit 4629ed1.  My
>> > memory is foggy on why exactly it didn't work out, but reasons include:
>> > 
>> > * What if code attempts to store multiple errors?  We initially made
>> >   that an assertion failure, but quickly had to relax that so that
>> >   subsequent errors are silently ignored (commit 27a749f).  That's
>> >   differently suboptimal.
>> > 
>> > * Failure remains difficult to see in the code.  Before QMP, a monitor
>> >   command handler didn't return status to the monitor core, it simply
>> >   reported it to the human user, possibly buried deep down in some call
>> >   chain.  Only if something up the chain needed to know, we additionally
>> >   propagated failure up the chain in ad hoc ways.  Making error
>> >   propagation the only way to fail commands made failure more obvious in
>> >   the code.
>> > 
>> > * Plumbing errors to the correct monitor is easy only in the
>> >   (synchronous) monitor command handler.  If the handler kicks off some
>> >   background job, you can't store them in the monitor even if you know
>> >   which monitor kicked off the job, because that could interfere with
>> >   another handler's execution!  You'd have to find some other place to
>> >   store, and create some other code to examine that store and do what
>> >   needs to be done.  Whatever that may be: could be sending the error in
>> >   an asynchronous event, could be retaining for a later command to
>> >   report synchronously.  But then propagating errors up the call chain
>> >   starts to look more appealing than it used to.
>> 
>> Our code has increasingly converted to propagate errors up the call
>> chain, but having a mix of different error reporting approaches
>> is increasingly causing pain.
>> 
>> e.g. a function which propagates errors wants to call into a function
>> which uses error_report

When you add an Error * parameter to a function, you get to convert
everything it calls.  Failure to do so is a bug, simple as that.

Likewise, when you add a new call to a function that takes an Error *
parameter.
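
To make the rule concrete, here is a minimal, self-contained sketch of a
converted call chain.  The Error type and error_setg() below are simplified
stand-ins written for illustration, not QEMU's real implementation (the real
API is declared in include/qapi/error.h), and parse_port() and
open_listener() are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object; not the real definition. */
typedef struct Error {
    char msg[128];
} Error;

/*
 * Stand-in for error_setg(): store a formatted message in *errp.
 * A NULL errp means the caller is not interested in the error.
 */
static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Hypothetical leaf, converted from error_report() to error_setg(). */
static int parse_port(const char *str, Error **errp)
{
    char *end;
    long port = strtol(str, &end, 10);

    if (end == str || *end != '\0' || port < 1 || port > 65535) {
        error_setg(errp, "Invalid port '%s'", str);
        return -1;
    }
    return (int)port;
}

/*
 * Hypothetical caller.  Because it takes an Error **, everything it
 * calls has to fail through errp too, never straight to stderr.
 */
static int open_listener(const char *port_str, Error **errp)
{
    int port = parse_port(port_str, errp);

    if (port < 0) {
        return -1;      /* errp already holds the error */
    }
    return port;        /* pretend the port number is the handle */
}
```

Had parse_port() kept calling error_report(), open_listener() would have
returned -1 with errp still unset, which is exactly the bug described above.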

>>                          - there's no nice way to propagate the error
>> since it has already been reported.  If the function then wants to
>> explicitly ignore the error, then that's impossible too, since it has
>> already been reported.  Add in our code which doesn't use error_report
>> and instead returns errno values, such as the block layer, and it gets
>> even worse because if that calls a function which propagates an error,
>> it has to throw away that useful error and return a useless invented
>> errno value :(

Different bug: when you receive an Error, you have to either handle and
consume it, or pass it on.  Throwing it away and returning an error code
instead counts as neither, and is a bug.
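
The two acceptable options can be sketched as follows.  Again, the Error
type and the error_setg(), error_free() and error_propagate() functions
below are simplified stand-ins for illustration only, not the real QEMU
code, and the read_setting*() functions are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object; not the real definition. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for error_setg(): store a formatted message in *errp. */
static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Stand-in for error_free(). */
static void error_free(Error *err)
{
    free(err);
}

/*
 * Stand-in for error_propagate(): hand local_err to the caller, or
 * free it if the caller passed NULL or already holds an error.
 */
static void error_propagate(Error **dst_errp, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        error_free(local_err);
    }
}

/* Hypothetical callee that always fails with an Error. */
static int read_setting(const char *key, Error **errp)
{
    error_setg(errp, "Setting '%s' not found", key);
    return -1;
}

/* Option 1: handle and consume.  Recover, then free the Error. */
static int read_setting_or_default(const char *key, int dflt)
{
    Error *local_err = NULL;
    int val = read_setting(key, &local_err);

    if (local_err) {
        error_free(local_err);
        return dflt;
    }
    return val;
}

/* Option 2: pass the Error on to the caller unchanged. */
static int read_required_setting(const char *key, Error **errp)
{
    Error *local_err = NULL;
    int val = read_setting(key, &local_err);

    if (local_err) {
        error_propagate(errp, local_err);
        return -1;
    }
    return val;
}
```

The buggy third variant would return something like -EINVAL on failure
while neither freeing nor propagating local_err, so the message is lost.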

>> IMHO continuing to convert code to propagate errors is the only way
>> out of this swamp, because it provides the greatest flexibility to
>> the callers of said functions to decide how to deal with the error.

I wouldn't call the whole situation a swamp.  error_report() is just
fine in many places.  So is returning -errno.  We convert to Error when
these techniques cease to be fine.  The swampy part is old code where
these techniques have never been fine, or at least not for a while.  To
be drained incrementally.

> The problem is that our way of propagating errors actively discourages 
> people from adding errors and you're left with lots of useless invented 
> errno's.
> error_report makes it easy for people to scatter meaningful error messages
> in at any point.
>
> Make an easy, concise way of reporting an error that fits in with
> a propagation scheme and I'd consider converting stuff.

error_setg(errp, "This is as simple as it gets, I'm afraid")

Snark aside, I acknowledge the pain of converting call chains to
propagate Error objects, having converted "a few" myself.


