qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 1/2] qemu-error: introduce {error|warn}_repor


From: Cornelia Huck
Subject: Re: [Qemu-devel] [PATCH v4 1/2] qemu-error: introduce {error|warn}_report_once
Date: Wed, 30 May 2018 15:36:00 +0200

On Wed, 30 May 2018 11:30:45 +0800
Peter Xu <address@hidden> wrote:

> On Tue, May 29, 2018 at 11:30:00AM +0200, Cornelia Huck wrote:
> > On Thu, 24 May 2018 12:44:53 +0800
> > Peter Xu <address@hidden> wrote:
> >   
> > > There are many error_report()s that can be used in frequently called
> > > functions, especially on IO paths.  That can be unideal in that
> > > malicious guest can try to trigger the error tons of time which might
> > > use up the log space on the host (e.g., libvirt can capture the stderr
> > > of QEMU and put it persistently onto disk).  In VT-d emulation code, we
> > > have trace_vtd_error() tracer.  AFAIU all those places can be replaced
> > > by something like error_report() but trace points are mostly used to
> > > avoid the DDOS attack that mentioned above.  However using trace points
> > > mean that errors are not dumped if trace not enabled.
> > > 
> > > It's not a big deal in most modern server managements since we have
> > > things like logrotate to maintain the logs and make sure the quota is
> > > expected.  However it'll still be nice that we just provide another way
> > > to restrict message generations.  In most cases, this kind of
> > > error_report()s will only provide valid information on the first message
> > > sent, and all the rest of similar messages will be mostly talking about
> > > the same thing.  This patch introduces *_report_once() helpers to allow
> > > a message to be dumped only once during one QEMU process's life cycle.
> > > It will make sure: (1) it's on by deffault, so we can even get something
> > > without turning the trace on and reproducing, and (2) it won't be
> > > affected by DDOS attack.  
> > 
> > This is good for something (sub-)system wide, where it is enough to
> > alert the user once; but we may want to print something e.g. once per
> > the device where it happens (see v3 of "vfio-ccw: add force unlimited
> > prefetch property" for an example).  
> 
> I'm glad that we start to have more users of it, no matter which
> implementation we'll choose.  At least it means it makes some sense to
> have such a thing.
> 
> For me this patch works nicely enough.  Of course there can be
> per-device errors, but AFAICT mostly we don't have any error at
> all... and my debugging experience is that when multiple error happens
> on different devices simutaneously, they'll very possible that it's
> caused by the same problem (again, errors are rare after all), or, the
> rest of the problems are caused by the first error only (so the first
> error might cause a collapse of the rest).  That's why I wanted to
> always debug with the first error I encounter, because that's mostly
> always the root cause. In that sense, current patch works nicely for
> me (note that we can have device ID embeded in the error message with
> this one).

I think we have slightly different use cases here. In your case,
something (an error) happened and you don't really care about any
subsequent messages. In the vfio-ccw case, we want to log when a guest
tries to do things on a certain device, so we can possibly throw a
magic switch for that device. We still want messages after that so we
can catch further devices for which we may want to throw the magic
switch. (Similar for the other message, we want to give a hint why a
certain device may not work as expected.)

> 
> At the same time, I think this patch should be easier to use of course
> - no extra variables to define, and very self contained. So I would
> slightly prefer current patch.
> 
> However I'm also fine with the approach proposed in the vfio-ccw patch
> too.  Though if so I would possibly drop the 2nd patch too since if
> with the vfio-ccw patch I'll need to introduce one bool for every
> trace_vtd_err() place... then I'd not bother with it any more but
> instead live with that trace_*(). ;) Or I can define one per-IOMMU
> error boolean and pass it in for each of the error_report_once(), but
> it seems a bit awkward too.

I think we can have both the fine-grained control and convenience
macros for those cases where we really just want to print a message
once.

> 
> >   
> > > 
> > > To implement it, I stole the printk_once() macro from Linux.  
> > 
> > Would something akin to printk_ratelimited() also make sense to avoid
> > log flooding?  
> 
> Yes it will.  IMHO we can have that too as follow up if we want, and
> it does not conflict with this print_once().  I'd say currently this
> error_report_once() is good enough for me, especially lightweighted.
> I suspect we'll need more lines to implement a ratelimited version.

Yes, and I agree that wants to be a separate patch should we find a use
case for it.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]