
Re: [Qemu-devel] [PATCH v4 1/2] qemu-error: introduce {error|warn}_report_once


From: Halil Pasic
Subject: Re: [Qemu-devel] [PATCH v4 1/2] qemu-error: introduce {error|warn}_report_once
Date: Wed, 30 May 2018 17:15:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0



On 05/30/2018 06:47 AM, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 12:44:53PM +0800, Peter Xu wrote:
>> There are many error_report()s that can be used in frequently called
>> functions, especially on IO paths.  That can be unideal in that
>> malicious guest can try to trigger the error tons of time which might
>> use up the log space on the host (e.g., libvirt can capture the stderr
>> of QEMU and put it persistently onto disk).
>
> I think the problem is real enough but I think the API
> isn't great as it stresses the mechanism. Which fundamentally does
> not matter - we can print once or 10 times, or whatever.
>
> What happens here is a guest bug as opposed to hypervisor
> bug. So I think a better name would be guest_error.

Michael, I don't agree with your argument against the name
report_once. In my reading, the commit message describes one of the
use cases for which the infrastructure introduced by this patch is
supposed to be a good fit. But report_once is not restricted
to this example.

In my previous life in userspace I had to debug problems
where the original error message got log-rotated away because of an
onslaught of error messages that were a consequence of the original
one, and not very helpful.

IMHO raising the issue of guest_error is a very sane thing to do,
but it is a different problem. I think guest_error is about how and
to whom the error is to be reported. IMHO reporting the error to the
ones that are affected by it and to the ones that can do something
about it (e.g. fix it) is a good rule of thumb. The latter may be
different for hypervisor and for guest bugs.

In my understanding this is really about the log-spamming problem.
Of course one can try to solve/mitigate the problem at different
levels. It could be declared
1) a problem to be solved in the logging library more or less
transparently
2) a problem to be solved by the environment and its admin (e.g.
log aggregation, filtering, and rotation)
3) a problem that the client code of the logging library has to
explicitly deal with

The once and rate_limited variants fall under 3).
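
To illustrate what 3) means for the "once" case, here is a minimal
sketch in C. It is not the code from the patch: demo_report(),
demo_report_once(), reports_emitted and the two handler functions are
hypothetical names made up for this example, and the flag is a plain
bool, so the sketch assumes a single thread.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted messages; only here so the behaviour can be observed. */
static int reports_emitted;

/* Hypothetical stand-in for the logging library's report function. */
static void demo_report(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    reports_emitted++;
}

/*
 * Each expansion of the macro gets its own static flag, so every
 * call site reports at most once. The flag is not atomic (assumption:
 * single-threaded use).
 */
#define demo_report_once(...)              \
    do {                                   \
        static bool reported_;             \
        if (!reported_) {                  \
            reported_ = true;              \
            demo_report(__VA_ARGS__);      \
        }                                  \
    } while (0)

/* Client code on a hot path: it has to opt in explicitly, which is 3). */
static void handle_bad_request(int queue)
{
    demo_report_once("invalid request on queue %d", queue);
}

/* A second call site has its own flag and therefore its own message. */
static void handle_bad_descriptor(int index)
{
    demo_report_once("invalid descriptor %d", index);
}
```

The point is that the caller, not the library and not the admin,
decides that repeats of this particular message are not worth logging.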

To sum it up, "guest error or not" and "once or not" are orthogonal
problems in my view.

Regards,
Halil


> Internally we can still have something similar to this
> mechanism.
>
> Another idea is to reset these guest error counters on guest reset.
> Device reset too? I'm not 100% sure as guest can trigger device resets.



[..]



