From: Halil Pasic
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails
Date: Fri, 3 Mar 2017 14:08:37 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0


On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> On Fri, 3 Mar 2017 13:43:32 +0100
> Halil Pasic <address@hidden> wrote:
> 
>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
>>> On Thu,  2 Mar 2017 19:59:42 +0100
>>> Halil Pasic <address@hidden> wrote:
>>>
>>>> The function virtio_notify_irqfd used to ignore the return code of
>>>> event_notifier_set. Let's fail the device should this occur.
>>>
>>> I'm wondering if there are reasons for event_notifier_set() to fail
>>> beyond "we've hit an internal race and should make an effort to fix
>>> that one, or else we have completely messed up in qemu". Marking the
>>> device broken tells the guest that there's something wrong with the
>>> device, but I think we want qemu bug reports when there's something
>>> broken with the irqfd.
>>>
>>
>> That's why the error is logged. I understand virtio_error like something
>> suitable for indicating bugs.
>>
>> What do you suggest? Forcing a dump? I would rather leave it to the
>> user to figure out how important is the state sitting in the machine
>> and the device, and how much effort does (s)he want to put into recovering
>> from the failure. 
> 
> How likely are those logged messages being brought to attention of the
> admin? Does any management software flag machines with such error
> messages? (that's more of a general question)
> 

I admit I did not investigate this thoroughly, partly because the patch
is flawed with regard to multi-threading anyway. After a quick
investigation, it seems the Linux guest won't auto-reset the device, so
the guest should end up with a non-working device. I think it's pretty
likely that the admin will check the logs if the device was important.

I fully agree that it's a very general question, and I do not feel
competent to answer it.

> I'd like to have some kind of trigger that rings an alarm bell so that
> the admin might consider reporting this, but I don't have a good idea
> on how to do that either...
> 

There are tools for aggregating and processing logs, and for triggering
alarm bells too (for example ELK = Elasticsearch + Logstash + Kibana).
AFAIK logs are the most common way to deal with such stuff. But I'm far
from being an expert. Of course logs are only as good as the messages
landing in them...
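
To illustrate the log-based approach, here is a minimal sketch in Python
of a filter that flags suspicious lines in a log stream. It is only a
sketch: the function name find_alerts and the regular expression are
hypothetical, and the pattern is a guess rather than the actual wording
that virtio_error produces, so it would have to be adjusted to the
messages really seen in the logs.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Assumption: this pattern is a placeholder. The real text logged when a
# virtio device is marked broken may differ; adjust it accordingly.
ALERT_PATTERN = re.compile(r"virtio.*(error|broken)", re.IGNORECASE)


def find_alerts(
    lines: Iterable[str], pattern: "re.Pattern[str]" = ALERT_PATTERN
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every log line matching the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Read a log from stdin and report every matching line on stderr.
    for number, line in find_alerts(sys.stdin):
        print(f"ALERT line {number}: {line}", file=sys.stderr)
```

A real deployment would more likely hand this job to the log pipeline
itself and attach an alerting rule there, but the principle is the same:
the alarm can only fire if the message is recognizable.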

Halil



