qemu-devel

Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection


From: Tetsuya Mukawa
Subject: Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
Date: Mon, 28 Mar 2016 11:06:19 +0900
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 2016/03/28 10:53, Tetsuya Mukawa wrote:
> On 2016/03/26 3:00, Marc-André Lureau wrote:
>> Hi
>>
>> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
>> <address@hidden> wrote:
>>>>> The following series starts from the idea that the slave can request a
>>>>> "managed" shutdown instead and later recover (I guess the use case for
>>>>> this is to allow for example to update static dispatching/filter rules
>>>>> etc)
>>> What if the backend crashes, so that no such request is sent? And
>>> I'm wondering why this request is needed, as we are able to detect
>>> the disconnect now (with your patches).
>> I don't think trying to handle backend crashes is really a thing we
>> need to take care of. If the backend is bad enough to crash, it may as
>> well corrupt the guest memory (mst: my understanding of vhost-user is
>> that backend must be trusted, or it could just throw garbage in the
>> queue descriptors with surprising consequences or elsewhere in the
>> guest memory actually, right?).
>>
>>> BTW, you meant to let QEMU as the server and the backend as the client
>>> here, right? Honestly, that's what we've thought of, too, in the first
>>> time.
>>> However, I'm wondering whether we could still go with QEMU as the client
>>> and the backend as the server (the default and the only way DPDK
>>> supports), and let QEMU try to reconnect when the backend crashes
>>> and restarts. In such a case, we need to enable the "reconnect" option
>>> for vhost-user, and once I have done that, it basically works in my
>>> test:
>>>
>> Conceptually, I think if we allow the backend to disconnect, it makes
>> sense that qemu is actually the socket server. But it doesn't matter
>> much, it's simple to teach qemu to reconnect on a timer... So we should
>> probably allow both cases anyway.
>>
>>> - start DPDK vhost-switch example
>>>
>>> - start QEMU, which will connect to DPDK vhost-user
>>>
>>>   link is good now.
>>>
>>> - kill DPDK vhost-switch
>>>
>>>   link is broken at this stage
>>>
>>> - start DPDK vhost-switch again
>>>
>>>   you will find that the link is back again.
>>>
>>>
>>> Would that make sense to you? If so, we may need to make no (or just
>>> very few) changes at all to DPDK to get the reconnect to work.
>> The main issue with handling crashes (gone at any time) is that the
>> backend may not have time to sync the used idx (at the least). It may
>> already have processed incoming packets, so on reconnect, it may
>> duplicate the receiving/dispatching work. Similarly, on the backend
>> receiving end, some packets may be lost, never received by the VM, and
>> later overwritten by the backend after reconnect (for the same used
>> idx update reason). This may not be a big deal for unreliable
>> protocols, but I am not familiar enough with network usage to know if
>> that's fine in all cases. It may be fine for some packets, such as
>> udp.
>>
>> However, in general, vhost-user should not be specific to network
>> transmission, and it would be nice to have a reliable way for the
>> backend to reconnect. That's what I try to do in this series. I'll
>> repost it after I have done more testing.
>>
>> thanks
>>
> Hi Yuanhan,
>
> Probably, we have 2 options here.
> One is using DEVICE_NEEDS_RESET, or adding one more new status like
> QUEUE_NEEDS_RESET to virtio specification.
> In this case, we will need to fix virtio-net drivers and virtio-net
> device of QEMU, so it might need to fix a lot of code, but we can handle
> unexpected shutdown of vhost-user backend.
> The other option is Marc's simple solution. In this case, we don't need
> to change virtio-net drivers, but we cannot handle unexpected shutdown.

Let me add a bit.
Actually, we can use both options at the same time.
For example, only when vhost-user backend closes unexpectedly, use
DEVICE_NEEDS_RESET status.
So probably it's nice to start merging Marc's patches first.

Anyway, if we want to handle unexpected shutdown properly, we may need
to use a kind of DEVICE_NEEDS_RESET status.
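
To illustrate how the two options might combine, here is a rough sketch
only. status_after_disconnect() and its shutdown_requested flag are
hypothetical names, not existing QEMU code; the only real value is the
DEVICE_NEEDS_RESET status bit (64) from the virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* DEVICE_NEEDS_RESET device status bit as defined by the virtio spec. */
#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40

/*
 * Hypothetical helper (an assumption, not a QEMU function): returns the
 * device status to expose to the guest once the vhost-user backend has
 * disconnected.
 */
static uint8_t status_after_disconnect(uint8_t status, bool shutdown_requested)
{
    if (shutdown_requested) {
        /* Managed shutdown: the backend had time to sync its ring state,
         * so keep the status unchanged and simply wait for a reconnect. */
        return status;
    }
    /* Unexpected close: the used idx may be stale, so ask the driver
     * to reset the device. */
    return status | VIRTIO_CONFIG_S_NEEDS_RESET;
}
```

With this kind of split, the managed path needs no driver changes, and
only the unexpected-close path depends on the drivers handling
DEVICE_NEEDS_RESET.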

Tetsuya

> Thanks,
> Tetsuya
>
>
>



