
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events


From: Stefan Priebe - Profihost AG
Subject: Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
Date: Thu, 30 May 2013 08:58:44 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6

On 30.05.2013 00:29, mdroth wrote:
> On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
>> On Mon, 27 May 2013 12:59:25 -0500
>> mdroth <address@hidden> wrote:
>>
>>> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
>>>> Luiz Capitulino <address@hidden> writes:
>>>>
>>>>> On Sun, 26 May 2013 10:33:39 -0500
>>>>> Michael Roth <address@hidden> wrote:
>>>>>
>>>>>> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
>>>>>> QEMUTimer. Due to timers being processed at the tail end of each main
>>>>>> loop iteration, this generally meant that such events would be emitted
>>>>>> within the same main loop iteration, prior to any client data being read
>>>>>> by tcp/unix socket server backends.
>>>>>>
>>>>>> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
>>>>>> a "bottom-half" that is registered via g_idle_add(). This makes it
>>>>>> likely that the event won't be sent until a subsequent iteration, and
>>>>>> also adds the possibility that such events will race with the
>>>>>> processing of client data.
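>>>>>>
>>>>>> As a minimal glib-only sketch of the mechanics (illustrative names,
>>>>>> not the real QEMU code paths): an idle source registered with
>>>>>> g_idle_add() is only dispatched once the loop iterates again, so
>>>>>> anything already runnable in the current iteration can execute first.
>>>>>>
>>>>>>   #include <glib.h>
>>>>>>   #include <stdio.h>
>>>>>>
>>>>>>   /* Stand-in for the OPENED bottom-half. */
>>>>>>   static gboolean opened_bh(gpointer data)
>>>>>>   {
>>>>>>       printf("CHR_EVENT_OPENED delivered\n");
>>>>>>       return FALSE; /* one-shot: remove the idle source */
>>>>>>   }
>>>>>>
>>>>>>   int main(void)
>>>>>>   {
>>>>>>       g_idle_add(opened_bh, NULL);
>>>>>>       printf("client data could be read here, before OPENED\n");
>>>>>>       /* The event is only delivered when the loop runs: */
>>>>>>       while (g_main_context_iteration(NULL, FALSE)) {
>>>>>>           /* keep dispatching until nothing is pending */
>>>>>>       }
>>>>>>       return 0;
>>>>>>   }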
>>>>>>
>>>>>> In cases where we rely on the CHR_EVENT_OPENED event being delivered
>>>>>> prior to any client data being read, this may cause unexpected behavior.
>>>>>>
>>>>>> In the case of a QMP monitor session using a socket backend, the delayed
>>>>>> processing of the CHR_EVENT_OPENED event can lead to a situation where a
>>>>>> previous session, in which 'qmp_capabilities' has already been processed,
>>>>>> causes the rejection of 'qmp_capabilities' for a subsequent session: we
>>>>>> reset capabilities in response to CHR_EVENT_OPENED, which may not yet
>>>>>> have been delivered. This can be reproduced with the following commands,
>>>>>> generally within 50 or so iterations:
>>>>>>
>>>>>>   address@hidden:~$ cat cmds
>>>>>>   {'execute':'qmp_capabilities'}
>>>>>>   {'execute':'query-cpus'}
>>>>>>   address@hidden:~$ while true; do if socat stdio \
>>>>>>       unix-connect:/tmp/qmp.sock <cmds | grep CommandNotFound \
>>>>>>       &>/dev/null; then echo failed; break; else echo ok; fi; done
>>>>>>   ok
>>>>>>   ok
>>>>>>   failed
>>>>>>   address@hidden:~$
>>>>>>
>>>>>> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
>>>>>> hook, which gets called as part of the GIOChannel cb associated with the
>>>>>> client rather than a bottom-half, and is thus guaranteed to be delivered
>>>>>> prior to accepting any subsequent client sessions.
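>>>>>>
>>>>>> In rough pseudo-C (hypothetical field and type names, not the literal
>>>>>> patch), the workaround amounts to:
>>>>>>
>>>>>>   #include <stdbool.h>
>>>>>>
>>>>>>   /* Illustrative stand-ins for the QEMU types/events involved. */
>>>>>>   enum { CHR_EVENT_OPENED, CHR_EVENT_CLOSED };
>>>>>>   typedef struct { bool qmp_command_mode; } Monitor;
>>>>>>
>>>>>>   static void monitor_event(void *opaque, int event)
>>>>>>   {
>>>>>>       Monitor *mon = opaque;
>>>>>>
>>>>>>       switch (event) {
>>>>>>       case CHR_EVENT_CLOSED:
>>>>>>           /* Runs synchronously from the client's GIOChannel cb, so
>>>>>>            * it cannot race with a new session's first command. */
>>>>>>           mon->qmp_command_mode = false;
>>>>>>           break;
>>>>>>       }
>>>>>>   }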
>>>>>>
>>>>>> This does not address the more general problem of whether or not there
>>>>>> are chardev users that rely on CHR_EVENT_OPENED being delivered prior
>>>>>> to client data, and whether or not we should consider moving
>>>>>> CHR_EVENT_OPENED "in-band" with connection establishment as a general
>>>>>> solution, but it fixes QMP for the time being.
>>>>>>
>>>>>> Reported-by: Stefan Priebe <address@hidden>
>>>>>> Cc: address@hidden
>>>>>> Signed-off-by: Michael Roth <address@hidden>
>>>>>
>>>>> Thanks for debugging this, Michael. I'm going to apply your patch to the
>>>>> qmp branch because we can't leave this broken (esp. in -stable), but this
>>>>> is papering over the real bug...
>>>>
>>>> Do we really need OPENED to happen in a bottom half?  Shouldn't the open
>>>> event handlers register the callback instead if they need it?
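>>>>
>>>> Something along these lines, perhaps (a sketch with invented names,
>>>> not an existing API):
>>>>
>>>>   #include <glib.h>
>>>>
>>>>   enum { CHR_EVENT_OPENED, CHR_EVENT_CLOSED };
>>>>
>>>>   static gboolean mydev_opened_bh(gpointer opaque)
>>>>   {
>>>>       /* heavy or re-entrant work the device chose to defer */
>>>>       return FALSE; /* one-shot */
>>>>   }
>>>>
>>>>   /* Event delivery itself stays synchronous; a device opts in to
>>>>    * deferral only if it actually needs it. */
>>>>   static void mydev_chr_event(void *opaque, int event)
>>>>   {
>>>>       if (event == CHR_EVENT_OPENED) {
>>>>           g_idle_add(mydev_opened_bh, opaque);
>>>>       }
>>>>   }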
>>>
>>> I think that's the more general fix, but wasn't sure if there were
>>> historical reasons why we didn't do this in the first place.
>>>
>>> Looking at it more closely it does seem like we need a general fix
>>> sooner rather than later though, and I don't see any reason why we can't
>>> move BH registration into the actual OPENED hooks as you suggest:
>>>
>>> hw/usb/ccid-card-passthru.c
>>>  - currently affected? no
>>>     debug hook only
>>>  - needs OPENED BH? no
>>>
>>> hw/usb/redirect.c
>>>  - currently affected? yes
>>>     can_read handler checks for dev->parser != NULL, which may be
>>>     true if the CLOSED BH hasn't been executed yet. In the past, OPENED
>>>     quiesced outstanding CLOSED events prior to us reading client data.
>>>  - need OPENED BH? possibly
>>>     seems to only be needed by CLOSED events, which can be issued by
>>>     USBDevice, so we do teardown/usb_detach in a BH. For OPENED, this
>>>     may not apply. If we do issue a BH, we'd need to modify the can_read
>>>     handler to avoid the race.
>>>
>>> hw/usb/dev-serial.c
>>>  - currently affected? no
>>>     can_read checks for dev.attached != NULL. Set to NULL
>>>     synchronously in CLOSED, so we will not accept reads until OPENED
>>>     gets called and sets dev.attached
>>>  - need opened BH?  no
>>>
>>> hw/char/sclpconsole.c
>>>  - currently affected? no
>>>     guarded by can_read handler
>>>  - need opened BH? no
>>>
>>> hw/char/ipoctal232.c
>>>  - currently affected? no
>>>     debug hook only
>>>  - need opened BH? no
>>>
>>> hw/char/virtio-console.c
>>>  - currently affected? no
>>>     OPENED/CLOSED map to vq events handled asynchronously. can_read
>>>     checks for guest_connected state rather than anything set by OPENED
>>>  - need opened BH? no
>>>
>>> qtest.c
>>>  - currently affected? yes
>>>     can_read handler doesn't check for qtest_opened == true, so we can
>>>     read data before the OPENED event is processed (a guard is sketched
>>>     below)
>>>  - need opened BH? no
>>>
>>> monitor.c
>>>  - currently affected? yes
>>>     negotiated session state can temporarily leak into a new session
>>>  - need opened BH? no
>>>
>>> gdbstub.c
>>>  - currently affected? yes
>>>     can fail to pause machine prior to starting gdb session
>>>  - need opened BH? no
>>>
>>> So we have a number of cases that can be fixed by avoiding the use of
>>> the generic BH, and only 1 possible case where we might need a
>>> device-specific BH.
>>>
>>> At first glance anyway. So if this all seems reasonable I can send a
>>> more general fix shortly.
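>>>
>>> For the cases that just need a guard, the shape of the fix would be
>>> something like this (hypothetical names, qtest-flavoured):
>>>
>>>   #include <stdbool.h>
>>>
>>>   static bool qtest_opened; /* set in the OPENED hook, cleared in CLOSED */
>>>
>>>   /* can_read handler: refuse input until OPENED has actually been
>>>    * processed, closing the window where client data races the event. */
>>>   static int qtest_can_read(void *opaque)
>>>   {
>>>       return qtest_opened ? 1024 : 0;
>>>   }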
>>
>> Michael, I've dropped your first patch and am taking it that you're
>> going to cook this more general fix.
> 
> fix sent:
> 
> http://article.gmane.org/gmane.comp.emulators.qemu/213863

Thanks, seems to work fine.

Stefan


