From: Luiz Capitulino
Subject: Re: [Qemu-devel] Possible bug in monitor code
Date: Fri, 24 Jan 2014 09:14:39 -0500

On Fri, 24 Jan 2014 12:14:12 +0200
Stratos Psomadakis <address@hidden> wrote:

> On 01/23/2014 08:28 PM, Luiz Capitulino wrote:
> > On Thu, 23 Jan 2014 17:33:33 +0200
> > Stratos Psomadakis <address@hidden> wrote:
> >
> >> On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
> >>> On Thu, 23 Jan 2014 08:44:02 -0500
> >>> Luiz Capitulino <address@hidden> wrote:
> >>>
> >>>> On Thu, 23 Jan 2014 19:23:51 +0800
> >>>> Fam Zheng <address@hidden> wrote:
> >>>>
> >>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> >>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
> >>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp)
> >>>>>>>> behavior with qemu-1.7 (and qemu-1.5). The following steps will
> >>>>>>>> reproduce the issue:
> >>>>>>>>
> >>>>>>>>     1) Client A connects to qmp socket with socat
> >>>>>>>>     2) Client A gets the greeting message {"QMP": {"version": ..., "capabilities": []}}
> >>>>>>>>     3) Client A waits (select on the socket's fd)
> >>>>>>>>     4) Client B tries to connect to the *same* qmp socket with socat
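
For reference, the steps above can also be scripted without socat. A
minimal sketch in C, assuming the monitor was started with something
like -qmp unix:/tmp/qmp.sock,server (the socket path is just an
example):

    /* Client A connects and holds its connection; client B connects
     * (the kernel queues it in the listen backlog) and closes right
     * away, before A is done. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int qmp_connect(const char *path)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
        }
        return fd;
    }

    int main(void)
    {
        char buf[1024];
        int a = qmp_connect("/tmp/qmp.sock");     /* step 1 */

        if (read(a, buf, sizeof(buf)) < 0) {      /* step 2: greeting */
            perror("read");
        }

        int b = qmp_connect("/tmp/qmp.sock");     /* step 4: queued in backlog */
        close(b);                                 /* B goes away before A */

        close(a);  /* QEMU now accept()s B's dead connection and greets it */
        return 0;
    }
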
> >>>> QMP/HMP can only handle a single client at a time, so this is
> >>>> not supported. What you could do is create multiple QMP/HMP instances
> >>>> on the command line, but this has never been fully tested, so we don't
> >>>> officially support it either (although it may work).
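
(Multiple instances would mean something along these lines, which
creates two independent QMP monitors; the ids and socket paths are just
examples:

    -chardev socket,id=qmp0,path=/tmp/qmp0.sock,server,nowait -mon chardev=qmp0,mode=control
    -chardev socket,id=qmp1,path=/tmp/qmp1.sock,server,nowait -mon chardev=qmp1,mode=control
)
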
> >>>>
> >>>> Now, not failing gracefully on step 4 is a real bug here. I think the
> >>>> best thing to do would be to close client B's connection. Patches are
> >>>> welcome :)
> >>> Thinking about this again, I think this should already be the case
> >>> (as I don't believe the chardev code sits in accept()). So maybe
> >>> you triggered a race in the chardev code, or something got broken there.
> >> Yes, the chardev code doesn't accept any more connections until the
> >> currently active connection is closed.
> >>
> >> However, when a new client tries to connect while another qmp / hmp
> >> connection is active, the kernel will report to the client that the
> >> connection has been successfully established (as long as there's room
> >> in the socket's listen backlog) and will queue the connection to be
> >> accepted later, when qmp actually finishes with its active connection
> >> and re-executes accept().
> >>
> >> The problem arises if the new client closes its connection before the
> >> older one is finished. When qmp runs accept() again, it gets a socket
> >> fd for a client that has already closed its end of the connection. At
> >> that point, the monitor control event handler is triggered and tries
> >> to write / flush the greeting message to the client. The client,
> >> however, has closed its socket, so the write fails with SIGPIPE /
> >> EPIPE. Neither the qemu-char code nor the monitor code seems to
> >> handle this situation.
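
The sequence on the QEMU side boils down to something like this (a
simplified sketch of the failure mode, not the actual chardev code):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* accept() happily returns a socket whose peer is already gone,
     * and the first write to it fails with EPIPE (or raises SIGPIPE
     * if that isn't ignored or suppressed with MSG_NOSIGNAL). */
    static void greet_next_client(int listen_fd, const char *greeting,
                                  size_t len)
    {
        int fd = accept(listen_fd, NULL, NULL);  /* may be a dead peer */

        if (fd < 0) {
            return;
        }
        if (send(fd, greeting, len, MSG_NOSIGNAL) < 0 && errno == EPIPE) {
            /* This is the unhandled case: it must be treated as a
             * disconnect, not as a transient EAGAIN-style error. */
            fprintf(stderr, "peer closed before the greeting was sent\n");
        }
    }
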
> >>
> >> Btw, have you tried to reproduce it?
> > Not yet, I may have some time tomorrow. How reproducible is it for you?
> 
> We can trigger it (by following the steps described in the first mail)
> consistently.
> 
> > Another question: have you tried to reproduce with an old qemu version
> > (say v1.0) to see if this bug always existed? If the bug was introduced
> > in some recent QEMU version you could try to bisect it.
> 
> v1.1 is not affected. I checked the code and it seems the monitor code
> has been refactored since v1.1.
> 
> > Maybe you could try to reproduce with a different subsystem so that we
> > can rule out or confirm monitor's involvement? Like -serial?
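
(Reproducing against a serial chardev would mean something like
-serial unix:/tmp/serial.sock,server,nowait and the same two-client
dance as above; the path is just an example.)
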
> 
> It's actually a fault of the monitor_flush() function. As far as I can
> tell, monitor_flush() calls qemu_chr_fe_write() and doesn't handle all
> of its return codes / error cases properly (as I described in a
> previous mail). If you check the function, you'll see that the final
> case (where it sets up a watch / callback) always assumes an EAGAIN /
> EWOULDBLOCK error.
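
In other words, the flush logic needs to distinguish a genuinely
blocked write from a dead peer. Roughly, in generic POSIX terms (a
sketch of the idea, not the actual monitor code or the patch):

    #include <errno.h>
    #include <stdbool.h>
    #include <unistd.h>

    /* Only EAGAIN/EWOULDBLOCK means "retry later via a watch"; any
     * other error means the peer is gone and the buffered output
     * should be dropped, not re-queued. Returns true if a write-ready
     * watch should be (re)armed. */
    static bool flush_outbuf(int fd, const char **buf, size_t *len)
    {
        while (*len > 0) {
            ssize_t rc = write(fd, *buf, *len);

            if (rc < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    return true;   /* blocked: arm the watch */
                }
                *len = 0;          /* EPIPE etc.: peer gone, drop buffer */
                return false;
            }
            *buf += rc;            /* full or partial write */
            *len -= rc;
        }
        return false;              /* all flushed, no watch needed */
    }
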
> 
> If you can verify / confirm that this is the case and that the patch
> sent resolves the issue in a sane / correct way, I'll resubmit it
> properly (with git-format-patch, a git log msg etc).

I'll check that later today.


