qemu-devel

Re: [Qemu-devel] [PATCH] ide: Set BSY bit during FLUSH


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH] ide: Set BSY bit during FLUSH
Date: Tue, 28 May 2013 11:18:55 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 28.05.2013 um 10:46 hat Andreas Färber geschrieben:
> Am 28.05.2013 10:27, schrieb Kevin Wolf:
> > Am 28.05.2013 um 10:18 hat Andreas Färber geschrieben:
> >> The implementation of the ATA FLUSH command invokes a flush at the block
> >> layer, which for raw files on POSIX hosts may entail a synchronous
> >> fdatasync(). In some cases this takes so long that the SLES 11 SP1 guest
> >> driver reports I/O errors and filesystems get corrupted or remounted
> >> read-only.
> >>
> >> Avoid this by setting BUSY_STAT, so that the guest knows we are in the
> >> middle of an operation and does not attempt to issue further ATA commands
> >> concurrently.
> >>
> >> Addresses BNC#637297.
> >>
> >> Suggested-by: Gonglei (Arei) <address@hidden>
> >> Signed-off-by: Andreas Färber <address@hidden>
> >> ---
> >>  hw/ide/core.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index c7a8041..bf1ff18 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
> >>  {
> >>      IDEState *s = opaque;
> >>  
> >> +    s->status &= ~BUSY_STAT;
> >> +
> > 
> > This part is unnecessary, the status is already reset.
> 
> Only in the ret >= 0 case though AFAICS?

ide_handle_rw_error() takes care of resetting the status as well, except
when the VM is stopped. But then it will be immediately set again when
the VM is continued and the request is restarted. So the semantic
difference is just whether BSY would be set or not when you somehow
inspect the state while the VM is stopped after an I/O error.
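A minimal, self-contained model of the status transitions discussed above. It assumes a hypothetical ToyIDEState that holds only the status byte (not QEMU's IDEState). It uses the standard ATA status bit values, which are also the values of QEMU's BUSY_STAT, READY_STAT, SEEK_STAT and ERR_STAT macros. The success, error and VM-stopped branches are simplified assumptions about the behaviour of ide_flush_cb() and ide_handle_rw_error(), not copies of them.

```c
#include <stdint.h>

/* ATA status register bits (same values as QEMU's macros of these names). */
#define ERR_STAT   0x01
#define SEEK_STAT  0x10
#define READY_STAT 0x40
#define BUSY_STAT  0x80

/* Hypothetical, heavily simplified stand-in for IDEState. */
typedef struct {
    uint8_t status;
} ToyIDEState;

/* Models the patch's second hunk: raise BSY before the async flush is submitted. */
static void toy_flush_cache(ToyIDEState *s)
{
    s->status |= BUSY_STAT;
    /* ...the real code would now call bdrv_aio_flush()... */
}

/* Models the completion callback for the three outcomes discussed in the thread. */
static void toy_flush_cb(ToyIDEState *s, int ret, int vm_stopped_on_error)
{
    if (ret < 0) {
        if (vm_stopped_on_error) {
            /* VM paused for a later retry: status is left untouched, so BSY
             * stays set until the request is restarted on continue. */
            return;
        }
        s->status = READY_STAT | ERR_STAT;  /* error reported, BSY cleared */
        return;
    }
    s->status = READY_STAT | SEEK_STAT;     /* success: BSY cleared */
}

/* Guest-side rule: a driver must not issue a new command while BSY is set. */
static int toy_guest_may_issue(const ToyIDEState *s)
{
    return !(s->status & BUSY_STAT);
}
```

Without the BUSY_STAT line in toy_flush_cache(), toy_guest_may_issue() would return true for the whole duration of the flush. That is the window in which the guest could issue another command. The only case where the first hunk of the patch changes anything is the VM-stopped branch.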

> >>      if (ret < 0) {
> >>          /* XXX: What sector number to set here? */
> >>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> >> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
> >>          return;
> >>      }
> >>  
> >> +    s->status |= BUSY_STAT;
> >>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
> >>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
> >>  }
> > 
> > This should fix the bug, though in a one-off way. I was planning to
> > fix it by setting BSY for all commands and having an explicit command
> > completion everywhere. This part is currently a mess in IDE.
> 
> That's a valid idea, but I had backporting to 0.15 in mind. ;)
> And doh, I forgot qemu-stable.

Fair enough, we can merge something like this first and do the real
thing on top. Though nobody will be interested in the real thing any
more, as usual... :-/
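A rough sketch of how the "real thing" (BSY set for all commands, with an explicit completion everywhere) might be structured. A single dispatch function raises BSY before any command handler runs, and a single completion function is the only place that clears it. All names here (ToyDrive, toy_exec_cmd(), toy_cmd_done(), and the two dummy handlers) are hypothetical and do not correspond to existing QEMU functions.

```c
#include <stdint.h>

/* ATA status register bits (same values as QEMU's macros of these names). */
#define ERR_STAT   0x01
#define SEEK_STAT  0x10
#define READY_STAT 0x40
#define BUSY_STAT  0x80

typedef struct ToyDrive ToyDrive;
/* Hypothetical handler type: returns nonzero if the command finished synchronously. */
typedef int (*ToyCmdHandler)(ToyDrive *d);

struct ToyDrive {
    uint8_t status;
    int irq_raised;
};

/* Single completion point: every command, sync or async, ends here. */
static void toy_cmd_done(ToyDrive *d, int error)
{
    d->status = READY_STAT | SEEK_STAT | (error ? ERR_STAT : 0);
    d->irq_raised = 1;
}

/* Single entry point: BSY is set before any handler runs. */
static void toy_exec_cmd(ToyDrive *d, ToyCmdHandler h)
{
    d->status = BUSY_STAT;
    d->irq_raised = 0;
    if (h(d)) {
        toy_cmd_done(d, 0);  /* synchronous command: complete immediately */
    }
    /* otherwise the async completion callback must call toy_cmd_done() */
}

/* Dummy handlers: one stands in for a synchronous command, one for FLUSH. */
static int toy_sync_cmd(ToyDrive *d) { (void)d; return 1; }
static int toy_flush_async(ToyDrive *d) { (void)d; return 0; }
```

With this layout, handlers never touch BSY themselves. A forgotten toy_cmd_done() call in an asynchronous path leaves the drive visibly busy, which is easier to notice than a missing BSY.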

> > The other part why I haven't sent a fix yet is that I don't have a test
> > case for it.
> 
> Temporarily add a sleep(31) in qemu_fdatasync()?
> 
> I lazily tested with -snapshot so as not to corrupt my disk image, which
> would not trigger the same issue since it is qcow2-backed, AFAIU.
> 
> > I guess I need to extend blkdebug first before this can be
> > reliably tested by qtest.
> 
> It can't, since it's not a pure device emulation issue but depends on
> the relative timing of filesystem operations and subsequent commands.

That's why you need to be able to influence the timing. It's no excuse for
merging without a test case. If we only ever tested devices that have no
relation to the outside world, our testing would be pretty useless and
always stay as bad as it is today in many areas.
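One way to make the timing controllable is to replace wall-clock time with a virtual clock that the test advances explicitly, and to inject an artificial flush delay. The model below uses 31 seconds, echoing the sleep(31) suggestion above. It is a deterministic toy under stated assumptions. The injected_delay_ns knob and all toy_* names are hypothetical, not an existing blkdebug or qtest interface.

```c
#include <stdint.h>

/* ATA status register bits (same values as QEMU's macros of these names). */
#define SEEK_STAT  0x10
#define READY_STAT 0x40
#define BUSY_STAT  0x80

/* Hypothetical deterministic model: time is a counter the test advances. */
typedef struct {
    int64_t now_ns;             /* virtual clock */
    int64_t flush_done_at_ns;   /* -1 when no flush is pending */
    int64_t injected_delay_ns;  /* hypothetical fault-injection knob */
    uint8_t status;
} ToyTimedDisk;

static void toy_timed_init(ToyTimedDisk *d, int64_t delay_ns)
{
    d->now_ns = 0;
    d->flush_done_at_ns = -1;
    d->injected_delay_ns = delay_ns;
    d->status = READY_STAT | SEEK_STAT;
}

/* Start a flush: raise BSY and schedule completion after the injected delay. */
static void toy_timed_start_flush(ToyTimedDisk *d)
{
    d->status |= BUSY_STAT;
    d->flush_done_at_ns = d->now_ns + d->injected_delay_ns;
}

/* Advance virtual time; fire the completion once its deadline has passed. */
static void toy_clock_step(ToyTimedDisk *d, int64_t step_ns)
{
    d->now_ns += step_ns;
    if (d->flush_done_at_ns >= 0 && d->now_ns >= d->flush_done_at_ns) {
        d->flush_done_at_ns = -1;
        d->status = READY_STAT | SEEK_STAT;  /* completion clears BSY */
    }
}
```

Because time only moves when the test calls toy_clock_step(), the assertion "BSY is still set 30 seconds into a 31-second flush" holds on every run, independent of host filesystem speed.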

Kevin


