qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Safely reopening image files by stashing fds


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
Date: Tue, 9 Aug 2011 13:00:56 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Aug 09, 2011 at 01:39:13PM +0200, Kevin Wolf wrote:
> Am 09.08.2011 12:56, schrieb Stefan Hajnoczi:
> > On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <address@hidden> wrote:
> >> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <address@hidden> wrote:
> >>> Am 09.08.2011 12:25, schrieb Stefan Hajnoczi:
> >>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <address@hidden> wrote:
> >>>>> Am 08.08.2011 16:49, schrieb Stefan Hajnoczi:
> >>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <address@hidden> wrote:
> >>>>>>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi:
> >>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <address@hidden> wrote:
> >>>>>>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi:
> >>>>>>>>>> We've discussed safe methods for reopening image files (e.g. 
> >>>>>>>>>> useful for
> >>>>>>>>>> changing the hostcache parameter).  The problem is that closing 
> >>>>>>>>>> the file first
> >>>>>>>>>> and then opening it again exposes us to the error case where the 
> >>>>>>>>>> open fails.
> >>>>>>>>>> At that point we cannot get to the file anymore and our options 
> >>>>>>>>>> are to
> >>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
> >>>>>>>>>>
> >>>>>>>>>> This window of vulnerability can be eliminated by keeping the file 
> >>>>>>>>>> descriptor
> >>>>>>>>>> around and falling back to it should the open fail.
> >>>>>>>>>>
> >>>>>>>>>> The challenge for the file descriptor approach is that image 
> >>>>>>>>>> formats, like
> >>>>>>>>>> VMDK, can span multiple files.  Therefore the solution is not as 
> >>>>>>>>>> simple as
> >>>>>>>>>> stashing a single file descriptor and reopening from it.
> >>>>>>>>>
> >>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
> >>>>>>>>> that every backend uses file descriptors. The qemu block layer is 
> >>>>>>>>> based
> >>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be 
> >>>>>>>>> hidden
> >>>>>>>>> in raw-posix.
> >>>>>>>>>
> >>>>>>>>> I think something like this could do:
> >>>>>>>>>
> >>>>>>>>> struct BDRVReopenState {
> >>>>>>>>>    BlockDriverState *bs;
> >>>>>>>>>    /* can be extended by block drivers */
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, 
> >>>>>>>>> int
> >>>>>>>>> flags);
> >>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
> >>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
> >>>>>>>>>
> >>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. 
> >>>>>>>>> On
> >>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the 
> >>>>>>>>> old
> >>>>>>>>> one and closes the newly opened one.
> >>>>>>>>>
> >>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I 
> >>>>>>>>> had in
> >>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing semantics.
> >>>>>>>>
> >>>>>>>> Can you show how bdrv_reopen() would use these new interfaces?  I'm
> >>>>>>>> not 100% clear on the idea.
> >>>>>>>
> >>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
> >>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a 
> >>>>>>> wrapper
> >>>>>>> function that does both, but that's syntactic sugar).
> >>>>>>>
> >>>>>>> For example we would have:
> >>>>>>>
> >>>>>>> int vmdk_reopen()
> >>>>>>
> >>>>>> .bdrv_reopen() is a confusing name for this operation because it does
> >>>>>> not reopen anything.  bdrv_prepare_reopen() might be clearer.
> >>>>>
> >>>>> Makes sense.
> >>>>>
> >>>>>>
> >>>>>>> {
> >>>>>>>    *((VMDKReopenState**) rs) = malloc();
> >>>>>>>
> >>>>>>>    foreach (extent in s->extents) {
> >>>>>>>        ret = bdrv_reopen(extent->file, &extent->reopen_state)
> >>>>>>>        if (ret < 0)
> >>>>>>>            goto fail;
> >>>>>>>    }
> >>>>>>>    return 0;
> >>>>>>>
> >>>>>>> fail:
> >>>>>>>    foreach (extent in rs->already_reopened) {
> >>>>>>>        bdrv_reopen_abort(extent->reopen_state);
> >>>>>>>    }
> >>>>>>>    return ret;
> >>>>>>> }
> >>>>>>
> >>>>>>> void vmdk_reopen_commit()
> >>>>>>> {
> >>>>>>>    foreach (extent in s->extents) {
> >>>>>>>        bdrv_reopen_commit(extent->reopen_state);
> >>>>>>>    }
> >>>>>>>    free(rs);
> >>>>>>> }
> >>>>>>>
> >>>>>>> void vmdk_reopen_abort()
> >>>>>>> {
> >>>>>>>    foreach (extent in s->extents) {
> >>>>>>>        bdrv_reopen_abort(extent->reopen_state);
> >>>>>>>    }
> >>>>>>>    free(rs);
> >>>>>>> }
> >>>>>>
> >>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
> >>>>>> &rs)?
> >>>>>
> >>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
> >>>>>
> >>>>> Do you have a use case where it would be helpful if the caller invoked
> >>>>> bdrv_close?
> >>>>
> >>>> When the caller does bdrv_close() two BlockDriverStates are never open
> >>>> for the same image file.  I thought this was a property we wanted.
> >>>>
> >>>> Also, in the block_set_hostcache case we need to reopen without
> >>>> switching to a new BlockDriverState instance.  That means the reopen
> >>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
> >>>>  We cannot create a new instance.
> >>>
> >>> Yes, but where do you even get the second BlockDriverState from?
> >>>
> >>> My prototype only returns an int, not a new BlockDriverState. Until
> >>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
> >>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
> >>> the new ones.
> >>
> >> It seems I don't understand the API.  I thought it was:
> >>
> >> do_block_set_hostcache()
> >> {
> >>    bdrv_prepare_reopen(bs, &rs);
> >>    ...open new file and check everything is okay...
> >>    if (ret == 0) {
> >>        bdrv_reopen_commit(rs);
> >>    } else {
> >>        bdrv_reopen_abort(rs);
> >>    }
> >>    return ret;
> >> }
> >>
> >> If the caller isn't opening the new file then what's the point of
> >> giving the caller control over prepare, commit, and abort?
> > 
> > After sending the last email I realized what I was missing:
> > 
> > You need the prepare, commit, and abort API in order to handle
> > multi-file block drivers like VMDK.
> 
> Yes, this is whole point of separating commit out. Does the proposal
> make sense to you now?

It depends on the details.  Adding more functions that every BlockDriver
must implement is bad, so it's important that we only drop this
functionality into raw-posix.c, vmdk.c, and block.c as appropriate.

I liked the idea of doing a generic FDStash type that the monitor and
bdrv_reopen() can use.  Blue's idea to hook at the qemu_open() level
takes that further.

But if we can do prepare, commit, and abort in a relatively simple way
then I'm for it.

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]