qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Safely reopening image files by stashing fds


From: Kevin Wolf
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
Date: Wed, 10 Aug 2011 09:58:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

Am 09.08.2011 21:39, schrieb Blue Swirl:
> On Tue, Aug 9, 2011 at 12:24 PM, Kevin Wolf <address@hidden> wrote:
>> Am 09.08.2011 14:00, schrieb Stefan Hajnoczi:
>>> On Tue, Aug 09, 2011 at 01:39:13PM +0200, Kevin Wolf wrote:
>>>> Am 09.08.2011 12:56, schrieb Stefan Hajnoczi:
>>>>> On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <address@hidden> wrote:
>>>>>> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <address@hidden> wrote:
>>>>>>> Am 09.08.2011 12:25, schrieb Stefan Hajnoczi:
>>>>>>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <address@hidden> wrote:
>>>>>>>>> Am 08.08.2011 16:49, schrieb Stefan Hajnoczi:
>>>>>>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <address@hidden> wrote:
>>>>>>>>>>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi:
>>>>>>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <address@hidden> wrote:
>>>>>>>>>>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi:
>>>>>>>>>>>>>> We've discussed safe methods for reopening image files (e.g. 
>>>>>>>>>>>>>> useful for
>>>>>>>>>>>>>> changing the hostcache parameter).  The problem is that closing 
>>>>>>>>>>>>>> the file first
>>>>>>>>>>>>>> and then opening it again exposes us to the error case where the 
>>>>>>>>>>>>>> open fails.
>>>>>>>>>>>>>> At that point we cannot get to the file anymore and our options 
>>>>>>>>>>>>>> are to
>>>>>>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This window of vulnerability can be eliminated by keeping the 
>>>>>>>>>>>>>> file descriptor
>>>>>>>>>>>>>> around and falling back to it should the open fail.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The challenge for the file descriptor approach is that image 
>>>>>>>>>>>>>> formats, like
>>>>>>>>>>>>>> VMDK, can span multiple files.  Therefore the solution is not as 
>>>>>>>>>>>>>> simple as
>>>>>>>>>>>>>> stashing a single file descriptor and reopening from it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So far I agree. The rest I believe is wrong because you can't 
>>>>>>>>>>>>> assume
>>>>>>>>>>>>> that every backend uses file descriptors. The qemu block layer is 
>>>>>>>>>>>>> based
>>>>>>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be 
>>>>>>>>>>>>> hidden
>>>>>>>>>>>>> in raw-posix.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think something like this could do:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct BDRVReopenState {
>>>>>>>>>>>>>    BlockDriverState *bs;
>>>>>>>>>>>>>    /* can be extended by block drivers */
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState 
>>>>>>>>>>>>> **reopen_state, int
>>>>>>>>>>>>> flags);
>>>>>>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>>>>>>
>>>>>>>>>>>>> raw-posix would store the old file descriptor in its 
>>>>>>>>>>>>> reopen_state. On
>>>>>>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the 
>>>>>>>>>>>>> old
>>>>>>>>>>>>> one and closes the newly opened one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I 
>>>>>>>>>>>>> had in
>>>>>>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing 
>>>>>>>>>>>>> semantics.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you show how bdrv_reopen() would use these new interfaces?  I'm
>>>>>>>>>>>> not 100% clear on the idea.
>>>>>>>>>>>
>>>>>>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a 
>>>>>>>>>>> wrapper
>>>>>>>>>>> function that does both, but that's syntactic sugar).
>>>>>>>>>>>
>>>>>>>>>>> For example we would have:
>>>>>>>>>>>
>>>>>>>>>>> int vmdk_reopen()
>>>>>>>>>>
>>>>>>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>>>>>>> not reopen anything.  bdrv_prepare_reopen() might be clearer.
>>>>>>>>>
>>>>>>>>> Makes sense.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>>    *((VMDKReopenState**) rs) = malloc();
>>>>>>>>>>>
>>>>>>>>>>>    foreach (extent in s->extents) {
>>>>>>>>>>>        ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>>>>>>        if (ret < 0)
>>>>>>>>>>>            goto fail;
>>>>>>>>>>>    }
>>>>>>>>>>>    return 0;
>>>>>>>>>>>
>>>>>>>>>>> fail:
>>>>>>>>>>>    foreach (extent in rs->already_reopened) {
>>>>>>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>>>>>>    }
>>>>>>>>>>>    return ret;
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>> void vmdk_reopen_commit()
>>>>>>>>>>> {
>>>>>>>>>>>    foreach (extent in s->extents) {
>>>>>>>>>>>        bdrv_reopen_commit(extent->reopen_state);
>>>>>>>>>>>    }
>>>>>>>>>>>    free(rs);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> void vmdk_reopen_abort()
>>>>>>>>>>> {
>>>>>>>>>>>    foreach (extent in s->extents) {
>>>>>>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>>>>>>    }
>>>>>>>>>>>    free(rs);
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>>>>>>> &rs)?
>>>>>>>>>
>>>>>>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>>>>>>
>>>>>>>>> Do you have a use case where it would be helpful if the caller invoked
>>>>>>>>> bdrv_close?
>>>>>>>>
>>>>>>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>>>>>>> for the same image file.  I thought this was a property we wanted.
>>>>>>>>
>>>>>>>> Also, in the block_set_hostcache case we need to reopen without
>>>>>>>> switching to a new BlockDriverState instance.  That means the reopen
>>>>>>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>>>>>>>  We cannot create a new instance.
>>>>>>>
>>>>>>> Yes, but where do you even get the second BlockDriverState from?
>>>>>>>
>>>>>>> My prototype only returns an int, not a new BlockDriverState. Until
>>>>>>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>>>>>>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>>>>>>> the new ones.
>>>>>>
>>>>>> It seems I don't understand the API.  I thought it was:
>>>>>>
>>>>>> do_block_set_hostcache()
>>>>>> {
>>>>>>    bdrv_prepare_reopen(bs, &rs);
>>>>>>    ...open new file and check everything is okay...
>>>>>>    if (ret == 0) {
>>>>>>        bdrv_reopen_commit(rs);
>>>>>>    } else {
>>>>>>        bdrv_reopen_abort(rs);
>>>>>>    }
>>>>>>    return ret;
>>>>>> }
>>>>>>
>>>>>> If the caller isn't opening the new file then what's the point of
>>>>>> giving the caller control over prepare, commit, and abort?
>>>>>
>>>>> After sending the last email I realized what I was missing:
>>>>>
>>>>> You need the prepare, commit, and abort API in order to handle
>>>>> multi-file block drivers like VMDK.
>>>>
>>>> Yes, this is whole point of separating commit out. Does the proposal
>>>> make sense to you now?
>>>
>>> It depends on the details.  Adding more functions that every BlockDriver
>>> must implement is bad, so it's important that we only drop this
>>> functionality into raw-posix.c, vmdk.c, and block.c as appropriate.
>>
>> Yes, I agree.
>>
>>> I liked the idea of doing a generic FDStash type that the monitor and
>>> bdrv_reopen() can use.  Blue's idea to hook at the qemu_open() level
>>> takes that further.
>>
>> Well, having something that works for the raw-posix, the monitor and
>> maybe some more things is nice. Having something that works for
>> raw-posix, Sheepdog and rbd is an absolute requirement and I can't see
>> how FDStash solves that.
> 
> For Sheepdog also network access functions would need to be hooked.
> RBD seems to use librados functions for low level I/O, so that needs
> some RBD specific wrappers.
> 
>> Even raw-win32 doesn't have an int fd, but a
>> HANDLE hfile for its backend.
> 
> Just replace "int fd" with "FDStash fd" everywhere?

And how is FDStash defined? The current proposed is this one:

/* A stashed file descriptor */
typedef FDEntry {
        const char *name;
        int fd;
        QLIST_ENTRY(FDEntry) next;
} FDEntry;

/* A container for stashing file descriptors */
typedef struct FDStash {
        QLIST_HEAD(, FDEntry) fds;
} FDStash;

So there we have the int fd again. If you want something generic, you
would have to start letting block drivers extend FDEntry with their own
fields (and how would the first bdrv_open work then?) and basically
arrive at something like a BDRVReopenState.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]