qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps


From: John Snow
Subject: Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
Date: Tue, 23 Jun 2015 20:21:20 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0


On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 12.06.2015 22:34, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> v2:
>>>   - rebase on my 'Dirty bitmaps migration' series
>>>   - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>>>     testing like with dirty bitmaps migration
>>>   - autoclean features
>>>
>>> v1:
>>>
>>> The bitmaps are saved into qcow2 file format. It provides both
>>> 'internal' and 'external' dirty bitmaps feature:
>>>   - for qcow2 drives we can store bitmaps in the same file
>>>   - for other formats we can store bitmaps in the separate qcow2 file
>>>
>>> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
>>> 'dirty_bitmaps_offset' like with snapshots.
>>>
>>> Proposed command line syntax is the following:
>>>
>>> -dirty-bitmap [option1=val1][,option2=val2]...
>>>      Available options are:
>>>      name         The name for the bitmap (necessary).
>>>
>>>      file         The file to load the bitmap from.
>>>
>>>      file_id      When specified with 'file' option, then this file will
>>>                   be available through this id for other -dirty-bitmap
>>>                   options when specified without 'file' option, then it
>>>                   is a reference to 'file', specified with another
>>>                   -dirty-bitmap option, and it will be used to load the
>>>                   bitmap from.
>>>
>>>      drive        The drive to bind the bitmap to. It should be
>>> specified
>>>                   as 'id' suboption of one of -drive options. If nor
>>>                   'file' neither 'file_id' are specified, then the
>>> bitmap
>>>                   will be loaded from that drive (internal dirty
>>> bitmap).
>>>
>>>      granularity  The granularity for the bitmap. Not necessary, the
>>>                   default value may be used.
>>>
>>>      enabled      on|off. Default is 'on'. Disabled bitmaps are not
>>>                   changing regardless of writes to corresponding drive.
>>>
>>> Examples:
>>>
>>> qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
>>> qemu -drive file=a.raw,id=disk \
>>>       -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
>>>
>>> Vladimir Sementsov-Ogievskiy (8):
>>>    spec: add qcow2-dirty-bitmaps specification
>>>    qcow2: add dirty-bitmaps feature
>>>    block: store persistent dirty bitmaps
>>>    block: add bdrv_load_dirty_bitmap
>>>    qcow2: add qcow2_dirty_bitmap_delete_all
>>>    qcow2: add autoclear bit for dirty bitmaps
>>>    qemu: command line option for dirty bitmaps
>>>    iotests: test internal persistent dirty bitmap
>>>
>>>   block.c                       |  82 +++++++
>>>   block/Makefile.objs           |   2 +-
>>>   block/qcow2-dirty-bitmap.c    | 537
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   block/qcow2.c                 |  69 +++++-
>>>   block/qcow2.h                 |  61 +++++
>>>   blockdev.c                    |  38 +++
>>>   docs/specs/qcow2.txt          |  66 ++++++
>>>   include/block/block.h         |   9 +
>>>   include/block/block_int.h     |  10 +
>>>   include/sysemu/blockdev.h     |   1 +
>>>   include/sysemu/sysemu.h       |   1 +
>>>   qemu-options.hx               |  37 +++
>>>   tests/qemu-iotests/118        |  83 +++++++
>>>   tests/qemu-iotests/118.out    |   5 +
>>>   tests/qemu-iotests/group      |   1 +
>>>   tests/qemu-iotests/iotests.py |   6 +
>>>   vl.c                          | 100 ++++++++
>>>   17 files changed, 1105 insertions(+), 3 deletions(-)
>>>   create mode 100644 block/qcow2-dirty-bitmap.c
>>>   create mode 100755 tests/qemu-iotests/118
>>>   create mode 100644 tests/qemu-iotests/118.out
>>>
>> Well, you said "RFC" ... So here's some "C" that you RF'd.
>>
>> Many of these points are a "wish list" of sorts and don't necessarily
>> have to be implemented all at once, but we should be careful to design
>> the core series with the later additions in mind.
>>
>> Many of these items are things that I wouldn't mind working on
>> (Primarily the QMP interfaces), provided that the core of this series
>> will allow for them to exist. I can take many of the QMP/transaction
>> interface projects, for instance.
>>
>> I'm starting to think we won't be able to squeeze this in for 2.4, but
>> we can have a bulk of the work well underway for 2.5, by which point I
>> am hopeful that libvirt will be beginning to pick up motion for
>> integration of this feature.
>>
>> I think that the basic approach you have so far is good, we just have to
>> plan out our required extensions and then we can review the base to make
>> sure it supports the features we want in the near future.
>>
>>
>> (1) General storage design
>>
>> - Persistence bitmaps can be stored in any arbitrary qcow2 file,
>> regardless of if that qcow2 holds data or not.
>>
>> - Any given qcow2 file with or without data can hold bitmaps intended
>> for any number of other drives.
>
> Actually, dirty bitmap is not bound to the image, it just have a name,
> identifying it. We can (try to) load any bitmap for any image.
>

I'm not sure what you mean by "bound" here, but yes, as it stands: the
design is very flexible and I like that. It appears that bitmaps for any
number of images can be simultaneously stored in a .qcow2 as a generic
container, so it's a very flexible approach.

I didn't mean to imply that you couldn't do that already, because it
looks like you can.

>>
>> - Dirty bitmaps are not assumed to be able to be stored in any
>> particular location.
>>
>> So far, this is good. I like the flexibility this provides. This lets us
>> do all kinds of cool things like store bitmaps for 20 different raw
>> drives inside of a single 'bitmaps.qcow2' if we wish.
>>
>>
>> (2) Bitmaps added via QMP do not get any persistence attributes.
>>
>> This is something we'll need to change. Existing QMP commands that let
>> us modify bitmaps:
>>
>> block-dirty-bitmap-add        [+transaction]
>> block-dirty-bitmap-remove
>> block-dirty-bitmap-clear    [+transaction]
>>
>> - block-dirty-bitmap-add:
>>
>> We will want the ability for bitmap-add to specify a persistence option.
>> What I am less clear on is what this attribute should look like.
>>
>> should we add target: <filename> as an attribute here,
>> or should it be target: <node> to specify the file object that we want
>> to store this bitmap in? Or perhaps both?:
>>
>> mode: file, target: <filename>
>> mode: node, target: <node>
>>
>> Or even an explicit usability feature that lets us specify that we wish
>> to store the bitmap for the drive we're attaching it to:
>>
>> block-dirty-bitmap=add node=drive0 name=bitmap0 mode=self
>>
>> The implication here is that the default value for persist could be
>> "none", which does not attempt to store this bitmap anywhere.
>>
>> - block-dirty-bitmap-remove
>>
>> If we remove a bitmap with persistence options active, it needs to be
>> cleared out of the file it is being stored in. Currently we use
>> "release" to remove a bitmap, which deletes only the in-memory portion
>> of the bitmap, so you also use release in your series to delete
>> in-memory bitmaps after we're done with them.
>>
>> I think the semantics of the "remove" QMP option here, however, should
>> include a call to the storage layer to remove the bitmap in question.
>>
>> Let's split the "release" function into two functions:
>> (A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
>> (B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
>> disk also.)
>>
>> Then bdrv_close can use bitmap_free, but the QMP remove command can
>> utilize _delete.
>>
>> - block-dirty-bitmap-clear:
>>
>> This needs to clear the bitmap on-disk if it has persistence features
>> active.
> 
> Does it? When the bitmap is loaded, its representation on disk is
> inconsistent, and an in_use bit is set (on disk). So, we don't need to
> sync it here.
> Syncing on 'remove' is not necessary for the same reason, but may take
> place to not store extra trash..
> 

Keeping an in-use bit is definitely a way to accommodate this QMP
command, so you're right, we don't need to pay special attention here --
unless we go with some kind of a periodic flush model, at which point if
the bitmap is "clean," this command will need to re-mark it as dirty.

Just a generic ->mark_bitmap_dirty() op to call here would suffice entirely.

>>
>> - block-dirty-bitmap-copy:
>>
>> This is only a proposal currently, but worth us keeping it in mind. We
>> should decide on copy semantics. Should the copy keep the persistence
>> attributes of the source bitmap by default and allow a user to override
>> it if desired, or should we force the persistence attribute back to
>> null/None until the user overrides?
>>
>> I suspect defaulting it to no persistence is probably the sanest until
>> we're told otherwise (either via an extension to the copy command or a
>> later edit command.)
>>
>> Since the QMP interfaces has been my area so far, I can draft their
>> addition as a new series if you'd like.
> 
> ok.
> 
>>
>>
>> (3) Additional QMP interfaces
>>
>> We should add the ability to modify a bitmap's persistence after it has
>> been added.
>>
>> block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>
>>
>> This will allow us to add persistence to a bitmap after creation, or
>> remove persistence from a bitmap without deleting it if it's no longer
>> desired.
>>
>> Perhaps at a later date we could even have it change where the bitmap is
>> stored through this mechanism.
>>
>> (Usability features might include the ability for us to rename or change
>> the granularity of the bitmap, too -- but that's future usability stuff,
>> not core functionality.)
>>
>> Like the above, I can draft this addition.
> 
> no objections)
> 
>>
>>
>> (4) Storage Format
>>
>> I think overall the bitmap extension headers look sane, but Kevin is the
>> ultimate authority here.
>>
>> I /would/ like to see an additional header bitfield reserved
>> for some arbitrary flags that can be used at a later date. A uint32_t
>> should be sufficient for now, with some of the upper bits reserved
>> either for an extension or a version field to allow us to expand the
>> bitmap headers in the future if necessary.
> 
> ok
> 
>>
>>
>> (5) Bitmap autoloading
>>
>> Bitmaps are not currently automatically loaded if you pass e.g. (-hda
>> my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
>> a bitmap was intended for is not information stored with the bitmap, so
>> QEMU has no concept or ability to be able to "auto load" bitmaps.
>>
>> Hinted at earlier by my desire to see something like mode=self, we
>> should add some flags to the dirty bitmap header stored with each bitmap:
>>
>> 0x01: "This bitmap describes the file it is stored in"
>> 0x02: "This bitmap should be auto-loaded when this file is opened."
>> 0x04: "This bitmap is read-only (disabled.)"
> 
> The last one - should it be used only for auto-loading bitmaps?
> 

Not necessarily, I just lumped it in here as an example to be grouped
near the other flags I thought we needed. Maybe we don't actually need a
read only flag to be stored because there's not currently a use-case for
RO bitmaps outside of migration.

Just a passing thought.

>>
>> This way, with a properly modern version of QEMU, you could simply just:
>>
>> qemu -M q35 -enable-kvm -hda windows10.qcow2
>>
>> and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
>> 0x02 set, you'd get those bitmaps loaded before any IO to the data
>> clusters of the .qcow2, ensuring data integrity.
>>
>> Of course, I think that it is currently too complicated to try to
>> accomplish autoloading of bitmaps for *other* drives, so let's not worry
>> about that now. This means 0x02 set without 0x01 would be an error.
>>
>> Of course, when autoloading bitmaps, we'll have to check that the size
>> of the bitmap matches the size of the drive. This is easy to do, though.
> 
> it is always checked)
> 

You're right. I think I wasn't convinced we needed size (etc) to be part
of the lookup process, but the way you have it now it does always check
the sizes. Just thinking out loud again.

>>
>> The 0x01 bit can be set automatically when that circumstance is
>> detected, and 0x02 can be set perhaps as an option to
>> --dirty-bitmap auto=yes
>> or via the QMP
>> block-dirty-bitmap-add ... auto=yes
>> or via the edit command,
>> block-dirty-bitmap-edit ... auto=yes
>>
>> Maybe we could also set it implicitly if mode=self is used, too.
> 
> Also, for auto-loading bitmaps, user can manually load it (changing
> 'disabled' bit). And in this case auto-loading should be skipped.
> Also, if auto-loading is default behavior, than what about
> --disable-bitmap-autoloading or something like this?
> 

Agreed. Whatever the default is, we need a way to turn it off and be
explicit about it. Perhaps as an argument to -drive?

e.g.

-drive if=none,file=linux-and-bitmaps.qcow2,bitmap=<auto,no>

where "no" would be a very explicit "Do not load any of the bitmaps in
this file."

"auto" would load automatically any of the bitmaps stored there with the
auto/self flags set, and skip the rest otherwise.

Maybe this is serviceable.

>>
>> (6) qemu-img interface
>>
>> Stefan has mentioned that it would be nice to implement a query ability
>> to qemu-img to list bitmaps stored in qcow2 files, along with some of
>> their key attributes. size, granularity, any flags. It's probably not
>> efficient to list the dirty count, unless we begin storing that
>> information manually in the header. I don't think there's a strong need
>> for that level of info, though.
>>
>> I can handle this part, if you'd like.
> 
> can qmp query block with information about bitmaps be reused here?
> 

I don't think so. Here we'd be reading the bitmaps on-disk and reporting
the info stored in-file, instead of the in-memory structures.

>>
>> (7) CLI interface
>>
>> - The only way to get a bitmap loaded into memory from file is to use
>> the --dirty-bitmap argument where you specify the name, file,
>> destination drive, and granularity.
>>
>> - The only way to create a new bitmap that will integrate with the
>> persistence features is to specify a new bitmap that does not currently
>> exist within a file and allow the qcow2 layer to create the in-memory
>> bitmap for us.
>>
>> This helps us with the flexibility that makes this design a winning
>> choice overall, but it's cumbersome for some special common cases I
>> think we should be supporting.
>>
>> As mentioned previously, I think granularity should not be part
>> of the lookup process -- just creation, and even then I think this CLI
>> syntax should not automatically create bitmaps if it wasn't found -- if
>> the user didn't intend to make a bitmap, an error is likely more
>> appropriate.
>>
>> Perhaps --dirty-bitmap create=true,[...] would be sufficient for
>> specifying intent here, at which point granularity makes sense for the
>> creation process.
>>
>> As for the granularity, I think this should be appropriate:
>>
>> --dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0
>>
>> And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
>> and attach it to 'drive0', throwing an error if the sizes don't match.
> 
> agree, I will do it
> 

Great! I promise I do like the series overall even if I had a
book-length comment about it :)

>>
>> (8) Namespaces
>>
>> Stefan also asked me about the bitmap namespaces -- in-memory of course,
>> each node can have their own "bitmap0" without any collisions because
>> all bitmaps are always referred to by their (bs,name) pair.
>>
>> How do we address bitmaps inside a file, though?
>>
>> If any given bitmap containing .qcow2 file can store an arbitrary number
>> of bitmaps intended for an arbitrary number of destinations, how do we
>> handle this?
>>
>> EXAMPLE:
>> -dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
>> -dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2
>>
>> I think this might currently do very funky things, if bitmaps.qcow2 is
>> currently empty -- I think both calls will succeed, but it will fail
>> later when it tries to store them and cannot.
>>
>> I think we need to do one of two things:
>>
>> (A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
>> check up front if a bitmap *can* be added to the file. This way we don't
>> run into problems after we've dirtied the bitmap.
>>

To clarify, I meant "every bitmap name inside of a file is unique. Check
to make sure it is possible to store a new bitmap upon its creation."

>> (B) Find a way to accommodate bitmaps with the same names that were
>> intended for different nodes.
>>
>> I don't have a good idea for #2, so I think #1 is probably the way to
>> go. We can amend the bitmap documentation to specify that although the
>> bitmap names are unique per-node, if you want to store them in the same
>> file, you're going to want to give them globally unique names.
> 
> A: And what about the case: several raw disks and bitmaps.qcow2? In this
> case using of namespaces is impossible. Or we are going to have
> *-bitmap.qcow2 for each disk..
> 

We could continue letting users do drive0 bitmap0 and drive1 bitmap0,
but as soon as they try to use the QMP commands to store those bitmaps,
the QMP command will report an error if the bitmaps.qcow2 already has a
"bitmap0."

It effectively uses a per-file namespace for bitmaps and applies that
restriction to any in-memory bitmaps created with persistence flags.

> B: As I understand, we have no id or name for the image, it comes from
> cmd line.. So we can't use node name as namespace name. Why not just add
> namespaces?
> 
> -drive file=a.raw,id=disk1,dirty-bitmaps-namespace=disk1_ns \
> -drive file=b.raw,id=disk2,dirty-bitmaps-namespace=disk2_ns \
>  -dirty-bitmap name=bitmap0,drive=disk1,file=bitmaps.qcow2
>  -dirty-bitmap name=bitmap1,drive=disk1,file=bitmaps.qcow2
>  -dirty-bitmap name=bitmap0,drive=disk2,file=bitmaps.qcow2
> 
> 
> Default namespace: empty string or node name?
> Namespace name should be stored in bitmap header for each bitmap.. As
> separate field with length field, or may be as bitmap name part
> (separated from it by '#' character for example)
> 

I think that's starting to get a little too manual and verbose on the
CLI at this point. Maybe we really should just enforce the first option
and call it a day.

>>
>>
>> (9) Data consistency
>>
>> We need to discuss the data safety element to this. I think that
>> atomically before the first write is flushed to disk, the dirty bitmap
>> needs to *at least* set a bit in the bitmap header that indicates that
>> the bitmap is no longer up-to-date.
>>
>> When the bitmap is later flushed to disk, that bit can be cleared until
>> the next write occurs, which repeats the process.
>>
>> We have discussed this (long ago) in the past, but one of the ideas was
>> to monitor the relative utilization rate of the disk and attempt to
>> flush the bitmap whenever there was a lull in disk IO, then clear the
>> "inconsistent" bit.
>>
>> On close, the flush of data and bitmap both would lead us to clear this
>> bit as well.
>>
>> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
>> was outdated and we'd have to recommend that the bitmap be cleared and a
>> new bitmap started.
>>
>> (Or, perhaps, a data-intensive mode where we compare the current data
>> mode with the most recent incremental backup to re-determine what data
>> has changed. This would be very, very slow but an option at least for
>> recovery if started a new full backup is even less desirable.)
>>
>> Other ideas involve regularly flushing the bitmap at certain timed
>> intervals, certain usage intervals (e.g. when the changed bitmap data
>> reaches some total size, like 64KiB of changed bits), or a combination
>> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>>
>> This is a key feature that absolutely needs to make it into the base
>> series, IMO.
> 
> I don't understand, what the use of flushing bitmap not only on
> disk:close? If there no failures with disk, than bitmap will be flushed
> on close and will be consistent for next open(). If there is a disk
> crash, even if we flush the bitmap regularly, what is the possibility of
> crashing immediately after last flush, before further io-s?
> 

The usage case is QEMU crash, power failure, etc. Not disk crash. If we
periodically flush to HD, we increase the chances that we don't corrupt
our image and bitmap.

If we NEVER flush, we guarantee that any segfault or power outage will
absolutely trash our data.

>>
>> (10) Storage Efficiency
>>
>> We should discuss the usage of meta bitmaps or ancillary bitmaps to
>> record which parts of our bitmap data need to be flushed to disk in
>> order to reduce flush/close time.
>>
>> The current meta bitmap implementation optimizes for 1KiB writes to the
>> network (which fits well under the standard 1500bytes), but perhaps we
>> could optimize for local storage block size and use this to be stingy
>> about how much data we decide to write to disk.
>>
>> I believe this is another feature that should be included in the initial
>> series as well, because it might radically impact the core design.
> 
> ok
> 

Yeah, we might just end up having meta_bitmaps on all the time and rely
on them to know what remains to be written to disk. I think Stefan
wasn't too keen on the idea of a 512GiB disk needing to write a solid
1MiB of data on every close, when in practice we might be able to reduce
it to just a handful of block writes.

>>
>> (11) Migration
>>
>> Stefan already touched on this, but we should be mindful of the
>> different kinds of migration scenarios.
>>
>> We might migrate the disks, or they might be shared already.
>>
>> We might migrate (or share) a disk, but what happens if we didn't
>> migrate or didn't share the bitmap storage file that we were using?
>>
>> Bitmaps without persistence data will migrate just fine, but how do we
>> intend to migrate the persistence data itself? I suppose as a first pass
>> we can just tap into the migration calls and migrate some properties
>> like:
>>
>> "This bitmap relies on node_id=xxxx to save its bitmap"
>>
>> and that should probably work for either kind of storage migration
>> tactic. The only problem would be nodes without IDs that we opened by
>> filename ...
> 
> It looks like some bitmaps may be migrated automatically (i.e. created
> on destination, if they don't exist), but others don't. This means, that
> user should describe bitmaps in destination cmd, at least bitmaps,
> loaded from file, not node name. And in this case, migration of
> persistent bitmap will success if there is a bitmap on destination for
> the same node, with the same name and granularity and with set 'file'
> field. Otherwise migration fails..
> 

I think I still need to think about this one for a little bit, but I
think there's other work we can do in the meantime at least.

>>
>> ...Another technique would be for any bitmap that is persistent is to
>> store them all first prior to migration and then allow the destination
>> to load them anew. This would also work for either shared or migrated
>> storage if we worked it right.
>>
>> It seems a little hairy, and I don't have the answers right now...
>> Something I will ponder on the weekend.
> 
> 

Thanks!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]