From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmaps specification
Date: Fri, 9 Oct 2015 19:07:28 +0200

On 08.10.2015 22:28, John Snow wrote:

[...]

> (NB: I never got Max's original reply, so this reply is more to Max than
> to Denis or Vladimir.)

Let's hope you'll see this one, then. :-)

> I'll see your wall of text and raise you my own wall of text...
> 
> We consider the ability to use persistent bitmaps to create incremental
> backups for non-qcow2 images to be a necessary and vital component of
> a complete incremental backup solution, especially considering the core
> mechanism of the feature does not really rely on qcow2 for anything
> outside of convenience (e.g. backing files.)
> 
> Vladimir's initial proposal of using .qcow2 to store the bitmap sounded
> good to me, because I wanted to be able to store the bitmap in a qcow2
> file anyway, and expanding the specification to allow it to store
> /arbitrary/ bitmaps seemed like a natural fit to accomplish both goals
> with a minimum of coding.
> 
> However, Max has raised some pretty good points here -- allow me to
> paraphrase his Wall Of Text™:
> 
> - Since this patch is a modification of the qcow2 /specification/ which
> is used by more than just QEMU, we must take care to avoid QEMU-specific
> limitations and design a more universal approach to the specification
> addendum.
> 
> - Specific caps on the number of bitmaps, the granularity of said
> bitmaps, and the resultant size of said bitmaps should be addressed in
> terms of the spec, not in terms of what's necessarily convenient or
> sufficient for QEMU. I think this point will be easy to address with
> some better spec wording.
> 
> - The bitmap language in the spec is generic and doesn't refer much to
> anything. This is partially my fault, as I believe I likely guided
> Vladimir towards using generic language that was tied more to the
> HBitmap format than towards our specific implementation
> (BdrvDirtyBitmap.) I recognize this as a bit of a misguided effort on my
> part to keep things "generic," but what I succeeded in doing was keeping
> it "useless" outside of QEMU. Example: "number of virtual bits" is
> meaningless, but "number of 512 byte sectors" is not.

And these are all things that are only a question of the implementation,
so to speak. While it may not be easy writing up the necessary bits for
the specification, I don't think there'll be much discussion on it.

Except maybe the last bit, because "512 byte sector" basically is
meaningless when talking about a qcow2 file (which works in terms of
clusters), but that's where the second part comes in:

> The last bit is the crux of our problem and the most deserving of our
> attention:
> 
> - For a bitmap to be useful to an application outside of QEMU, all of
> the necessary information for interpreting that bitmap must either be
> present within the file or referenced. For bitmaps that describe the
> file they are stored in, this is trivial with some specification editing.
> 
> For bitmaps stored for /other/ files, this gets... trickier. What is
> this a bitmap for? What does it describe? What data does it describe?
> 
> Node-names and drive names here are useless outside of QEMU and can of
> course change between QEMU invocations or be shared between different
> QEMU instances, so this is useless ...
> 
> We could store filenames, but networked devices and distributed
> filesystems may have interesting relative pathnames that will not remain
> reliable once the .qcow2 file is shuffled around or migrated, so storing
> path-name references seems like a losing battle here, too. Maybe we only
> have a file descriptor and no name at all -- what do we write for the
> "global identifier that uniquely identifies the data we belong to"? Is
> it even possible?

I'd be fine with filenames. It works reasonably well for backing files,
and it's basically the same problem there.

Anyway, even if you could describe the image the dirty bitmap is for,
I'd still oppose putting all that into qcow2. Imagine you're writing a
qcow2 interpreting tool and reading the specification, then:

“This field contains the filename of the image this dirty bitmap is for.
This field contains the filename of the clean image. This field contains
the resolution of the dirty bitmap in units of 512 bytes.”

While this may make sense from the perspective of qemu, it doesn't make
any sense from the perspective of qcow2. As said tool writer, you'd be
asking yourself: “OK, so this information is completely useless because
it says nothing about the qcow2 file itself? Actually, it doesn't even
have any connection to this file.”

It's actually not better than a binary data dump without any information
on how to interpret it, then. Because you cannot interpret it, even
though you know how to; if nothing else, that's because you're writing a
qcow2 tool and the other image is very unlikely to be a qcow2 image as well.

> The only conclusion I can reach here is that storing bitmaps inside of a
> .qcow2 that remain meaningful to external applications is not going to
> be easily possible.
> 
> Perhaps we need to abandon the idea that we can store any bitmap we want
> into a .qcow2.
> 
> However, I'm still a big fan of storing bitmaps that describe the data
> they go alongside in the same .qcow2 as a convenience feature --

I'd be fine with that.

> especially now since Vladimir has done the hard work for us all writing
> the feature.

It's just that if we need a new format for all the other image formats
anyway, from the effort side of things, having a special implementation
for qcow2 won't make the implementation any easier, even if we already
have it.

> For simple use cases in non-managed environments the use case for
> storing the bitmap inside the qcow2 it describes is pretty compelling:
> 
> - No extra files to track or manage
> - The command line used to boot QEMU the first time can be used to boot
> QEMU subsequently, and we get the persistent bitmap automatically
> without further modification.
> - Migration across a shared medium using .qcow2 files is trivial
> - Backups managed by qcow2-unaware applications trivially bring along
> our persistent bitmap data for us without additional configuration.

OK, those points look good enough to justify making qcow2 a special case.

Even though I'm still not really convinced with regard to the command
line, because I still think it's a management tool level feature.

> It's simple, the data is meaningful to external applications, and we've
> got most of the code we need already, thanks to Vladimir.
> 
> Sadly, we still need a way to store bitmap data for files that do not
> offer .bitmap_load and .bitmap_store primitives for us.
> 
> Presumably, if we devise our own "generic bitmap container" format, we
> don't have to store things like node names, filenames, etc in this
> container and we can use it to just store (name, granularity, size,
> [data]) like we were trying to do in qcow2.
> 
> Matching bitmap IDs up to the data they belong to becomes the
> responsibility of the user/management layer.

Yep.
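
To make the (name, granularity, size, [data]) tuple concrete, here is a
minimal Python sketch of what one record in such a container might hold.
The class name BitmapRecord, the use of bytes as the unit for both
granularity and size, and the LSB-first bit order are all assumptions
made for illustration; none of this is a proposed on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class BitmapRecord:
    """One hypothetical entry of a generic bitmap container."""
    name: str
    granularity: int  # bytes covered by one bit (assumed unit)
    size: int         # size of the described data in bytes (assumed unit)
    data: bytearray = field(default_factory=bytearray)

    def __post_init__(self):
        if self.granularity <= 0 or self.size < 0:
            raise ValueError("granularity must be positive, size non-negative")
        nbytes = (self.bit_count() + 7) // 8
        if not self.data:
            # No payload given: start with an all-clean bitmap.
            self.data = bytearray(nbytes)
        elif len(self.data) != nbytes:
            raise ValueError("data length does not match size/granularity")

    def bit_count(self) -> int:
        # One bit per granularity-sized chunk, rounding up for a partial tail.
        return (self.size + self.granularity - 1) // self.granularity

    def mark_dirty(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        if offset < 0 or offset + length > self.size:
            raise ValueError("range lies outside the described data")
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for bit in range(first, last + 1):
            self.data[bit // 8] |= 1 << (bit % 8)

    def is_dirty(self, offset: int) -> bool:
        bit = offset // self.granularity
        return bool(self.data[bit // 8] & (1 << (bit % 8)))
```

Note that the record deliberately carries no node name or filename;
pairing it with the right image is left to whoever manages the files.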

> Where this gets hairy, perhaps, is how to enforce that the drive data
> that belongs to this bitmap isn't modified without our say-so? How do we
> detect de-sync? The quick answer might be to store a hash alongside the
> bitmap, and upon being re-applied to the drive if the hash doesn't
> match, we throw an error/complain/etc -- but what about cases -- again
> -- where we don't necessarily have a file we can trivially hash, like a
> many-gigs-wide raw file being mounted over a networked file-system?
> "external bitmaps" appear to pose a very real desync risk.
> 
> I suppose we would have had that problem anyway with the
> qcow2-as-container idea.

Exactly. But I'd say we can worry about that later. If we have something
like a JSON description file inside of the container, we can always add
timestamps or hashes later on.
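
As a rough illustration of why a JSON description leaves room for that,
the Python sketch below writes a description with only the fields
mentioned so far and treats a content hash as optional. The "sha256"
key and both helper names are hypothetical, and hashing an in-memory
byte string is a simplification; it does nothing for the large
networked raw file case raised above.

```python
import hashlib
import json


def make_description(name, granularity, size, image_bytes=None):
    """Serialize a bitmap description; the hash is added only if data is given."""
    desc = {"name": name, "granularity": granularity, "size": size}
    if image_bytes is not None:
        desc["sha256"] = hashlib.sha256(image_bytes).hexdigest()
    return json.dumps(desc, sort_keys=True)


def check_description(text, image_bytes):
    """Return True or False if a hash is recorded, None if there is none."""
    desc = json.loads(text)
    expected = desc.get("sha256")
    if expected is None:
        # Description written without a hash: nothing to verify against.
        return None
    return hashlib.sha256(image_bytes).hexdigest() == expected
```

A reader that only knows the original three fields can still parse a
description that carries the extra key, which is the property that
makes deferring the de-sync question cheap.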

> Perhaps the best we can say is "This is kind of a dangerous feature to
> use, use at your own peril!" and strongly recommend that external
> bitmaps are used only in conjunction with a management interface.

Yes, that's what I'd do.

> That's about all of the thoughts I have on the matter currently.
> Does anybody else have strong feelings on where we should go from here?
> 
> (A) Argue with Max and push for qcow2-as-container

:-)

> (B) Use qcow2 for self-reference bitmaps only, use an external format
> for formats that do not support .bitmap_load or .bitmap_store
> (C) Forget about the qcow2 extension entirely, use only the new external
> format
> (D) Something else?
> 
> My vote is for (B),

Sounds good to me.

>                     and if I can find a bit of consensus on that, we can
> draft an internal-use specification for the file, but I am very wary

“very wary”, yes, I noticed. Because I am very wary of word combinations
like that.

>                                                                      of
> how we will manage de-sync or if we will be able to manage it at all.
> ("Your fault for touching this file when QEMU was not running.")

Well, yes, the problem is different from the "Your fault for touching
a qcow2 file when qemu was using it", but I mean, what would the use
case be?

If you're not using a management tool, well, then it *is* your fault.

If you are using a management tool, then that means you are using
incremental backups and from time to time you are writing data to the
image from outside qemu and expect it to be caught by your management
tool automatically. Sounds like "your fault" to me, too.

> Thoughts?

All above. :-)

> --js

Max
