qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE


From: Richard W.M. Jones
Subject: Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
Date: Mon, 10 Feb 2020 22:52:55 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Feb 10, 2020 at 04:29:53PM -0600, Eric Blake wrote:
> On 2/10/20 4:12 PM, Richard W.M. Jones wrote:
> >On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
> >>For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
> >>image has at least one hole) and NBD_INIT_ZERO (the image reads
> >>completely as zero); the two bits are orthogonal and can be set
> >>independently, although it is easy enough to see completely sparse
> >>files with both bits set.
> >
> >I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
> >you really mean the whole image is sparse; or (as you seem to have
> >said above) that there exists a hole somewhere in the image but we're
> >not saying where it is and there can be non-sparse parts of the image?
> 
> As implemented:
> 
> NBD_INIT_SPARSE - there is at least one hole somewhere (allocation
> would be required to write to that part of the file), but there may
> b allocated data elsewhere in the image.  Most disk images will fit
> this definition (for example, it is very common to have a hole
> between the MBR or GPT and the first partition containing a file
> system, or for file systems themselves to be sparse within the
> larger block device).

I think I'm still confused about why this particular flag would be
useful for clients (I can completely understand why clients need
NBD_INIT_ZERO).

But anyway ... could a flag indicating that the whole image is sparse
be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
could use it to avoid an initial disk trim, which is something that
mke2fs does:

  
https://github.com/tytso/e2fsprogs/blob/0670fc20df4a4bbbeb0edb30d82628ea30a80598/misc/mke2fs.c#L2768

and which is painfully slow over NBD for very large devices because of
the 32 bit limit on request sizes - try doing mke2fs on a 1E nbdkit
memory disk some time.

> NBD_INIT_ZERO - all bytes read as zero.
> 
> The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally,
> if you use lseek(SEEK_DATA) to prove the entire image reads as
> zeroes, you also know the entire image is sparse), but NBD_INIT_ZERO
> in isolation is also possible (especially with the qcow2 proposal of
> a persistent autoclear bit, where even with a fully preallocated
> qcow2 image you still know it reads as zeroes but there are no
> holes).  But you are also right that for servers that can advertise
> both bits efficiently, NBD_INIT_SPARSE in isolation may be more
> common than NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk
> images, the latter only for a freshly-created image that happens to
> create with zero initialization).
> 
> What's more, in my patches, I did NOT patch qemu to set or consume
> INIT_SPARSE; so far, it only sets/consumes INIT_ZERO.  Of course, if
> we can find a reason WHY qemu should track whether a qcow2 image is
> fully-allocated, by demonstrating a qemu-img algorithm that becomes
> easier for knowing if an image is sparse (even if our justification
> is: "when copying an image, I want to know if the _source_ is
> sparse, to know whether I have to bend over backwards to preallocate
> the destination"), then using that in qemu makes sense for my v2
> patches. But for v1, my only justification was "when copying an
> image, I can skip holes in the source if I know the _destination_
> already reads as zeroes", which only needed INIT_ZERO.
> 
> Some of the nbdkit patches demonstrate the some-vs.-all nature of
> the two bits; for example, in the split plugin, I initialize
> h->init_sparse = false; h->init_zero = true; then in a loop over
> each file change h->init_sparse to true if at least one file was
> sparse, and change h->init_zero to false if at least one file had
> non-zero contents.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




reply via email to

[Prev in Thread] Current Thread [Next in Thread]