qemu-block
Re: zlib-ng as a compat replacement for zlib


From: Daniel P . Berrangé
Subject: Re: zlib-ng as a compat replacement for zlib
Date: Fri, 1 Sep 2023 10:55:50 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > On 31.08.2023 at 11:20, Richard W.M. Jones wrote:
> > > On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > > > [ Cc: qemu-block ]
> > > > 
> > > > On 30.08.2023 at 20:26, Richard W.M. Jones wrote:
> > > > > On Tue, Aug 29, 2023 at 05:49:24PM -0000, Daniel Alley wrote:
> > > > > > > The background to this is I've spent far too long trying to optimize
> > > > > > > the conversion of qcow2 files to raw files.  Most existing qcow2 files
> > > > > > > that you can find online are zlib compressed, including the qcow2
> > > > > > > images provided by Fedora.  Each cluster in the file is separately
> > > > > > > compressed as a zlib stream, and qemu uses zlib library functions to
> > > > > > > decompress them.  When downloading and decompressing these files, I
> > > > > > > measured 40%+ of the total CPU time is doing zlib decompression.
> > > > > > > 
> > > > > > > [You don't need to tell me how great Zstd is, qcow2 supports this for
> > > > > > > compression also, but it is not widely used by existing content.]
> > > > 
> > > > You make it sound like compressing each cluster individually has a big
> > > > impact. If so, does increasing the cluster size make a difference, too?
> > > > That could be a change with fewer compatibility concerns.
> > > 
> > > The issue we're discussing in the original thread is speed of
> > > decompression.  We noted that using zlib-ng (a not-quite drop-in
> > > replacement for zlib) improves decompression speed by 40% or more.
> > > 
> > > Original thread:
> > > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> > > zlib-ng proposed change:
> > > https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> > > 
> > > Size of the compressed file is also a concern, but wasn't discussed.
> > 
> > I understand the context and didn't really think about file size at all.
> > 
> > My question was essentially if decompressing many small blocks (as we do
> > today) performs significantly different from decompressing fewer larger
> > blocks (as we would do with a larger cluster size).
> 
> I did a quick test just by adjusting the cluster size of a qcow2 file:
> 
>   $ virt-builder fedora-36
>   $ ls -lsh fedora-36.img 
>   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
>   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > test.raw
>   $ ls -lsh test.raw 
>   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
>   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o compression_type=zlib,cluster_size=4096
> 
> (for cluster sizes 4k, 64k, 512k, 2048k, and
> compression types zlib & zstd)
> 
> I tested the speed of decompression using:
> 
>   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw test.out'
>   (qemu 8.0.0-4.fc39.x86_64)
> 
>   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run '\''nbdcopy --request-size "$uri" test.out'\'' '
>   (nbdkit-1.35.11-2.fc40.x86_64)
> 
> Results:
> 
>   Cluster  Compression  Compressed size  Prog   Decompression time
> 
>   4k       zlib         3228811264       qemu   5.921 s ±  0.074 s
>   4k       zstd         3258097664       qemu   5.189 s ±  0.158 s
> 
>   4k       zlib/zstd                     nbdkit failed, bug!!
> 
>   64k      zlib         3164667904       qemu   3.579 s ±  0.094 s
>   64k      zstd         3132686336       qemu   1.770 s ±  0.060 s
> 
>   64k      zlib         3164667904       nbdkit 1.254 s ±  0.065 s
>   64k      zstd         3132686336       nbdkit 1.315 s ±  0.037 s
> 
>   512k     zlib         3158744576       qemu   4.008 s ±  0.058 s
>   512k     zstd         3032697344       qemu   1.503 s ±  0.072 s
> 
>   512k     zlib         3158744576       nbdkit 1.702 s ±  0.026 s
>   512k     zstd         3032697344       nbdkit 1.593 s ±  0.039 s
> 
>   2048k    zlib         3197569024       qemu   4.327 s ±  0.051 s
>   2048k    zstd         2995143168       qemu   1.465 s ±  0.085 s
> 
>   2048k    zlib         3197569024       nbdkit 3.660 s ±  0.011 s
>   2048k    zstd         2995143168       nbdkit 3.368 s ±  0.057 s
> 
> No great surprises - very small cluster size is inefficient, but
> scaling up the cluster size gains performance, and zstd performs better
> than zlib once the cluster size is sufficiently large.

The default qcow2 cluster size is 64k, which means we've already
got the vast majority of the performance and file size win. Going
beyond the 64k default doesn't seem massively compelling.

zstd does have a small space win over zlib as expected, but again
nothing so drastic that it seems compelling to change - that win
will be line noise in the overall bigger picture of image storage
and download times.
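
For a rough sense of scale, the following sketch computes how much
smaller each zstd image is than its zlib counterpart, using only the
compressed sizes from the quoted table. The helper name
zstd_saving_pct is made up for illustration; it is not part of any
tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which the zstd image is smaller
# than the zlib image. Arguments: zlib size, zstd size (both in bytes).
zstd_saving_pct() {
    awk -v zlib="$1" -v zstd="$2" \
        'BEGIN { printf "%.1f\n", (zlib - zstd) * 100 / zlib }'
}

# Rows copied from the results table: cluster size, zlib bytes, zstd bytes.
for row in "4k 3228811264 3258097664" \
           "64k 3164667904 3132686336" \
           "512k 3158744576 3032697344" \
           "2048k 3197569024 2995143168"; do
    set -- $row   # split the row into $1 (cluster), $2 (zlib), $3 (zstd)
    printf '%-6s %s%%\n' "$1" "$(zstd_saving_pct "$2" "$3")"
done
```

With those figures the saving comes out at about 1% for 64k clusters
and about 6% for 2048k, while at 4k the zstd image is slightly larger.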

The major difference here is that zstd is much faster than zlib
at decompression. I'd be curious whether zlib-ng closes that gap.
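
To quantify the gap that zlib-ng would need to close, this sketch
divides the zlib time by the zstd time for the qemu rows of the quoted
table. The function name time_ratio is hypothetical, and the nbdkit
rows are left out because they were measured with a different program.

```shell
#!/bin/sh
# Hypothetical helper: how many times longer zlib takes than zstd.
# Arguments: zlib time, zstd time (both in seconds).
time_ratio() {
    awk -v zlib="$1" -v zstd="$2" 'BEGIN { printf "%.2f\n", zlib / zstd }'
}

# qemu rows from the results table: cluster size, zlib seconds, zstd seconds.
for row in "4k 5.921 5.189" \
           "64k 3.579 1.770" \
           "512k 4.008 1.503" \
           "2048k 4.327 1.465"; do
    set -- $row   # split the row into $1 (cluster), $2 (zlib), $3 (zstd)
    printf '%-6s zlib/zstd = %sx\n' "$1" "$(time_ratio "$2" "$3")"
done
```

On these numbers qemu takes roughly 2x as long with zlib as with zstd
at 64k, and close to 3x at 2048k.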

If it does, then for the sake of image portability it'd be better
to stick with zlib compression in qcow2 and leverage zlib-ng for
speed, and ignore zstd.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



