
Re: zlib-ng as a compat replacement for zlib


From: Kevin Wolf
Subject: Re: zlib-ng as a compat replacement for zlib
Date: Fri, 1 Sep 2023 13:10:16 +0200

On 01.09.2023 at 12:03, Richard W.M. Jones wrote:
> On Fri, Sep 01, 2023 at 10:55:50AM +0100, Daniel P. Berrangé wrote:
> > On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> > > On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > > > I understand the context and didn't really think about file size at all.
> > > > 
> > > > My question was essentially if decompressing many small blocks (as we do
> > > > today) performs significantly different from decompressing fewer larger
> > > > blocks (as we would do with a larger cluster size).
> > > 
> > > I did a quick test just by adjusting the cluster size of a qcow2 file:
> > > 
> > >   $ virt-builder fedora-36
> > >   $ ls -lsh fedora-36.img 
> > >   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
> > >   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img > test.raw
> > >   $ ls -lsh test.raw 
> > >   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
> > >   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o compression_type=zlib,cluster_size=4096
> > > 
> > > (for cluster sizes 4k, 64k, 512k, 2048k, and
> > > compression types zlib & zstd)
> > > 
> > > I tested the speed of decompression using:
> > > 
> > >   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw test.out'
> > >   (qemu 8.0.0-4.fc39.x86_64)
> > > 
> > >   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run '\''nbdcopy --request-size "$uri" test.out'\'' '
> > >   (nbdkit-1.35.11-2.fc40.x86_64)
> > > 
> > > Results:
> > > 
> > >   Cluster  Compression  Compressed size  Prog   Decompression speed
> > > 
> > >   4k       zlib         3228811264       qemu   5.921 s ±  0.074 s
> > >   4k       zstd         3258097664       qemu   5.189 s ±  0.158 s
> > > 
> > >   4k       zlib/zstd                     nbdkit failed, bug!!
> > > 
> > >   64k      zlib         3164667904       qemu   3.579 s ±  0.094 s
> > >   64k      zstd         3132686336       qemu   1.770 s ±  0.060 s
> > > 
> > >   64k      zlib         3164667904       nbdkit 1.254 s ±  0.065 s
> > >   64k      zstd         3132686336       nbdkit 1.315 s ±  0.037 s
> > > 
> > >   512k     zlib         3158744576       qemu   4.008 s ±  0.058 s
> > >   512k     zstd         3032697344       qemu   1.503 s ±  0.072 s
> > > 
> > >   512k     zlib         3158744576       nbdkit 1.702 s ±  0.026 s
> > >   512k     zstd         3032697344       nbdkit 1.593 s ±  0.039 s
> > > 
> > >   2048k    zlib         3197569024       qemu   4.327 s ±  0.051 s
> > >   2048k    zstd         2995143168       qemu   1.465 s ±  0.085 s
> > > 
> > >   2048k    zlib         3197569024       nbdkit 3.660 s ±  0.011 s
> > >   2048k    zstd         2995143168       nbdkit 3.368 s ±  0.057 s
> > > 
> > > No great surprises - a very small cluster size is inefficient, but
> > > scaling up the cluster size gains performance, and zstd performs
> > > better than zlib once the cluster size is sufficiently large.

It's interesting that for zstd, qemu-img keeps getting better with
increasing cluster size, while zlib numbers get worse again above 64k.
And nbdkit seems to get worse instead of better with larger cluster
size, no matter whether zlib or zstd is used.
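To put those qemu-img numbers in perspective, here's a quick sketch that turns the mean times from the table into effective decompression throughput over the 24G raw image (times and the 24 GiB size are taken verbatim from the measurements above; this is just arithmetic on them):

```python
# Effective decompression throughput for the qemu-img runs above,
# derived from the 24G size of test.raw and the mean hyperfine times.
RAW_SIZE_GIB = 24  # apparent size of test.raw

# (cluster size, codec) -> mean qemu-img convert time in seconds
qemu_times = {
    ("4k", "zlib"): 5.921,
    ("4k", "zstd"): 5.189,
    ("64k", "zlib"): 3.579,
    ("64k", "zstd"): 1.770,
    ("512k", "zlib"): 4.008,
    ("512k", "zstd"): 1.503,
    ("2048k", "zlib"): 4.327,
    ("2048k", "zstd"): 1.465,
}

throughput = {k: RAW_SIZE_GIB / t for k, t in qemu_times.items()}

# Print fastest first, so the zstd/zlib gap is easy to eyeball.
for (cluster, codec), gibs in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{cluster:>6} {codec}: {gibs:5.1f} GiB/s")
```

This makes the shape of the results obvious: zstd at large clusters is roughly 3x the throughput of zlib at the same cluster size, while at 4k the two are nearly identical (per-cluster overhead dominates either codec).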

> > The default qcow2 cluster size is 64k, which means we've already
> > got the vast majority of the performance and file size win. Going
> > beyond the 64k default doesn't seem massively compelling.
> > 
> > zstd does have a small space win over zlib as expected, but again
> > nothing so drastic that it seems compelling to change - that win
> > will be line noise in the overall bigger picture of image storage
> > and download times.
> 
> Yeah, I was a bit surprised by this.  I expected zstd files to be
> significantly smaller than zlib even though that's not what zstd is
> optimized for.  Not that they'd be about the same.
> 
> > The major difference here is that zstd is much faster than zlib
> > at decompress. I'd be curious if zlib-ng closes that gap ?
> 
> It's quite hard to use zlib-ng in Fedora (currently) since it requires
> changes to the source code.  That is what the pull request being
> discussed would change, as you could simply install zlib-ng-compat,
> which would replace libz.so.  So I can't easily get results for
> qemu + zlib-ng, but we'd expect it to be ~40% faster at decompression,
> and decompression is what is taking most of the time in the qemu
> numbers above.
> 
> I forgot to say that nbdkit is using zlib-ng, since I made the source
> level changes a few weeks back (but most of the nbdkit performance
> improvement comes from being able to use lots of threads).

Ah, that's actually a very important detail. I was wondering why zlib
was performing so much better in nbdkit when zstd is more or less
comparable in both.

If you think using more threads is the key for the remaining difference
at 64k, would increasing QCOW2_MAX_THREADS (currently only 4) help on
the qemu-img side?
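The effect can be sketched outside qemu: qcow2 clusters compress independently, and in CPython the one-shot zlib calls release the GIL, so a thread pool over fake "clusters" mimics what raising the decompression thread limit allows (the cluster contents and counts below are made up purely for illustration):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CLUSTER_SIZE = 64 * 1024  # mirrors the qcow2 64k default

# Fabricate some compressed "clusters" (illustrative only; real qcow2
# clusters would be read from the image file).
clusters = [zlib.compress(bytes([i % 251]) * CLUSTER_SIZE) for i in range(64)]

def decompress_all(blobs, max_threads):
    # Each cluster decompresses independently, like qcow2's per-cluster
    # compression; zlib.decompress releases the GIL, so threads overlap.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(zlib.decompress, blobs))

serial = decompress_all(clusters, max_threads=1)
parallel = decompress_all(clusters, max_threads=8)
assert serial == parallel  # same data regardless of thread count
```

Timing the two calls on a multi-core machine shows the parallel variant finishing in a fraction of the serial time; whether a higher qemu thread cap gives a similar win would need the actual measurement, of course.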

> > If it does, then for the sake of image portability it'd be better
> > to stick with zlib compression in qcow2 and leverage zlib-ng for
> > speed, and ignore zstd.

At first sight, it looks like this would already help a lot. I guess
we'd have to create a PoC patch and measure it.

Kevin



