Re: [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero
Date: Mon, 5 Sep 2016 16:08:06 +0100
User-agent: Mutt/1.7.0 (2016-08-17)
* Richard Henderson (address@hidden) wrote:
Have you considered contributing something similar to this to glibc?
I filed https://sourceware.org/bugzilla/show_bug.cgi?id=19920 a while back
suggesting it would be useful to have it in libc to be used
by things other than just qemu.
Dave
> Changes from v2 to v3:
>
> * Unit testing. This includes having x86 attempt all versions of
> the accelerator that will run on the hardware. Thus an avx2 host
> will run the basic test 5 times (1.5sec on my laptop).
>
> * Drop the ppc and aarch64 specializations. I have improved the
> basic integer version to the point that those vectorized versions
> are not a win.
>
> In the case of my aarch64 mustang, the integer version is 4 times
> faster than the neon version that I delete. With effort I was
> able to rewrite the neon version to come to within a factor of 1.1,
> but it remained slower than the integer. To be fair, gcc6 makes
> very good use of ldp, so the integer path is *also* loading 16 bytes
> per insn.
>
> I can forward my standalone aarch64 benchmark if anyone is interested.
>
> Note however that at least the avx2 acceleration is still very much
> a win, being about 3 times faster on my laptop. Of course, it's
> handling 4 times as much data per loop as the integer version, so
> one can still see the overhead caused by using vector insns.
>
> For grins I wrote an avx512 version, if someone has a skylake upon
> which to test and benchmark. That requires additional configure
> checks, so I didn't bother to include it here.
>
>
> r~
>
>
> Richard Henderson (9):
> cutils: Move buffer_is_zero and subroutines to a new file
> cutils: Remove SPLAT macro
> cutils: Export only buffer_is_zero
> cutils: Rearrange buffer_is_zero acceleration
> cutils: Add test for buffer_is_zero
> cutils: Add generic prefetch
> cutils: Rewrite x86 buffer zero checking
> cutils: Remove aarch64 buffer zero checking
> cutils: Remove ppc buffer zero checking
>
> configure | 21 +--
> include/qemu/cutils.h | 3 +-
> migration/ram.c | 2 +-
> migration/rdma.c | 5 +-
> tests/Makefile.include | 3 +
> tests/test-bufferiszero.c | 78 +++++++++++
> util/Makefile.objs | 1 +
> util/bufferiszero.c | 332 ++++++++++++++++++++++++++++++++++++++++++++++
> util/cutils.c | 244 ----------------------------------
> 9 files changed, 423 insertions(+), 266 deletions(-)
> create mode 100644 tests/test-bufferiszero.c
> create mode 100644 util/bufferiszero.c
>
> --
> 2.7.4
>
>
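For anyone following the thread without the patches to hand, here is a
minimal sketch of the kind of word-at-a-time integer zero check the cover
letter contrasts with the vector versions. It is not the code from the
series: the name buffer_is_zero_sketch, the 32-byte stride, and the
byte-wise tail are assumptions made purely for illustration. It ORs four
64-bit words per iteration; a compiler may pair such loads into wider
load instructions on targets that have them, though that depends on the
compiler and flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the function in the series. */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);

    /*
     * Main loop: examine 32 bytes per iteration as four 64-bit words.
     * memcpy keeps the loads well-defined for unaligned buffers;
     * compilers typically lower it to plain load instructions.
     */
    while (len >= 32) {
        uint64_t w[4];

        memcpy(w, p, sizeof(w));
        if ((w[0] | w[1] | w[2] | w[3]) != 0) {
            return false;
        }
        p += 32;
        len -= 32;
    }

    /* Tail: fewer than 32 bytes remain, check them one at a time. */
    while (len > 0) {
        if (*p != 0) {
            return false;
        }
        p++;
        len--;
    }
    return true;
}
```

A caller would pass any buffer and length, for example
buffer_is_zero_sketch(page, 4096). How this compares with a vectorized
loop depends on the host, so it would need the sort of benchmarking the
cover letter describes.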
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK