Re: [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero

From:	Paolo Bonzini
Subject:	Re: [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero
Date:	Tue, 30 Aug 2016 13:48:48 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0


On 29/08/2016 20:46, Richard Henderson wrote:
> Changes from v2 to v3:
> 
>   * Unit testing.  This includes having x86 attempt all versions of
>     the accelerator that will run on the hardware.  Thus an avx2 host
>     will run the basic test 5 times (1.5sec on my laptop).
> 
>   * Drop the ppc and aarch64 specializations.  I have improved the
>     basic integer version to the point that those vectorized versions
>     are not a win.
> 
>     In the case of my aarch64 mustang, the integer version is 4 times
>     faster than the neon version that I delete.  With effort I was
>     able to rewrite the neon version to come to within a factor of 1.1,
>     but it remained slower than the integer.  To be fair, gcc6 makes
>     very good use of ldp, so the integer path is *also* loading 16 bytes
>     per insn.
> 
>     I can forward my standalone aarch64 benchmark if anyone is interested.
> 
>     Note however that at least the avx2 acceleration is still very much
>     a win, being about 3 times faster on my laptop.  Of course, it's
>     handling 4 times as much data per loop as the integer version, so
>     one can still see the overhead caused by using vector insns.
> 
>     For grins I wrote an avx512 version, if someone has a skylake upon
>     which to test and benchmark.  That requires additional configure
>     checks, so I didn't bother to include it here.

Thanks, queued for 2.8.

Paolo
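
For readers following the thread, here is a rough illustration of what an "integer version" of a zero check can look like. It is a minimal sketch and not the code from this series: the name sketch_buffer_is_zero, the 32-byte stride, and the use of memcpy for unaligned loads are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only; not QEMU's implementation.
 * Returns true if all len bytes starting at buf are zero.
 */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    /*
     * Main loop: load four 64-bit words (32 bytes) and OR them together,
     * so there is only one branch per 32 bytes. memcpy keeps the loads
     * well-defined for unaligned buffers; compilers typically turn it
     * into plain load instructions.
     */
    while (len >= 32) {
        uint64_t w[4];

        memcpy(w, p, sizeof(w));
        if ((w[0] | w[1] | w[2] | w[3]) != 0) {
            return false;
        }
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Tail: check the remaining 0..31 bytes one at a time. */
    for (; len > 0; len--, p++) {
        if (*p != 0) {
            return false;
        }
    }
    return true;
}
```

Combining several words with OR before branching is the same general idea that a vectorized variant applies to wider registers; whether the wider variant is faster has to be measured per host, as the numbers quoted above suggest.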

Current Thread

[Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 3/9] cutils: Export only buffer_is_zero, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 1/9] cutils: Move buffer_is_zero and subroutines to a new file, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 4/9] cutils: Rearrange buffer_is_zero acceleration, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 9/9] cutils: Remove ppc buffer zero checking, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 8/9] cutils: Remove aarch64 buffer zero checking, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 6/9] cutils: Add generic prefetch, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 5/9] cutils: Add test for buffer_is_zero, Richard Henderson, 2016/08/29
- [Qemu-devel] [PATCH v3 7/9] cutils: Rewrite x86 buffer zero checking, Richard Henderson, 2016/08/29
- Re: [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero, Paolo Bonzini <=
