


From: Peter Maydell
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
Date: Sat, 2 Jul 2016 10:42:52 +0100

On 1 July 2016 at 23:07, Richard Henderson <address@hidden> wrote:
> On 06/30/2016 06:45 AM, Peter Maydell wrote:
>>
>> On 29 June 2016 at 09:47,  <address@hidden> wrote:
>>>
>>> From: Vijay <address@hidden>
>>>
>>> Use Neon instructions to perform zero checking of a
>>> buffer. This helps reduce total migration time.
>>
>>
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 5830a68..4779403 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>>>  #define SPLAT(p)       _mm_set1_epi8(*(p))
>>>  #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>>>  #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>>> +#elif __aarch64__
>>> +#include "arm_neon.h"
>>> +#define VECTYPE        uint64x2_t
>>> +#define ALL_EQ(v1, v2) \
>>> +        ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>>> +         (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>>
>>
>> Should be '#elif defined(__aarch64__)'. I have made this
>> tweak and put this patch in target-arm.next.
>
>
> Consider
>
> #define VECTYPE        uint32x4_t
> #define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)

Sounds good. Vijay, could you benchmark that variant, please?

thanks
-- PMM
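
For readers comparing the two ALL_EQ variants discussed above, here is a
minimal sketch, not the code that was merged. It puts the per-lane
comparison from the patch next to the suggested XOR plus vmaxvq_u32
reduction. The function names (all_eq_lanes, all_eq_maxv,
buffer_is_zero_sketch), the 16-byte chunking, and the scalar fallback for
non-AArch64 hosts are assumptions made for illustration only; they do not
come from util/cutils.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Patch variant: compare the two 64-bit lanes individually. */
static int all_eq_lanes(const uint8_t *a, const uint8_t *b)
{
    uint64x2_t v1 = vreinterpretq_u64_u8(vld1q_u8(a));
    uint64x2_t v2 = vreinterpretq_u64_u8(vld1q_u8(b));
    return vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0) &&
           vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1);
}

/* Suggested variant: XOR, then a horizontal max over four 32-bit lanes.
 * The max is zero only if every lane of the XOR is zero. */
static int all_eq_maxv(const uint8_t *a, const uint8_t *b)
{
    uint32x4_t v1 = vreinterpretq_u32_u8(vld1q_u8(a));
    uint32x4_t v2 = vreinterpretq_u32_u8(vld1q_u8(b));
    return vmaxvq_u32(veorq_u32(v1, v2)) == 0;
}
#else
/* Assumption: scalar stand-ins with the same semantics, so the sketch
 * also builds on hosts without AArch64 Neon. */
static int all_eq_lanes(const uint8_t *a, const uint8_t *b)
{
    uint64_t x[2], y[2];
    memcpy(x, a, 16);
    memcpy(y, b, 16);
    return x[0] == y[0] && x[1] == y[1];
}

static int all_eq_maxv(const uint8_t *a, const uint8_t *b)
{
    uint32_t x[4], y[4], max = 0;
    memcpy(x, a, 16);
    memcpy(y, b, 16);
    for (int i = 0; i < 4; i++) {
        uint32_t d = x[i] ^ y[i];
        if (d > max) {
            max = d;
        }
    }
    return max == 0;
}
#endif

/* Hypothetical driver: returns 1 if buf[0..len) is all zero, else 0.
 * len must be a multiple of 16. use_maxv selects the variant. */
int buffer_is_zero_sketch(const void *buf, size_t len, int use_maxv)
{
    static const uint8_t zero[16];
    const uint8_t *p = buf;

    assert(len % 16 == 0);
    for (size_t i = 0; i < len; i += 16) {
        int eq = use_maxv ? all_eq_maxv(p + i, zero)
                          : all_eq_lanes(p + i, zero);
        if (!eq) {
            return 0;
        }
    }
    return 1;
}
```

Both variants should return the same answer for any input; they differ
only in how the 128-bit comparison is reduced to a scalar. Which one is
faster is what the requested benchmark would have to show.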


