qemu-arm



From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
Date: Sat, 9 Apr 2016 15:45:43 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 04/07/2016 02:58 AM, address@hidden wrote:
+#elif defined __aarch64__
+#include "arm_neon.h"

A better test is __NEON__, which asserts that Neon is available at compile time (basically always true for aarch64); then you don't need a runtime test for Neon.

You also get support for armv7 with neon.
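A minimal sketch of the compile-time approach, for illustration only. It assumes the ACLE feature macro __ARM_NEON (the macro the ARM C Language Extensions specify for Advanced SIMD availability) as the guard, and buffer_is_zero_sketch() is a hypothetical name, not QEMU's API. Because the preprocessor picks the Neon path, the same source builds on aarch64, on armv7 with Neon enabled, and on other hosts via the scalar loop, with no runtime CPU probe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns true if all len bytes at buf are zero. */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Neon is guaranteed by the build flags, so no runtime check is needed.
     * OR every 16-byte chunk into one accumulator and reduce it once. */
    uint64x2_t acc = vdupq_n_u64(0);
    for (; i + 16 <= len; i += 16) {
        acc = vorrq_u64(acc, vreinterpretq_u64_u8(vld1q_u8(p + i)));
    }
    /* Simple lane-extraction reduction, done once after the loop. */
    if ((vgetq_lane_u64(acc, 0) | vgetq_lane_u64(acc, 1)) != 0) {
        return false;
    }
#endif

    /* Scalar tail (or the whole buffer on hosts without Neon). */
    for (; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The accumulator is reduced only once per call, so the loads dominate the cost; the lane extraction here is the simplest reduction to write, not necessarily the fastest.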

+#define NEON_VECTYPE               uint64x2_t
+#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
+#define NEON_ORR(v1, v2)           ((v1) | (v2))
+#define NEON_NOT_EQ_ZERO(v1) \
+        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))

FWIW, I think that vmaxvq_u32 would be a better reduction for aarch64. Extracting the individual lanes isn't as efficient as one would like.
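The suggested aarch64 reduction can be sketched as follows, assuming a hypothetical helper vec128_nonzero_max() that tests one 128-bit value stored as two uint64_t. UMAXV (vmaxvq_u32) returns the largest of the four 32-bit lanes, and that maximum is nonzero exactly when some lane is nonzero, so one across-lanes instruction replaces two lane extractions. vmaxvq_u32 exists only on AArch64, hence the __aarch64__ guard; the #else branch is a scalar model of the same reduction so the sketch still builds elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: true if any bit of the 128-bit value w is set. */
static bool vec128_nonzero_max(const uint64_t w[2])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    uint64x2_t v1 = vld1q_u64(w);   /* same type as the patch's NEON_VECTYPE */
    /* Max across the four 32-bit lanes: nonzero iff any lane is nonzero. */
    return vmaxvq_u32(vreinterpretq_u32_u64(v1)) != 0;
#else
    /* Scalar model of the same reduction: max of four 32-bit lanes. */
    uint32_t lanes[4];
    memcpy(lanes, w, sizeof lanes);
    uint32_t m = 0;
    for (int i = 0; i < 4; i++) {
        if (lanes[i] > m) {
            m = lanes[i];
        }
    }
    return m != 0;
#endif
}
```

In terms of the patch's macros, this would correspond to defining NEON_NOT_EQ_ZERO(v1) as (vmaxvq_u32(vreinterpretq_u32_u64(v1)) != 0).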

For armv7, folding via vget_lane_u64(vget_high_u64(v1) | vget_low_u64(v1), 0) is probably best.
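For armv7, which lacks the across-lanes max, the fold can be sketched like this (vec128_nonzero_fold() is a hypothetical name). The sketch spells the OR as the vorr_u64 intrinsic instead of the | operator, which relies on GCC-style vector operators; both express the same operation. The guard is only __ARM_NEON, so it also builds on aarch64, and a plain scalar OR stands in on other hosts.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: true if any bit of the 128-bit value w is set. */
static bool vec128_nonzero_fold(const uint64_t w[2])
{
#if defined(__ARM_NEON)
    uint64x2_t v1 = vld1q_u64(w);
    /* OR the high and low 64-bit halves, then read the single lane. */
    uint64x1_t folded = vorr_u64(vget_high_u64(v1), vget_low_u64(v1));
    return vget_lane_u64(folded, 0) != 0;
#else
    /* Scalar equivalent of the fold. */
    return (w[0] | w[1]) != 0;
#endif
}
```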


r~


