Re: [Qemu-devel] [PATCH v2 3/3] move unaligned memory access functions to bswap.h
From: malc
Subject: Re: [Qemu-devel] [PATCH v2 3/3] move unaligned memory access functions to bswap.h
Date: Tue, 7 Jun 2011 03:07:06 +0400 (MSD)
User-agent: Alpine 2.00 (LNX 1167 2008-08-23)
On Mon, 6 Jun 2011, Richard Henderson wrote:
> Patches 1-3:
> Reviewed-by: Richard Henderson <address@hidden>
>
> That said,
>
> On 06/06/2011 09:25 AM, Paolo Bonzini wrote:
> > +/* conservative code for little endian unaligned accesses */
> > +static inline int lduw_le_p(const void *ptr)
> > +{
> > +#ifdef _ARCH_PPC
> > +    int val;
> > +    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
> > +    return val;
> > +#else
> > +    const uint8_t *p = ptr;
> > +    return p[0] | (p[1] << 8);
> > +#endif
> > +}
>
> Can we add a patch 4/3 that removes this sort of hard-coded
> assembly stuff in favour of generic gcc code. E.g.
>
> static inline int lduw_le_p(const void *ptr)
> {
>     struct pack { uint16_t x __attribute__((packed)); };
>     const struct pack *p = ptr;
>     uint16_t ret;
>
>     ret = p->x;
>     ret = le_bswap(ret, 16);
>
>     return ret;
> }
>
> One could fairly well macroize all 6 instances.
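For illustration, macroizing the six helpers along these lines might look roughly like the sketch below. It assumes a `le_bswap(val, size)` macro as used in the message; here it is stubbed out as a no-op, which is only correct on a little-endian host, and the `DEFINE_UNALIGNED_LOAD` name is invented for this sketch:

```c
#include <stdint.h>

/* Hypothetical stand-in for le_bswap: a no-op, assuming a little-endian
   host.  The real macro would byte-swap on big-endian hosts. */
#define le_bswap(v, size) (v)

/* One macro generates each load helper: an unaligned read through a
   packed struct member, followed by the endian fixup. */
#define DEFINE_UNALIGNED_LOAD(name, type, size)              \
static inline type name(const void *ptr)                     \
{                                                            \
    struct pack { type x __attribute__((packed)); };         \
    const struct pack *p = ptr;                              \
    return le_bswap(p->x, size);                             \
}

DEFINE_UNALIGNED_LOAD(lduw_le_p, uint16_t, 16)
DEFINE_UNALIGNED_LOAD(ldl_le_p,  uint32_t, 32)
DEFINE_UNALIGNED_LOAD(ldq_le_p,  uint64_t, 64)
```

The store helpers would follow the same pattern, writing `le_bswap(val, size)` into the packed member instead of reading from it.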
>
> The compiler knows that ppc and i386 can do the unaligned loads.
> The compiler *should* be able to match the bswap_N patterns, and
> thus also fold the unaligned load with the bswap for ppc.
>
> I can confirm that gcc head does in fact do the right thing for
> ppc32/64 here, as well as for i386/x86_64. I don't have previous
> versions of gcc checked out atm...
Depends on how bswap_16 is defined. If it is __builtin_bswap16,
then 4.5.0 and 4.6.0 generate byte-reversed loads, but previous
versions lack that builtin, so I don't think this generic code
should go in.
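For reference, the usual open-coded fallback when the builtin is unavailable is a shift-and-or swap like the sketch below (this is illustrative, not the actual bswap.h definition); whether a given GCC release recognizes this pattern and folds it into a byte-reversed load is exactly the version dependence at issue:

```c
#include <stdint.h>

/* Open-coded 16-bit byte swap: exchanges the high and low bytes.
   Older GCC without __builtin_bswap16 would need something like this,
   and may or may not match it to an lhbrx-style instruction. */
static inline uint16_t bswap_16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
```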
>
> I haven't checked, but this *ought* to enable the load-asi bswap
> instruction for sparcv9. I also haven't checked what happens for
> a target like sparcv8 that lacks both unaligned load and bswap,
> to see that we don't simply double the number of shifts. However,
> I'd be tempted to file that as a gcc missed optimization bug.
>
>
> r~
>
--
mailto:address@hidden