Re: [Qemu-ppc] KVM and variable-endianness guest CPUs
From: Alexander Graf
Subject: Re: [Qemu-ppc] KVM and variable-endianness guest CPUs
Date: Wed, 22 Jan 2014 11:52:20 +0100

On 22.01.2014, at 08:26, Victor Kamensky <address@hidden> wrote:
> On 21 January 2014 22:41, Alexander Graf <address@hidden> wrote:
>>
>>
>> "Native endian" really is just a shortcut for "target endian"
>> which is LE for ARM and BE for PPC. There shouldn't be
>> a qemu-system-armeb or qemu-system-ppc64le.
>
> I disagree. Fully functional ARM BE system is what we've
> been working on for last few months. 'We' is Linaro
> Networking Group, Endian subteam and some other guys
> in ARM and across community. Why we do that is a bit
> beyond of this discussion.
>
> ARM BE patches for both V7 and V8 are already in mainline
> kernel. But ARM BE KVM host is broken now. It is known
> deficiency that I am trying to fix. Please look at [1]. Patches
> for V7 BE KVM were proposed and currently under active
> discussion. Currently I work on ARM V8 BE KVM changes.
>
> So "native endian" in ARM is value of CPSR register E bit.
> If it is off native endian is LE, if it is on it is BE.
>
> Once and if we agree on ARM BE KVM host changes, the
> next step would be patches in qemu one of which introduces
> qemu-system-armeb. Please see [2].

I think we're facing an ideology conflict here. Yes, there should be a
qemu-system-arm that is BE capable. There should also be a qemu-system-ppc64
that is LE capable. But there is no point in changing the "default endianness"
for the virtual CPUs that we plug in there. Both CPUs are perfectly capable of
running in LE or BE mode; the question is just what we declare the "default".

Think about the PPC bootstrap. We start off with a BE firmware, then boot into
the Linux kernel, which calls a hypercall to set the LE bit on every interrupt.
But there's no reason this little endian kernel couldn't theoretically have big
endian user space running with access to emulated device registers.

As Peter already pointed out, the actual breakage behind this is that we have a
"default endianness" at all. But that's a very difficult thing to resolve and I
don't think it should be our primary goal. Just live with the fact that we
declare ARM little endian in QEMU and swap things accordingly - then everyone's
happy.

This really only ever becomes a problem if you have devices that are aware of
the CPU's endian mode. The only one on PPC that I'm aware of that falls into
this category is virtio, and there are patches pending to solve that. I don't
know if there are any QEMU emulated devices outside of virtio with this issue
on ARM, but you'll have to make the emulation code for those look at the CPU
state then.

>
>> QEMU emulates everything that comes after the CPU, so
>> imagine the ioctl struct as a bus package. Your bus
>> doesn't care what endianness the CPU is in - it just
>> gets data from the CPU.
>
> I am not sure that I follow above. Suppose I have
>
> mov r1, #1
> str r1, [r0]
>
> where r0 is device address. Now depending on CPSR
> E bit value device address will receive 1 as integer either
> in LE order or in BE order. That is how ARM v7 CPU
> works, regardless whether it is emulated or not.
>
> So if E bit is off (LE case) after str is executed
> byte at r0 address will get 1
> byte at r0 + 1 address will get 0
> byte at r0 + 2 address will get 0
> byte at r0 + 3 address will get 0
>
> If E bit is on (BE case) after str is executed
> byte at r0 address will get 0
> byte at r0 + 1 address will get 0
> byte at r0 + 2 address will get 0
> byte at r0 + 3 address will get 1
>
> my point that mmio.data[] just carries bytes for phys_addr
> mmio.data[0] would be value for byte at phys_addr,
> mmio.data[1] would be value for byte at phys_addr + 1, and
> so on.

What we get is an instruction that traps because it wants to "write r1 (which
has value=1) into address x". So at that point we get the register value.

Then we need to take a look at the E bit to see whether the write was supposed
to be in non-host endianness, because we need to emulate exactly the LE/BE
difference you're indicating above. The way we implement this on PPC is that we
simply byte swap the register value when guest_endian != host_endian.

With this in place, QEMU can just memcpy() the value into a local register and
feed it into its emulation code, which expects a "register value as if the CPU
was running in native endianness" as parameter - with "native" meaning "little
endian" for qemu-system-arm. Device emulation code doesn't know what to do with
a byte array.
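
To make the "swap when guest_endian != host_endian, then memcpy()" idea a bit more concrete, here is a rough sketch. It is not the actual PPC KVM code; sketch_mmio_store32(), sketch_mmio_fetch32() and host_is_big_endian() are made-up names, and the 4-byte buffer merely stands in for mmio.data[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect the byte order of the machine running this. */
static bool host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Kernel side (sketch): swap the trapped register value only when the
 * guest's current endianness differs from the host's, then copy it out. */
static void sketch_mmio_store32(uint8_t data[4], uint32_t reg,
                                bool guest_big_endian)
{
    if (guest_big_endian != host_is_big_endian()) {
        reg = swap32(reg);
    }
    memcpy(data, &reg, sizeof(reg));
}

/* Userspace side (sketch): a plain memcpy() back into a local variable. */
static uint32_t sketch_mmio_fetch32(const uint8_t data[4])
{
    uint32_t val;

    memcpy(&val, data, sizeof(val));
    return val;
}

void sketch_mmio_demo(void)
{
    uint8_t data[4];

    /* Guest runs in the host's byte order: the value arrives unchanged. */
    sketch_mmio_store32(data, 1, host_is_big_endian());
    assert(sketch_mmio_fetch32(data) == 1);

    /* Guest runs in the other byte order: the value arrives swapped. */
    sketch_mmio_store32(data, 1, !host_is_big_endian());
    assert(sketch_mmio_fetch32(data) == 0x01000000u);
}
```

Under these assumptions the receiving side never has to pick a byte order for the buffer; all of the guest-mode handling sits in the one conditional swap on the sending side.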
Take a look at QEMU's MMIO handler:
case KVM_EXIT_MMIO:
    DPRINTF("handle_mmio\n");
    cpu_physical_memory_rw(run->mmio.phys_addr,
                           run->mmio.data,
                           run->mmio.len,
                           run->mmio.is_write);
    ret = 0;
    break;
which translates to
switch (l) {
case 8:
    /* 64 bit write access */
    val = ldq_p(buf);
    error |= io_mem_write(mr, addr1, val, 8);
    break;
case 4:
    /* 32 bit write access */
    val = ldl_p(buf);
    error |= io_mem_write(mr, addr1, val, 4);
    break;
case 2:
    /* 16 bit write access */
    val = lduw_p(buf);
    error |= io_mem_write(mr, addr1, val, 2);
    break;
case 1:
    /* 8 bit write access */
    val = ldub_p(buf);
    error |= io_mem_write(mr, addr1, val, 1);
    break;
default:
    abort();
}
which calls the ldx_p primitives
#if defined(TARGET_WORDS_BIGENDIAN)
#define lduw_p(p) lduw_be_p(p)
#define ldsw_p(p) ldsw_be_p(p)
#define ldl_p(p) ldl_be_p(p)
#define ldq_p(p) ldq_be_p(p)
#define ldfl_p(p) ldfl_be_p(p)
#define ldfq_p(p) ldfq_be_p(p)
#define stw_p(p, v) stw_be_p(p, v)
#define stl_p(p, v) stl_be_p(p, v)
#define stq_p(p, v) stq_be_p(p, v)
#define stfl_p(p, v) stfl_be_p(p, v)
#define stfq_p(p, v) stfq_be_p(p, v)
#else
#define lduw_p(p) lduw_le_p(p)
#define ldsw_p(p) ldsw_le_p(p)
#define ldl_p(p) ldl_le_p(p)
#define ldq_p(p) ldq_le_p(p)
#define ldfl_p(p) ldfl_le_p(p)
#define ldfq_p(p) ldfq_le_p(p)
#define stw_p(p, v) stw_le_p(p, v)
#define stl_p(p, v) stl_le_p(p, v)
#define stq_p(p, v) stq_le_p(p, v)
#define stfl_p(p, v) stfl_le_p(p, v)
#define stfq_p(p, v) stfq_le_p(p, v)
#endif
and then passes the result as "originating register access" to the device
emulation part of QEMU.
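
For readers who don't have the accessors in their head: the _le_p/_be_p variants can be pictured roughly like this. These are simplified stand-ins with made-up sketch_ names, not QEMU's real definitions, and only the 32-bit load is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret four bytes as a little endian 32-bit value. */
static uint32_t sketch_ldl_le_p(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes as a big endian 32-bit value. */
static uint32_t sketch_ldl_be_p(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void sketch_ldl_demo(void)
{
    const uint8_t buf[4] = { 0x01, 0x00, 0x00, 0x00 };

    /* Same bytes, two different "register values". */
    assert(sketch_ldl_le_p(buf) == 0x00000001u);
    assert(sketch_ldl_be_p(buf) == 0x01000000u);
}
```

So which value the device emulation sees for a given buffer depends purely on whether the target was built with TARGET_WORDS_BIGENDIAN, not on the mode the guest CPU is currently in.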
Maybe it becomes clearer if you understand the code flow that TCG is going
through. With TCG, whenever a write traps into MMIO we go through these
functions:
void
glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                         DATA_TYPE val, int mmu_idx)
{
    helper_te_st_name(env, addr, val, mmu_idx, GETRA());
}

#ifdef TARGET_WORDS_BIGENDIAN
# define TGT_BE(X) (X)
# define TGT_LE(X) BSWAP(X)
#else
# define TGT_BE(X) BSWAP(X)
# define TGT_LE(X) (X)
#endif

void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
                       int mmu_idx, uintptr_t retaddr)
{
    [...]
    /* Handle an IO access. */
    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
        hwaddr ioaddr;
        if ((addr & (DATA_SIZE - 1)) != 0) {
            goto do_unaligned_access;
        }
        ioaddr = env->iotlb[mmu_idx][index];
        /* ??? Note that the io helpers always read data in the target
           byte ordering. We should push the LE/BE request down into io. */
        val = TGT_LE(val);
        glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
        return;
    }
    [...]
}

static inline void glue(io_write, SUFFIX)(CPUArchState *env,
                                          hwaddr physaddr,
                                          DATA_TYPE val,
                                          target_ulong addr,
                                          uintptr_t retaddr)
{
    MemoryRegion *mr = iotlb_to_region(physaddr);
    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !can_do_io(env)) {
        cpu_io_recompile(env, retaddr);
    }
    env->mem_io_vaddr = addr;
    env->mem_io_pc = retaddr;
    io_mem_write(mr, physaddr, val, 1 << SHIFT);
}
which at the end of the chain means that if you're running the same endianness
on guest and host, you get the original register value as the function
parameter. If you run a different endianness, you get a swapped value as the
function parameter.
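
Boiled down, the TGT_LE()/TGT_BE() macros implement the following rule. This is only an illustration for 32-bit values; sketch_io_value() is a made-up name, and the two flags stand for "which helper_*_st variant was called" and "was TARGET_WORDS_BIGENDIAN defined".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Mirrors TGT_LE()/TGT_BE(): the value handed to the IO path is swapped
 * only when the access's byte order differs from the target's default. */
static uint32_t sketch_io_value(uint32_t reg, bool access_big_endian,
                                bool target_big_endian)
{
    return access_big_endian == target_big_endian ? reg : swap32(reg);
}

void sketch_io_value_demo(void)
{
    /* LE store on an LE target (TGT_LE without TARGET_WORDS_BIGENDIAN). */
    assert(sketch_io_value(0x11223344u, false, false) == 0x11223344u);
    /* BE store on an LE target (TGT_BE without TARGET_WORDS_BIGENDIAN). */
    assert(sketch_io_value(0x11223344u, true, false) == 0x44332211u);
    /* BE store on a BE target (TGT_BE with TARGET_WORDS_BIGENDIAN). */
    assert(sketch_io_value(0x11223344u, true, true) == 0x11223344u);
    /* LE store on a BE target (TGT_LE with TARGET_WORDS_BIGENDIAN). */
    assert(sketch_io_value(0x11223344u, false, true) == 0x44332211u);
}
```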
So at the end of all of this, if you're running qemu-system-arm (TCG) on a BE
host, the request will come in as the register value, then stay the way it is
all the way until it reaches the IO callback function. Unless you define a
specific endianness for your device, in which case the callback may swizzle it
again. But if your device defines DEVICE_LITTLE_ENDIAN or
DEVICE_NATIVE_ENDIAN, it won't swizzle it.
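
The swizzle rule for devices can be sketched like this. The enum and function names are made up for the example (they only stand in for DEVICE_NATIVE_ENDIAN, DEVICE_BIG_ENDIAN and DEVICE_LITTLE_ENDIAN), and I'm assuming the swap happens exactly when the device's declared byte order differs from the target default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the three device endianness declarations. */
enum sketch_device_endian {
    SKETCH_DEVICE_NATIVE,
    SKETCH_DEVICE_BIG,
    SKETCH_DEVICE_LITTLE
};

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* A device needs a swap only if it declares a fixed byte order that is
 * different from the target default; "native" never swaps. */
static bool sketch_device_needs_swap(enum sketch_device_endian dev,
                                     bool target_big_endian)
{
    switch (dev) {
    case SKETCH_DEVICE_BIG:
        return !target_big_endian;
    case SKETCH_DEVICE_LITTLE:
        return target_big_endian;
    default:
        return false;
    }
}

static uint32_t sketch_device_value(uint32_t val,
                                    enum sketch_device_endian dev,
                                    bool target_big_endian)
{
    return sketch_device_needs_swap(dev, target_big_endian) ? swap32(val) : val;
}

void sketch_device_demo(void)
{
    const bool arm_target_big_endian = false; /* qemu-system-arm */

    assert(sketch_device_value(0x11223344u, SKETCH_DEVICE_LITTLE,
                               arm_target_big_endian) == 0x11223344u);
    assert(sketch_device_value(0x11223344u, SKETCH_DEVICE_NATIVE,
                               arm_target_big_endian) == 0x11223344u);
    assert(sketch_device_value(0x11223344u, SKETCH_DEVICE_BIG,
                               arm_target_big_endian) == 0x44332211u);
}
```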
What happens when you switch your guest to BE mode (or LE for PPC)? Very
simple. The TCG frontend swizzles every memory read and write before it hits
TCG's memory operations.

If you're running qemu-system-arm (KVM) on a BE host, the request will come
into kvm-all.c, get read with swapped endianness (ldq_p) and then get passed
that way into the IO callback function. That's where the bug lies. It should
behave the same way as TCG, so it needs to know the value the register
originally had. So instead of doing an ldq_p() it should go through a
different path that does memcpy().
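
Here is a small simulation of that mismatch. It models the host byte order with a flag so it gives the same result on any machine; put32()/get32() are made-up helpers, and the assumption that the buffer holds the register value in host byte order is mine, purely for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write v into p the way a machine of the given byte order would. */
static void put32(uint8_t *p, uint32_t v, bool big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        p[i] = (uint8_t)(v >> shift);
    }
}

/* Read four bytes from p, interpreting them in the given byte order. */
static uint32_t get32(const uint8_t *p, bool big_endian)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        v |= (uint32_t)p[i] << shift;
    }
    return v;
}

void sketch_kvm_read_demo(void)
{
    const bool host_big_endian = true;    /* simulated BE host */
    const bool target_big_endian = false; /* qemu-system-arm */
    uint8_t data[4];

    /* Assumption: the buffer holds register value 1 in host byte order. */
    put32(data, 1, host_big_endian);

    /* ldl_p()-style read in target byte order: the value comes out swapped. */
    assert(get32(data, target_big_endian) == 0x01000000u);

    /* memcpy()-style read in host byte order: the register value is back. */
    assert(get32(data, host_big_endian) == 1);
}
```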

But that doesn't fix the other-endian issue yet, right? Every value now would
come in as the register value.

Well, unless you do the same thing TCG does inside the kernel. So the kernel
would swap the reads and writes before it accesses the ioctl struct that
connects kvm with QEMU. Then all abstraction layers work just fine again and we
don't need any qemu-system-armeb.

Alex