Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code


From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
Date: Tue, 27 Sep 2011 15:50:05 +0000

On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood <address@hidden> wrote:
> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>> On 24.09.2011, at 10:44, Blue Swirl wrote:
>>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <address@hidden> wrote:
>>>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <address@hidden> wrote:
>>>>>> The goal with the spin table stuff, suboptimal as it is, was something
>>>>>> that would work on any powerpc implementation.  Other
>>>>>> implementation-specific release mechanisms are allowed, and are
>>>>>> indicated by a property in the cpu node, but only if the loader knows
>>>>>> that the OS supports it.
>>>>>>
>>>>>>> IIUC the spec that includes these bits is not finalized yet. It is,
>>>>>>> however, in use on all u-boot versions for e500 that I'm aware of, and
>>>>>>> is the method Linux uses to bring up secondary CPUs.
>>>>>>
>>>>>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>>>>>> just released which clarifies some things such as WIMG.
>>>>>>
>>>>>>> Stuart / Scott, do you have any pointers to documentation where the 
>>>>>>> spinning is explained?
>>>>>>
>>>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>>>
>>>>> Chapter 5.5.2 describes the table. This is actually an interface
>>>>> between the OS and Open Firmware; obviously there can't be a real
>>>>> hardware device that magically loads r3 etc.
>
> Not Open Firmware, but rather an ePAPR-compliant loader.

'boot program to client program interface definition'.
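
For anyone following along, the table in 5.5.2 is just a small per-CPU
record in memory that the boot program leaves behind and the OS pokes.
Very roughly, as a packed C struct (field names here are illustrative,
not copied from the spec or from U-Boot/QEMU sources):

    #include <stdint.h>

    /* Sketch of one ePAPR spin table entry, one per secondary CPU. */
    struct epapr_spin_entry {
        uint64_t entry_addr; /* where to jump; low bit set = keep spinning */
        uint64_t r3;         /* value the OS wants in r3, e.g. dt pointer */
        uint32_t rsvd;       /* reserved */
        uint32_t pir;        /* value for the processor ID register */
    } __attribute__((packed));

The boot program initializes entry_addr with the low bit set, and the
location of each entry is advertised through the cpu-release-addr
property in the cpu node.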

>>>>> The device method would break abstraction layers,
>
> Which abstraction layers?

QEMU system emulation emulates hardware, not software. Hardware
devices don't touch CPU registers.

>>>>> it's much like
>>>>> vmport stuff in x86. Using a hypercall would be a small improvement.
>>>>> Instead it should be possible to implement a small boot ROM which puts
>>>>> the secondary CPUs into managed halt state without spinning, then the
>>>>> boot CPU could send an IPI to a halted CPU to wake it up based on
>>>>> the spin table, just like real HW would do.
>
> The spin table, with no IPI or halt state, is what real HW does (or
> rather, what software does on real HW) today.  It's ugly and inefficient
> but it should work everywhere.  Anything else would be dependent on a
> specific HW implementation.

Yes. Hardware doesn't ever implement the spin table.
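
To make the mechanism concrete, very roughly what the two sides do
(illustration only, not the actual U-Boot or Linux code; it reuses the
spin entry struct sketched above):

    /* Secondary CPU: spin in the boot program until released, then
     * load the advertised register values and branch (the register
     * setup and branch are a few asm instructions in practice). */
    void secondary_spin(volatile struct epapr_spin_entry *e)
    {
        while (e->entry_addr & 1) {
            /* pure busy wait: no halt state, no IPI */
        }
        /* set r3/pir from e, then jump to e->entry_addr */
    }

    /* Boot CPU (OS): release one secondary CPU. */
    void release_secondary(volatile struct epapr_spin_entry *e,
                           uint64_t entry, uint64_t dt_phys)
    {
        e->r3 = dt_phys;        /* whatever the OS wants passed in r3 */
        /* memory barrier / cache flush needed here on real HW */
        e->entry_addr = entry;  /* clearing the low bit releases the CPU */
    }

That final write to entry_addr is the memory event this patch tries to
catch in QEMU, instead of letting the secondary vcpu burn cycles in the
loop.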

>>>>> On Sparc32 OpenBIOS this
>>>>> is something like a few lines of ASM on both sides.
>>>>
>>>> That sounds pretty close to what I had implemented in v1. Back then the 
>>>> only comment was to do it using this method from Scott.
>
> I had some comments on the actual v1 implementation as well. :-)
>
>>>> So we have the choice between having code inside the guest that
>>>> spins, maybe even only checks every x ms, by programming a timer,
>>>> or we can try to make an event out of the memory write. V1 was
>>>> the former, v2 (this one) is the latter. This version performs a
>>>> lot better and is easier to understand.
>>>
>>> The abstraction layers should not be broken lightly, I suppose some
>>> performance or laziness^Wlocal optimization reasons were behind vmport
>>> design too. The ideal way to solve this could be to detect a spinning
>>> CPU and optimize that for all architectures, that could be tricky
>>> though (if a CPU remains in the same TB for extended periods, inspect
>>> the TB: if it performs a loop with a single load instruction, replace
>>> the load by a special wait operation for any memory stores to that
>>> page).
>
> How's that going to work with KVM?
>
>> In fact, the whole kernel loading way we go today is pretty much
>> wrong. We should rather do it similar to OpenBIOS where firmware
>> always loads and then pulls the kernel from QEMU using a PV
>> interface. At that point, we would have to implement such an
>> optimization as you suggest. Or implement a hypercall :).
>
> I think the current approach is more usable for most purposes.  If you
> start U-Boot instead of a kernel, how do you pass information on from the
> user (kernel, rfs, etc)?  Require the user to create flash images[1]?

No, for example OpenBIOS gets the kernel command line from fw_cfg device.
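
(For anyone unfamiliar with fw_cfg: the firmware writes a 16-bit key to
a selector register and then reads the payload byte by byte from a data
register. A rough sketch, using the x86 port numbers and the well-known
command line keys; sparc32 uses a memory-mapped variant and OpenBIOS has
its own wrappers, so treat the constants as illustrative:)

    #include <stdint.h>
    #include <sys/io.h>                    /* outw/inb, illustration only */

    #define FW_CFG_PORT_SEL      0x510     /* selector register (x86) */
    #define FW_CFG_PORT_DATA     0x511     /* data register */
    #define FW_CFG_CMDLINE_SIZE  0x14
    #define FW_CFG_CMDLINE_DATA  0x15

    /* Read the command line that -append passed to QEMU (buf_len > 0). */
    static void fw_cfg_read_cmdline(char *buf, unsigned buf_len)
    {
        unsigned i, len = 0;

        outw(FW_CFG_CMDLINE_SIZE, FW_CFG_PORT_SEL);
        for (i = 0; i < 4; i++)            /* 32-bit little-endian size */
            len |= (unsigned)inb(FW_CFG_PORT_DATA) << (8 * i);

        outw(FW_CFG_CMDLINE_DATA, FW_CFG_PORT_SEL);
        for (i = 0; i < len && i + 1 < buf_len; i++)
            buf[i] = inb(FW_CFG_PORT_DATA);
        buf[i] = '\0';
    }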

> Maybe that's a useful mode of operation in some cases, but I don't think
> we should be slavishly bound to it.  Think of the current approach as
> something between whole-system and userspace emulation.

This is similar to the ARM, M68k and Xtensa semihosting modes, but at a
lower level than the kernel. Perhaps this mode should be enabled with the
-semihosting flag or a new flag; the bare-metal version could then be run
without it.

> Where does the device tree come from?  How do you tell the guest about
> what devices it has, especially in virtualization scenarios with non-PCI
> passthrough devices, or custom qdev instantiations?
>
>> But at least we'd always be running the same guest software stack.
>
> No we wouldn't.  Any U-Boot that runs under QEMU would have to be
> heavily modified, unless we want to implement a ton of random device
> emulation, at least one extra memory translation layer (LAWs, localbus
> windows, CCSRBAR, and such), hacks to allow locked cache lines to
> operate despite a lack of backing store, etc.

I'd say that's HW emulation business as usual. Now, with the new memory
API, it should be possible to emulate the caches with line locking, TLBs
etc.; this was not previously possible. IIRC implementing locked cache
lines would allow x86 to boot unmodified coreboot.

> -Scott
>
> [1] Keep in mind that a major use case for e500 KVM is on host systems
> that don't have a hard drive.  I want to *reduce* the amount of memory
> we waste to store this stuff, not increase it.

Interesting use case. Is there a display device?


