Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code


From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
Date: Tue, 27 Sep 2011 16:53:16 +0000

On Tue, Sep 27, 2011 at 3:59 PM, Alexander Graf <address@hidden> wrote:
>
> On 27.09.2011, at 17:50, Blue Swirl wrote:
>
>> On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood <address@hidden> wrote:
>>> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>>>> On 24.09.2011, at 10:44, Blue Swirl wrote:
>>>>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <address@hidden> wrote:
>>>>>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>>>>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <address@hidden> wrote:
>>>>>>>> The goal with the spin table stuff, suboptimal as it is, was something
>>>>>>>> that would work on any powerpc implementation.  Other
>>>>>>>> implementation-specific release mechanisms are allowed, and are
>>>>>>>> indicated by a property in the cpu node, but only if the loader knows
>>>>>>>> that the OS supports it.
>>>>>>>>
>>>>>>>>> IIUC the spec that includes these bits is not finalized yet. It is
>>>>>>>>> however in use on all u-boot versions for e500 that I'm aware of, and
>>>>>>>>> is the method Linux uses to bring up secondary CPUs.
>>>>>>>>
>>>>>>>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>>>>>>>> just released which clarifies some things such as WIMG.
>>>>>>>>
>>>>>>>>> Stuart / Scott, do you have any pointers to documentation where the 
>>>>>>>>> spinning is explained?
>>>>>>>>
>>>>>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>>>>>
>>>>>>> Chapter 5.5.2 describes the table. This is actually an interface
>>>>>>> between the OS and Open Firmware; obviously there can't be a real
>>>>>>> hardware device that magically loads r3 etc.
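For reference, an entry in that table is only a few words of memory per
CPU, published through the cpu node's "cpu-release-addr" property
(enable-method = "spin-table"). A rough sketch of the layout as U-Boot
and Linux use it (illustrative only; see ePAPR 5.5.2 for the normative
definition):

#include <stdint.h>

/* One spin-table entry per secondary CPU (sketch, not QEMU code). */
struct epapr_spin_table_entry {
    uint64_t entry_addr; /* stays 1 while the CPU is held; the OS writes
                            the physical address the CPU should jump to */
    uint64_t r3;         /* value the released CPU loads into r3 */
    uint32_t rsvd;
    uint32_t pir;        /* processor id associated with this entry */
};
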
>>>
>>> Not Open Firmware, but rather an ePAPR-compliant loader.
>>
>> 'boot program to client program interface definition'.
>>
>>>>>>> The device method would break abstraction layers,
>>>
>>> Which abstraction layers?
>>
>> QEMU system emulation emulates hardware, not software. Hardware
>> devices don't touch CPU registers.
>
> The great part about this emulated device is that it's basically guest
> software running in host context. To the guest, it's not a device in the
> ordinary sense, such as vmport, but rather the same as software running on
> another core; it's just that the other core isn't actually running any
> software.
>
> Sure, if you consider this a device, it does break abstraction layers. Just
> consider it as the host running guest code, and then it makes sense :).
>
>>
>>>>>>> it's much like the
>>>>>>> vmport stuff in x86. Using a hypercall would be a small improvement.
>>>>>>> Instead it should be possible to implement a small boot ROM which puts
>>>>>>> the secondary CPUs into a managed halt state without spinning; the
>>>>>>> boot CPU could then send an IPI to a halted CPU to wake it up based on
>>>>>>> the spin table, just like real HW would do.
>>>
>>> The spin table, with no IPI or halt state, is what real HW does (or
>>> rather, what software does on real HW) today.  It's ugly and inefficient
>>> but it should work everywhere.  Anything else would be dependent on a
>>> specific HW implementation.
>>
>> Yes. Hardware doesn't ever implement the spin table.
>>
>>>>>>> On Sparc32 OpenBIOS this
>>>>>>> is something like a few lines of ASM on both sides.
>>>>>>
>>>>>> That sounds pretty close to what I had implemented in v1. Back then the 
>>>>>> only comment was to do it using this method from Scott.
>>>
>>> I had some comments on the actual v1 implementation as well. :-)
>>>
>>>>>> So the choice is between having code inside the guest that spins,
>>>>>> maybe only checking every x ms by programming a timer, and trying
>>>>>> to turn the memory write into an event. V1 was the former; v2 (this
>>>>>> one) is the latter. This version performs a lot better and is
>>>>>> easier to understand.
>>>>>
>>>>> The abstraction layers should not be broken lightly; I suppose some
>>>>> performance or laziness^Wlocal optimization reasons were behind the
>>>>> vmport design too. The ideal way to solve this could be to detect a
>>>>> spinning CPU and optimize that for all architectures, though that
>>>>> could be tricky (if a CPU remains in the same TB for extended periods,
>>>>> inspect the TB: if it performs a loop with a single load instruction,
>>>>> replace the load by a special operation that waits for any memory
>>>>> store to that page).
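To make that heuristic a bit more concrete, a detector could key off
per-TB statistics along these lines (made-up pseudo-C for illustration
only, none of this exists in QEMU; all names are invented):

#include <stdbool.h>
#include <stdint.h>

/* A translated block that keeps executing back-to-back, contains exactly
 * one guest load and ends with a branch back to its own start is a good
 * spin-loop candidate; the emulator could then park the vCPU until a
 * store hits the page that the load reads from. */
struct tb_stats {
    uint64_t pc;                /* guest address of the block */
    uint64_t load_page;         /* guest page read by the single load */
    unsigned consecutive_execs; /* times the block ran without leaving it */
    unsigned load_count;        /* number of guest loads in the block */
    bool     loops_to_start;    /* final branch targets pc again */
};

#define SPIN_THRESHOLD 10000u

static bool looks_like_spin_loop(const struct tb_stats *tb)
{
    return tb->consecutive_execs > SPIN_THRESHOLD &&
           tb->load_count == 1 &&
           tb->loops_to_start;
}
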
>>>
>>> How's that going to work with KVM?
>>>
>>>> In fact, the whole way we load the kernel today is pretty much
>>>> wrong. We should rather do it similarly to OpenBIOS, where firmware
>>>> always loads first and then pulls the kernel from QEMU using a PV
>>>> interface. At that point, we would have to implement an
>>>> optimization such as you suggest. Or implement a hypercall :).
>>>
>>> I think the current approach is more usable for most purposes.  If you
>>> start U-Boot instead of a kernel, how do you pass information on from the
>>> user (kernel, rfs, etc)?  Require the user to create flash images[1]?
>>
>> No, for example OpenBIOS gets the kernel command line from fw_cfg device.
>>
>>> Maybe that's a useful mode of operation in some cases, but I don't think
>>> we should be slavishly bound to it.  Think of the current approach as
>>> something between whole-system and userspace emulation.
>>
>> This is similar to the ARM, M68k and Xtensa semihosting modes, but at a
>> level below the kernel. Perhaps this mode should be enabled with the
>> -semihosting flag or a new flag. Then the bare metal version could be
>> run without the flag.
>
> and then we'd have 2 implementations for running in system emulation mode and 
> need to maintain both. I don't think that scales very well.

No, but such hacks are not common.

>>
>>> Where does the device tree come from?  How do you tell the guest about
>>> what devices it has, especially in virtualization scenarios with non-PCI
>>> passthrough devices, or custom qdev instantiations?
>>>
>>>> But at least we'd always be running the same guest software stack.
>>>
>>> No we wouldn't.  Any U-Boot that runs under QEMU would have to be
>>> heavily modified, unless we want to implement a ton of random device
>>> emulation, at least one extra memory translation layer (LAWs, localbus
>>> windows, CCSRBAR, and such), hacks to allow locked cache lines to
>>> operate despite a lack of backing store, etc.
>>
>> I'd say HW emulation business as usual. Now, with the new memory API,
>> it should be possible to emulate the caches with line locking, TLBs
>> etc.; this was not previously possible. IIRC implementing locked cache
>> lines would allow x86 to boot unmodified coreboot.
>
> So how would you emulate cache lines with line locking on KVM?

The cache would be an MMIO device which registers to handle all memory
space. Configuring the cache controller changes how the device
operates. Put this device between the CPU on one side and memory and the
other devices on the other side. Performance would probably be horrible,
so the CPU should disable the device automatically after some time.
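
A rough standalone sketch of that idea (not QEMU code; all names and the
layout are invented for illustration): every access goes through the
cache device first, and a locked line is served entirely from the
cache's own storage, so it keeps working even with no backing RAM behind
it, which is what locked-line "cache as RAM" boot code relies on.

#include <stdint.h>

#define LINE_SIZE 32u
#define NUM_LINES 8u            /* tiny cache, just for illustration */
#define RAM_SIZE  (64u * 1024u)

struct cache_line {
    uint64_t tag;               /* guest physical address of the line */
    int      valid;
    int      locked;            /* locked lines are never evicted */
    uint8_t  data[LINE_SIZE];
};

static struct cache_line cache[NUM_LINES];
static uint8_t ram[RAM_SIZE];   /* stand-in for the machine's memory */

/* Every CPU access is routed here first; only a miss falls through to
 * the backing store. */
static uint8_t mem_read8(uint64_t addr)
{
    uint64_t tag = addr & ~(uint64_t)(LINE_SIZE - 1);
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            return cache[i].data[addr & (LINE_SIZE - 1)];
        }
    }
    return ram[addr % RAM_SIZE];
}

static void mem_write8(uint64_t addr, uint8_t val)
{
    uint64_t tag = addr & ~(uint64_t)(LINE_SIZE - 1);
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].data[addr & (LINE_SIZE - 1)] = val;
            if (!cache[i].locked) {
                ram[addr % RAM_SIZE] = val;  /* write through when unlocked */
            }
            return;
        }
    }
    ram[addr % RAM_SIZE] = val;
}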

> However, we already have a number of hacks in SeaBIOS to run in QEMU, so I 
> don't see an issue in adding a few here and there in u-boot. The memory 
> pressure is a real issue though. I'm not sure how we'd manage that one. Maybe 
> we could try and reuse the host u-boot binary? heh

I don't think SeaBIOS breaks layering except for fw_cfg. For extremely
memory-limited situations, perhaps QEMU (or Native KVM Tool, for a lean
and mean version) could be run without glibc, inside the kernel, or even
interfacing directly with the hypervisor. I'd also continue making it
possible to disable building unused devices and features.

>>
>>> -Scott
>>>
>>> [1] Keep in mind that a major use case for e500 KVM is on host systems
>>> that don't have a hard drive.  I want to *reduce* the amount of memory
>>> we waste to store this stuff, not increase it.
>>
>> Interesting use case. Is there a display device?
>
> There are some boards with display and/or PCI(e), yes.
>
>
> Alex
>
>


