From: Scott Wood
Subject: Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
Date: Mon, 26 Sep 2011 18:19:31 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Thunderbird/3.1.10

On 09/24/2011 05:00 AM, Alexander Graf wrote:
> On 24.09.2011, at 10:44, Blue Swirl wrote:
>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <address@hidden> wrote:
>>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <address@hidden> wrote:
>>>>> The goal with the spin table stuff, suboptimal as it is, was something
>>>>> that would work on any powerpc implementation.  Other
>>>>> implementation-specific release mechanisms are allowed, and are
>>>>> indicated by a property in the cpu node, but only if the loader knows
>>>>> that the OS supports it.
>>>>>
>>>>>> IIUC the spec that includes these bits is not finalized yet. It is
>>>>>> however in use on all u-boot versions for e500 that I'm aware of, and
>>>>>> it is the method Linux uses to bring up secondary CPUs.
>>>>>
>>>>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>>>>> just released which clarifies some things such as WIMG.
>>>>>
>>>>>> Stuart / Scott, do you have any pointers to documentation where the 
>>>>>> spinning is explained?
>>>>>
>>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>>
>>>> Chapter 5.5.2 describes the table. This is actually an interface
>>>> between the OS and Open Firmware; obviously there can't be a real
>>>> hardware device that magically loads r3 etc.

Not Open Firmware, but rather an ePAPR-compliant loader.
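
For anyone who doesn't have the spec open, the per-CPU table entry that
chapter describes looks roughly like this (a sketch from my reading of
ePAPR 1.1; the struct and field names are illustrative, not taken from
any real header):

#include <stdint.h>

/*
 * Rough layout of one ePAPR spin table entry (one per secondary CPU).
 * The loader parks each secondary spinning on entry_addr, which it
 * initializes to 1; the OS releases the CPU by filling in r3/pir and
 * then writing the real entry point into entry_addr.  On 32-bit
 * implementations the OS typically accesses each 64-bit field as a
 * high/low word pair.
 */
struct epapr_spin_table_entry {
    uint64_t entry_addr;   /* 1 while held, jump target once released */
    uint64_t r3;           /* value the released CPU receives in r3   */
    uint32_t rsvd;         /* reserved                                */
    uint32_t pir;          /* processor ID associated with this entry */
};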

>>>> The device method would break abstraction layers, 

Which abstraction layers?

>>>> it's much like
>>>> vmport stuff in x86. Using a hypercall would be a small improvement.
>>>> Instead, it should be possible to implement a small boot ROM which puts
>>>> the secondary CPUs into a managed halt state without spinning; then the
>>>> boot CPU could send an IPI to a halted CPU to wake it up based on
>>>> the spin table, just like real HW would do.

The spin table, with no IPI or halt state, is what real HW does (or
rather, what software does on real HW) today.  It's ugly and inefficient
but it should work everywhere.  Anything else would be dependent on a
specific HW implementation.
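
What each secondary does today is essentially just this (a rough
sketch; the function and variable names are mine, not from any
particular U-Boot file):

#include <stdint.h>

/*
 * Sketch of the guest side of the spin table: poll the entry address
 * until the boot CPU writes something other than the "still held"
 * value (1), then branch there with the table's r3 value as the first
 * argument, which the PPC calling convention places in r3.  No IPI,
 * no halt state, just a busy loop burning cycles until release.
 */
static void spin_until_released(volatile uint64_t *entry_addr,
                                volatile uint64_t *r3)
{
    while (*entry_addr == 1)
        ;   /* plain busy wait */

    ((void (*)(uint64_t))(uintptr_t)*entry_addr)(*r3);
}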

>>>> On Sparc32 OpenBIOS this
>>>> is something like a few lines of ASM on both sides.
>>>
>>> That sounds pretty close to what I had implemented in v1. Back then the 
>>> only comment was to do it using this method from Scott.

I had some comments on the actual v1 implementation as well. :-)

>>> So we have the choice between having code inside the guest that
>>> spins (maybe even only checking every x ms, by programming a timer)
>>> and trying to make an event out of the memory write. V1 was
>>> the former, v2 (this one) is the latter. This version performs a
>>> lot better and is easier to understand.
>>
>> The abstraction layers should not be broken lightly; I suppose some
>> performance or laziness^Wlocal optimization reasons were behind the
>> vmport design too. The ideal way to solve this could be to detect a
>> spinning CPU and optimize that for all architectures, though that
>> could be tricky (if a CPU remains in the same TB for extended periods,
>> inspect the TB: if it performs a loop with a single load instruction,
>> replace the load with a special wait operation for any memory stores
>> to that page).

How's that going to work with KVM?

> In fact, the whole way we load the kernel today is pretty much
> wrong. We should rather do it the way OpenBIOS does, where firmware
> always loads and then pulls the kernel from QEMU using a PV
> interface. At that point, we would have to implement such an
> optimization as you suggest. Or implement a hypercall :).

I think the current approach is more usable for most purposes.  If you
start U-Boot instead of a kernel, how do you pass information on from the
user (kernel, rfs, etc)?  Require the user to create flash images[1]?
Maybe that's a useful mode of operation in some cases, but I don't think
we should be slavishly bound to it.  Think of the current approach as
something between whole-system and userspace emulation.

Where does the device tree come from?  How do you tell the guest about
what devices it has, especially in virtualization scenarios with non-PCI
passthrough devices, or custom qdev instantiations?
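
To make that concrete, this is roughly the kind of libfdt patching QEMU
can do precisely because it knows what it instantiated (a sketch only,
not the actual hw/ppc code; the node path and the release address are
made-up examples):

#include <stdio.h>
#include <stdint.h>
#include <libfdt.h>

/*
 * Mark a secondary cpu node as released via an ePAPR spin table.
 * "/cpus/PowerPC,e500@%d" and release_addr are illustrative; real code
 * would derive both from the machine it actually built.
 */
static int mark_secondary_spin(void *fdt, int cpu_index,
                               uint64_t release_addr)
{
    char path[64];
    int off;

    snprintf(path, sizeof(path), "/cpus/PowerPC,e500@%d", cpu_index);
    off = fdt_path_offset(fdt, path);
    if (off < 0)
        return off;

    /* ePAPR: secondaries start "disabled" and are woken through a
     * spin table entry found at cpu-release-addr. */
    fdt_setprop_string(fdt, off, "status", "disabled");
    fdt_setprop_string(fdt, off, "enable-method", "spin-table");
    return fdt_setprop_u64(fdt, off, "cpu-release-addr", release_addr);
}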

> But at least we'd always be running the same guest software stack.

No, we wouldn't.  Any U-Boot that runs under QEMU would have to be
heavily modified, unless we want to implement a ton of random device
emulation, at least one extra memory translation layer (LAWs, localbus
windows, CCSRBAR, and such), hacks to allow locked cache lines to
operate despite a lack of backing store, etc.

-Scott

[1] Keep in mind that a major use case for e500 KVM is on host systems
that don't have a hard drive.  I want to *reduce* the amount of memory
we waste to store this stuff, not increase it.



