qemu-devel

Re: [Qemu-devel] nvram and boot order


From: Alexander Graf
Subject: Re: [Qemu-devel] nvram and boot order
Date: Fri, 19 Oct 2012 10:40:45 +0200


On 19.10.2012, at 10:24, David Gibson <address@hidden> wrote:

> On Thu, Oct 18, 2012 at 08:32:54AM +0200, Alexander Graf wrote:
>> 
>> 
>> On 18.10.2012, at 03:18, Benjamin Herrenschmidt <address@hidden> wrote:
>> 
>>> On Thu, 2012-10-18 at 11:09 +1100, David Gibson wrote:
>>> 
>>>>>> That's horrible; if you use -boot just once it will clobber a
>>>>>> persistent NVRAM's boot order.  I see that a means of changing the
>>>>>> default boot order from management tools is desirable, but that
>>>>>> shouldn't be the normal behaviour of -boot.  And the objections to (2)
>>>>>> apply even more strongly - we'd need to translate arbitrary -boot
>>>>>> strings to NVRAM representation which may not be at all
>>>>>> straightforward from the information qemu has available.
>>>>> 
>>>>> It may not be straightforward, but it's what makes the most sense from
>>>>> a user's PoV.
>>>> 
>>>> Bollocks.  Using -boot to override the normal boot sequence
>>>> permanently changing the normal boot sequence absolutely does not make
>>>> sense from a user's PoV.
>>> 
>>> I strongly agree with David here. -boot should not change the persistent
>>> state.
>> 
>> I think Anthony and you are looking at 2 different use cases, each
>> with their own sane reasoning.
>> 
>> You want to have the chance to override the boot order temporarily
>> for things like cd boot or quick guest rescue missions.
>> 
>> You also want to be able to permanently change the guest's boot
>> order from a management tool. At that same place you want to be able
>> to display it, so you don't have to boot your vm to know what it
>> would be doing.
> 
> That's true to an extent.  However, I vehemently disagree that it's
> arbitrary which one gets the new option.  Neither -boot nor bootindex=
> alter any persistent data now and they should not suddenly start doing
> so.
> 
> Now a method of externally altering the firmware persistent boot order
> would certainly be nice to have.  However, I'm not at all convinced
> that it's realistically possible to do that in a way that has a platform
> neutral interface.  The fundamental problem here is that we're tied to
> the pre-existing ways the platform stores the boot order information
> and what that's even capable of expressing can be very different from
> platform to platform: can it express an arbitrary list, or just a
> limited number of devices, or just one?  can it represent arbitrary
> devices in some firmware id/address scheme, or does it just
> give order of a fixed set of known devices?  or is it even more
> limited, containing just a few "CD before disk" type booleans?  for
> that matter, does the firmware even have any notion at all of a
> persistent configurable boot order?

You get 2 lists from machine specific code:

  - potentially available boot devices
  - current boot order list

Both lists contain a number of strings; the mapping of those strings to 
platform specific data is the responsibility of the platform. After all, the 
platform gave us the list of available devices, so it had better accept them in 
the boot order list.

Then you basically have to query the machine (with full device state populated, 
otherwise the available list isn't available) for the list of devices you're 
able to boot from. You also ask it for its current boot ordered list. An 
external tool can display both and allow you to add/remove entries from the 
current list and reshuffle them.
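
To make the two-list idea concrete, here is a rough Python sketch. It is 
not QEMU code: the Machine class, set_boot_order() and move_to_front() are 
hypothetical names made up for illustration, and the device strings are 
placeholders. The only point is that the platform validates a new order 
against the list it handed out itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    """Hypothetical stand-in for machine specific code; not a QEMU API."""
    available: List[str]                                  # potentially available boot devices
    boot_order: List[str] = field(default_factory=list)  # current boot order list

    def set_boot_order(self, order: List[str]) -> None:
        # The platform handed out the available list, so it only has to
        # accept names that came from that list.
        unknown = [name for name in order if name not in self.available]
        if unknown:
            raise ValueError(f"not bootable on this machine: {unknown}")
        if len(set(order)) != len(order):
            raise ValueError("duplicate entries in boot order")
        self.boot_order = list(order)


def move_to_front(order: List[str], name: str) -> List[str]:
    """Reshuffle: return a copy of order with name as the first entry."""
    return [name] + [entry for entry in order if entry != name]
```

An external tool would read both lists, call something like 
move_to_front(machine.boot_order, "cdrom") and hand the result back through 
set_boot_order(); how the strings map to real devices stays inside the 
platform.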

When you boot the VM you now have to tell it to actually use that new boot 
order. This can happen in 2 possible ways:

  - write it to a special section that is declared as temporary, taking 
precedence over the internal boot order
  - same as above, but with a flag indicating to SLOF that we want to persist it
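
The two variants above could be resolved roughly like this. Again only a 
sketch under assumptions: BootOverride and resolve_boot_order() are 
hypothetical names, and the real logic would live in firmware (SLOF), not 
in Python. It just shows the intended precedence and what the persist flag 
changes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BootOverride:
    """Hypothetical contents of the special section; the layout is assumed."""
    order: List[str]
    persist: bool = False   # flag telling firmware to keep this order


def resolve_boot_order(
    internal: List[str], override: Optional[BootOverride]
) -> Tuple[List[str], List[str]]:
    """Return (order to boot with now, internal order to keep afterwards)."""
    if override is None:
        # No special section: boot from the internal order, change nothing.
        return list(internal), list(internal)
    if override.persist:
        # Override wins and also replaces the internal order.
        return list(override.order), list(override.order)
    # Temporary override: wins for this boot, internal order is untouched.
    return list(override.order), list(internal)
```

With a temporary override the second element of the result is still the 
old internal order, so a one-off cd boot does not clobber what the guest 
stored.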

Whether this information is passed through a special section in nvram or 
through fdt is an implementation detail to me. Nvram has the benefit of 
isolated commands you can execute:

  $ qemu -change-boot-order foo,bar
  <time passes>
  $ qemu

Would give you a VM with the settings you changed, while

  $ virsh change boot order foo,bar
  <time passes>
  $ qemu -override-boot-order foo,bar

Means management tools need to remember the boot order in internal state.

> 
> If the configuration tool/setting has to be platform specific anyway,
> then most of the questions the current proposal attempts to address
> simply don't arise.  We could make such a tool for pseries right now:
> access the persistent nvram image via qemu-nbd and poke the necessary
> things in.

We want to expose the same interface to the layers above.

Alex

> 
>> As for device detection logic, both face the same problems. You need
>> to be able to say 'boot from cd-rom first temporarily' just the same
>> as you need to be able to say 'boot from the first cd-rom as first
>> boot option permanently'. The permanent change needs to be possible
>> with the vm turned off though.
>> 
>> I suppose that Anthony's reasoning is that we can implement
>> temporary in the management layer (or even qemu) if we have the
>> permanent mechanism, by switching back to the previous state after
>> shutdown if the guest written boot order didn't change.
> 
> That really doesn't work, for the reason you mention in the next
> paragraph, amongst others.
> 
>> I don't mind personally if we have one interface for temporary and
>> persistent or 2 separate ones, but I think we should aim for having
>> both options available in the long run. Though doing permanent
>> changes first and reverting them later could raise problems when you
>> kill your vm, since that wouldn't clean up the temporary change.
> 
> Not to mention that the persistent store could be used for other
> things as well, and restoring it could clobber other changes that the
> guest has made and which should be persistent.
> 
>>> In our case, the persistent state will have been carefully crafted by
>>> complicated scripts by the distro installer, and while I may want to use
>>> -boot to boot once off a cd image or similar, I certainly don't want
>>> that to affect my nvram setting pointing to the right on-disk
>>> bootloader.
>>> 
>>> Additionally I don't want qemu to have to understand all the intricacies
>>> of expressing OFW boot path if we can avoid it.
>> 
>> Yes, the same problem that EFI, for example, is facing. The solution here is as 
>> simple as it gets: a new device name space. Instead of having a boot list 
>> entry saying 'boot from device x, part y, file z' you would get an entry 
>> saying 'boot from /qemu/disk0' and leave the rest to the firmware. The good 
>> thing about this approach is that it again is persistable and can be used in 
>> boot order lists. So you can directly translate -boot cd into 
>> /qemu/disk0,/qemu/cdrom. And if you screwed up your guest boot config, just 
>> put that order in by hand into permanent config.
>> 
>>> 
>>> Qemu gives as much info as it can and lets the firmware itself inside the
>>> guest figure things out.
>> 
>> Yes, that's the only chance we have really. Even for bootindex, which could 
>> for example get translated to /qemu/pci/0.10.0/disk0 which again would then 
>> get aliased to the actual disk device node behind pci device 0.10.0 (first 
>> disk) by SLOF.
>> 
>>> 
>>> In fact, I don't want Qemu to know anything about our internal nvram
>>> format. This is a business between the guest FW and the guest OS. The
>>> only thing qemu is allowed to do is wipe it out if asked to do so :-)
>> 
>> It might be useful to use fdt in nvram to store the permanent boot order. 
>> That way QEMU / management tools have the chance to make persistent changes. 
>> Everyone around already understands fdt anyways :).
>> 
>>>> Um.. as far as I can tell that's a point in favour of my position.  It
>>>> makes it impossible for qemu to correctly describe boot sequences
>>>> using these devices in the terms firmware uses internally.  On the
>>>> other hand it certainly is possible for qemu to pass bootorder="cd"
>>>> (or whatever) to the firmware via device tree or fw_cfg and have
>>>> firmware locally interpret that in terms of what it knows about
>>>> available devices.
>>> 
>>> This is more/less what happens with -boot today. IE. If you pass "c"
>>> SLOF looks for a bootable disk (though arguably the algorithm could be
>>> improved), "d" for a bootable optical media etc...
>>> 
>>> We definitely want something a bit more expressive and in some case
>>> might even be able to pass down from the command line a full path to an
>>> actual device but we don't necessarily want qemu to understand the nvram
>>> format of this.
>>> 
>>> Make it an expressive representation that makes sense to qemu, and let
>>> the FW "translate" that to something it understands internally.
>> 
>> Yes :).
>> 
>> Regardless of this problem, I think the conclusion on how to handle default 
>> -boot makes sense to everyone, so you (Avik?) can already start working on 
>> that one while we nail down the details of the boot protocol handshakes 
>> between QEMU and SLOF.
>> 
>> 
>> Alex
>> 
> 
> -- 
> David Gibson            | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au    | minimalist, thank you.  NOT _the_ _other_
>                | _way_ _around_!
> http://www.ozlabs.org/~dgibson
> 


