qemu-devel

Re: [edk2-devel] A problem with live migration of UEFI virtual machines


From: Laszlo Ersek
Subject: Re: [edk2-devel] A problem with live migration of UEFI virtual machines
Date: Tue, 25 Feb 2020 21:40:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Hi Andrew,

On 02/25/20 19:56, Andrew Fish wrote:
> Laszlo,
> 
> If I understand this correctly, isn't it more complicated than just size?
> Doesn't it also assume that the memory layout is the same?

Yes.

> The legacy BIOS used fixed magic address ranges, but UEFI uses dynamically 
> allocated memory so addresses are not fixed. While the UEFI firmware does try 
> to keep S3 and S4 layouts consistent between boots, I'm not aware of any 
> mechanism to keep the memory map address the same between versions of the 
> firmware?

It's not about RAM, but platform MMIO.

The core of the issue here is that the -D FD_SIZE_4MB and -D FD_SIZE_2MB
build options (or more directly, the different FD_SIZE_IN_KB macro
settings) set a bunch of flash-related build-time constant macros, and
PCDs, differently, in the following files:

- OvmfPkg/OvmfPkg.fdf.inc
- OvmfPkg/VarStore.fdf.inc
- OvmfPkg/OvmfPkg*.dsc

As a result, the OVMF_CODE.fd firmware binary will have different
hard-coded references to the variable store pflash addresses.
(Guest-physical MMIO addresses that point into the pflash range.)
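
To make the address difference concrete, here is a rough sketch. It
assumes that the combined flash image (varstore plus code) is mapped so
that it ends exactly at the 4 GiB boundary, and that the variable store
occupies the lowest bytes of that image. The function name is made up
for illustration; it is not an edk2 or QEMU identifier.

```python
# Sketch only: assumes the combined flash image (varstore + code) ends
# exactly at 4 GiB and that the variable store occupies its lowest bytes.
FOUR_GIB = 1 << 32


def varstore_base(fd_size_kb: int) -> int:
    """Guest-physical address where the variable store would start."""
    return FOUR_GIB - fd_size_kb * 1024


for fd_size_kb in (2048, 4096):
    base = varstore_base(fd_size_kb)
    print(f"FD_SIZE_IN_KB={fd_size_kb}: varstore base = {base:#x}")
```

Under those assumptions the two builds disagree about where the varstore
begins (0xffe00000 versus 0xffc00000), so a binary from one build would
look for it at the wrong guest-physical address.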

If someone tries to combine an OVMF_CODE.fd firmware binary from e.g.
the 4MB build, with a variable store file that was originally
instantiated from an OVMF_VARS.fd varstore template from the 2MB build,
then the firmware binary's physical address references and various size
references will not match the contents / layout of the varstore pflash
chip, which maps an incompatibly structured varstore file.

For example, "OvmfPkg/VarStore.fdf.inc" describes two incompatible
EFI_FIRMWARE_VOLUME_HEADER structures (which "build" generates for the
OVMF_VARS.fd template) between the 4MB (total size) build, and the
1MB/2MB (total size) build.
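
As an illustration only, a varstore file could be told apart by reading
the FvLength field of its firmware volume header. The sketch below
assumes the fixed-size header fields as laid out in the PI
specification, and it assumes that FvLength spans the whole non-volatile
data storage area (128 KB or 528 KB). The helper names and the synthetic
sample header are hypothetical.

```python
import struct

# Fixed-size leading part of EFI_FIRMWARE_VOLUME_HEADER, little-endian:
# ZeroVector[16], FileSystemGuid[16], FvLength (u64), Signature ("_FVH"),
# Attributes (u32), HeaderLength (u16), Checksum (u16),
# ExtHeaderOffset (u16), Reserved (u8), Revision (u8).
FV_HEADER = struct.Struct("<16s16sQ4sIHHHBB")

# Assumption: FvLength covers the whole non-volatile data storage area.
EXPECTED_FV_LENGTH = {"1MB/2MB": 128 * 1024, "4MB": 528 * 1024}


def fv_length(blob: bytes) -> int:
    """Return FvLength from a firmware volume header, or raise ValueError."""
    if len(blob) < FV_HEADER.size:
        raise ValueError("too short for a firmware volume header")
    fields = FV_HEADER.unpack_from(blob, 0)
    length, signature = fields[2], fields[3]
    if signature != b"_FVH":
        raise ValueError("missing _FVH signature")
    return length


def matches_build(blob: bytes, build: str) -> bool:
    """True if the header's FvLength is the one expected for this build."""
    return fv_length(blob) == EXPECTED_FV_LENGTH[build]


# Synthetic header standing in for the first bytes of a varstore file.
sample = FV_HEADER.pack(bytes(16), bytes(16), 128 * 1024, b"_FVH",
                        0, 72, 0, 0, 0, 2)
print(matches_build(sample, "1MB/2MB"), matches_build(sample, "4MB"))
```

For a real file, the first FV_HEADER.size bytes read in binary mode
would take the place of the synthetic sample.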

The commit message below summarizes the internal layout differences,
from 1MB/2MB -> 4MB:

https://github.com/tianocore/edk2/commit/b24fca05751f

Excerpt (relevant for OVMF_VARS.fd):

  Description                Compression type                Size [KB]
  -------------------------  -----------------  ----------------------
  Non-volatile data storage  open-coded binary    128 ->   528 ( +400)
                               data
    Variable store                                 56 ->   256 ( +200)
    Event log                                       4 ->     4 (   +0)
    Working block                                   4 ->     4 (   +0)
    Spare area                                     64 ->   264 ( +200)
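
The numbers in the excerpt can be cross-checked with a few lines of
Python. The sketch also guesses the build flavor of a varstore file
from its size alone, on the assumption that the file is exactly as
large as the "Non-volatile data storage" row says. The function name is
made up for illustration.

```python
# Sizes in KB, copied from the excerpt above: (1MB/2MB build, 4MB build).
PARTS = {
    "Variable store": (56, 256),
    "Event log": (4, 4),
    "Working block": (4, 4),
    "Spare area": (64, 264),
}
TOTAL = (128, 528)  # the "Non-volatile data storage" row

# The four sub-areas must add up to the total in both builds.
old_sum = sum(old for old, _ in PARTS.values())
new_sum = sum(new for _, new in PARTS.values())
assert (old_sum, new_sum) == TOTAL


def varstore_flavor(file_size_bytes: int) -> str:
    """Guess which build a varstore file belongs to, from its size alone."""
    kb, remainder = divmod(file_size_bytes, 1024)
    if remainder == 0 and kb == TOTAL[0]:
        return "1MB/2MB build"
    if remainder == 0 and kb == TOTAL[1]:
        return "4MB build"
    return "unknown"


print(varstore_flavor(128 * 1024))
print(varstore_flavor(528 * 1024))
```

A size check like this only flags an obvious mismatch; it says nothing
about whether the contents of the file are otherwise valid.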

Thanks
Laszlo


>> On Feb 25, 2020, at 9:53 AM, Laszlo Ersek <address@hidden> wrote:
>>
>> On 02/24/20 16:28, Daniel P. Berrangé wrote:
>>> On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:
>>>>
>>>> wuchenye1995 <address@hidden> writes:
>>>>
>>>>> Hi all,
>>>>>   We found a problem with live migration of UEFI virtual machines,
>>>>>   caused by a change in the size of OVMF.fd.
>>>>>   Specifically, OVMF.fd is 2MB in older edk2 packages such as
>>>>>   edk-2.0-25, but 4MB in newer ones such as edk-2.0-30.
>>>>>   When we migrate a UEFI virtual machine from a host with the older
>>>>>   edk2 to a host with the newer one, QEMU reports an error in the
>>>>>   function qemu_ram_resize while checking the size of ovmf_pcbios:
>>>>>   Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument.
>>>>>   We would like to know how to solve this problem after updating
>>>>>   the version of edk2.
>>>>
>>>> You can only migrate a machine that is identical - so instantiating an
>>>> empty machine with a different EDK image is bound to cause a problem
>>>> because the machines don't match.
>>>
>>> I don't believe we are that strict for firmware in general. The
>>> firmware is loaded when QEMU starts, but that only matters for the
>>> original source host QEMU. During live migration, the memory content
>>> of the original firmware is copied over, overwriting whatever the
>>> target QEMU loaded off disk. This works... provided the memory region
>>> is the same size on source & target host, which is where the problem
>>> arises in this case.
>>>
>>> If there's a risk that newer firmware will be larger than old firmware,
>>> there are really only two options:
>>>
>>>  - Keep all firmware images forever, each with a unique versioned
>>>    filename. This ensures target QEMU will always load the original
>>>    smaller firmware
>>>
>>>  - Add padding to the firmware images. IOW, if the firmware is 2 MB,
>>>    add zero-padding to the end of the image to round it up to 4 MB
>>>    (whatever you anticipate the largest size will be in future).
>>>
>>> Distros have often taken the latter approach for QEMU firmware in the
>>> past. The main issue is that you have to plan ahead of time and get
>>> this padding right from the very start. You can't add the padding
>>> after the fact on an existing VM.
>>
>> Following up here *too*, just for completeness.
>>
>> The query in this thread has been posted three times now (and I have
>> zero idea why). Each time it generated a different set of responses. For
>> completeness, I'm now going to link the other two threads here (because the
>> present thread seems to have gotten the most feedback).
>>
>> To the OP:
>>
>> - please do *NOT* repost the same question once you get an answer. It
>>  only fragments the discussion and creates confusion. It also doesn't
>>  hurt if you *confirm* that you understood the answer.
>>
>> - Yet further, if your email address has @gmail.com for domain, but your
>>  msgids contain "tencent", that raises some eyebrows (mine for sure).
>>  You say "we" in the query, but never identify the organization behind
>>  the plural pronoun.
>>
>> (I've been fuming about the triple-posting of the question for a while
>> now, but it's only now that, upon seeing how much work Dan has put into
>> his answer, I've decided that dishing out a bit of netiquette would be
>> in order.)
>>
>> * First posting:
>> - msgid:      <address@hidden>
>> - edk2-devel: https://edk2.groups.io/g/devel/message/54146
>> - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html
>>
>>  * my response:
>>    - msgid:      <address@hidden>
>>    - edk2-devel: https://edk2.groups.io/g/devel/message/54161
>>    - qemu-devel: none, because (as an exception) I used the stupid
>>                  groups.io web interface to respond, and so my response
>>                  never reached qemu-devel
>>
>> * Second posting (~4 hours after the first)
>> - msgid:      <address@hidden>
>> - edk2-devel: https://edk2.groups.io/g/devel/message/54147
>> - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html
>>
>>  * Dave's response:
>>    - msgid:      <20200220154742.GC2882@work-vm>
>>    - edk2-devel: https://edk2.groups.io/g/devel/message/54681
>>    - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html
>>
>> * Third posting (next day, present thread) -- cross posted to yet
>>  another list (!), because apparently Dave's feedback and mine had not
>>  been enough:
>> - msgid:        <address@hidden>
>> - edk2-devel:   https://edk2.groups.io/g/devel/message/54220
>> - edk2-discuss: https://edk2.groups.io/g/discuss/message/135
>> - qemu-devel:   https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html
>>
>> Back on topic: see my response again. The answer is, you can't solve the
>> problem (specifically with OVMF), and QEMU in fact does you a service by
>> preventing the migration.
>>
>> Laszlo



