
Re: [Qemu-devel] virtio-net regression [was: syslinux vs. OVMF]


From: Laszlo Ersek
Subject: Re: [Qemu-devel] virtio-net regression [was: syslinux vs. OVMF]
Date: Fri, 10 Apr 2015 16:36:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 04/10/15 13:04, Laszlo Ersek wrote:
> On 04/10/15 12:06, Laszlo Ersek wrote:
>> On 04/10/15 10:14, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>> In summary, please ask Gerd to rebuild the ipxe binaries that are
>>>> bundled with upstream qemu such that they include those two iPXE patches
>>>> of ours (see the last reference).
>>>
>>> https://www.kraxel.org/cgit/qemu/log/?h=rebase/roms-next
>>>
>>> Can you give this a try?
>>
>> Thank you for this update, I tested it.
>>
>> (1) I reproduced the issue, so that I could be sure that the fix wasn't
>> meaningless. Indeed the bug reproduces with the iPXE binaries bundled
>> with upstream qemu.
>>
>> I then checked out, built and installed your branch, and tried again,
>> with virtio-net and then e1000.
>>
>> (2) Virtio-net results:
>> - OVMF        loads shim.efi    via network
>> - shim.efi    loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg    via network
>> - grubx64.efi loads vmlinuz     via network
>>
>> However, while grubx64.efi loads initrd.img via the network, qemu
>> crashes the guest, with the following message:
>>
>> qemu-system-x86_64: Guest moved used index from 46499 to 65534
>>
>> This is a virtio protocol bug in the guest (efi-virtio.rom), *or* in
>> QEMU. I don't know.
>>
>> * e1000 results:
>> - OVMF        loads shim.efi    via network
>> - shim.efi    loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg    via network
>> - grubx64.efi loads vmlinuz     via network
>> - grubx64.efi loads initrd.img  via network
>> - guest kernel boots
>>
>> So, I think the update is fine in general; but maybe there's a new
>> virtio-related bug in either "efi-virtio.rom" or in QEMU.
>>
>> (When I originally wrote the (earlier versions of the) patches, I tested
>> them with virtio-net using RHEL-7 qemu, so I guess this could be an
>> upstream QEMU regression. The machine type I used for testing was
>> pc-i440fx-2.3.)
>>
>> (3) ... Confirmed, this is a qemu regression. Namely, I checked your new
>> efi-virtio.rom with RHEL-7 qemu, and it works fine. CC'ing qemu-devel.
> 
> Small update, before I start bisecting it: the bug does not reproduce
> with "-netdev bridge".
> 
> It seems to be specific to "-netdev tap". Further, "vhost=on" seems to
> play no role, "-netdev tap" reproduces the error both with and without
> vhost=on.

This is creepy.

It was not easy to bisect, because machine type "pc-i440fx-2.3" is obviously 
not available in e.g. v2.2.0.

Ultimately I realized that machine type pc-i440fx-2.0 does not reproduce the 
error, even with current master.

So I picked machine type pc-i440fx-2.1, and bisected the interval between the 
introduction of "pc-i440fx-2.1" (commit 3458b2b0) and current master (commit 
6a460ed1). Log attached.

The result makes me question my sanity, or at least whether I issued the correct 
"git bisect bad" and "git bisect good" commands. This is the culprit:

commit 18045fb9f457a0f0cba2bd113c748a2dcb4ed39e
Author: Paolo Bonzini <address@hidden>
Date:   Mon Jul 28 17:34:16 2014 +0200

    pc: future-proof migration-compatibility of ACPI tables
    
    This patch avoids that similar changes break QEMU again in the future.
    QEMU will now hard-code 64k as the maximum ACPI table size, which
    (despite being an order of magnitude smaller than 640k) should be enough
    for everyone.
    
    Reviewed-by: Laszlo Ersek <address@hidden>
    Tested-by: Igor Mammedov <address@hidden>
    Signed-off-by: Paolo Bonzini <address@hidden>
    Reviewed-by: Michael S. Tsirkin <address@hidden>
    Signed-off-by: Michael S. Tsirkin <address@hidden>

How?!

Anyway, then I patched qemu, on top of current master, still sticking with 
machine type "pc-i440fx-2.1", as follows:

-----------
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1fe7bfb..6cb00a2 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -344,6 +344,7 @@ static void pc_compat_2_1(MachineState *machine)
     x86_cpu_compat_set_features("core2duo", FEAT_1_ECX, CPUID_EXT_VMX, 0);
     x86_cpu_compat_kvm_no_autodisable(FEAT_8000_0001_ECX, CPUID_EXT3_SVM);
     pcms->enforce_aligned_dimm = false;
+    legacy_acpi_table_size = 6652;
 }
 
 static void pc_compat_2_0(MachineState *machine)
-----------

Incredibly, this made the crash go away.

Without this patch (i.e. when it crashes), the fw_cfg file called 
"etc/acpi/tables" has size 0x20000. With the patch (which happens to suppress 
the crash for some reason), the same fw_cfg file has size 0x2000 (1/16th of 
that). This is consistent with the two branches in acpi_build(). (Note that the 
warning block visible there, in the second branch, is never printed.)
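
To make the two sizes concrete, here is a minimal sketch of the padding 
arithmetic. It is not the acpi_build() code: the helper names are hypothetical, 
the two constants are assumed values (4 KB alignment for the legacy branch, a 
fixed 0x20000 blob otherwise), and the per-CPU adjustment that the real legacy 
branch applies is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check hw/i386/acpi-build.c in your tree. */
#define SKETCH_ALIGN_SIZE 0x1000u   /* assumption: legacy branch pads to 4 KB */
#define SKETCH_TABLE_SIZE 0x20000u  /* assumption: non-legacy fixed blob size */

/* Round len up to the next multiple of align; align must be a power of two. */
static uint32_t round_up_pow2(uint32_t len, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: size of the "etc/acpi/tables" blob.
 * legacy_size != 0 models a machine type that sets legacy_acpi_table_size;
 * legacy_size == 0 models the future-proofed, fixed-size branch.
 */
static uint32_t tables_blob_size(uint32_t raw_len, uint32_t legacy_size)
{
    if (legacy_size != 0) {
        /* Simplification: pad to the rounded-up legacy size only. */
        return round_up_pow2(legacy_size, SKETCH_ALIGN_SIZE);
    }
    return round_up_pow2(raw_len, SKETCH_TABLE_SIZE);
}
```

Under these assumptions, tables_blob_size(len, 6652) gives 0x2000 and 
tables_blob_size(len, 0) gives 0x20000 for any raw length up to 0x20000, which 
matches the factor of 16 observed above.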

It seems very unlikely that qemu is doing anything wrong. The difference in the 
fw_cfg file size causes a differently sized memory allocation in OVMF, which 
displaces further allocations by 1 page (4KB). For example, "1af41000.efi" (the 
iPXE virtio-net driver) is also loaded 4KB higher than before. But that doesn't 
directly explain why grub places garbage in the virtio-net ring while it 
downloads "initrd.img".
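
For reference, the numbers in the crash message (46499 and 65534) fail even a 
very simple plausibility test on a 16-bit ring index. The sketch below shows 
that kind of test; it is an illustration, not QEMU's implementation. The 
function name is hypothetical, and the 256-entry ring size used in the example 
is an assumption, since the message does not state the ring size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: between two observations, a 16-bit ring index can
 * legitimately advance by at most ring_size entries. The subtraction is
 * done modulo 65536, so a normal wrap-around is still accepted.
 */
static bool ring_index_delta_ok(uint16_t last_seen, uint16_t current,
                                uint16_t ring_size)
{
    assert(ring_size != 0);
    uint16_t delta = (uint16_t)(current - last_seen);
    return delta <= ring_size;
}
```

With an assumed ring size of 256, ring_index_delta_ok(46499, 65534, 256) is 
false, because the index appears to jump by 19035 entries, while a wrap such as 
ring_index_delta_ok(65530, 4, 256) is accepted (a jump of 10). A jump of that 
size is what one would expect if the index field in guest memory had been 
overwritten with unrelated data.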

Anyway, I think we can rule out any qemu regression at this point. It's a bug in 
some other component that the different memory map (due to the larger, 0x20000 
allocation) exposes.

Thanks,
Laszlo

Attachment: bisect.log
Description: Text Data

