Re: [Qemu-devel] virtio-net regression [was: syslinux vs. OVMF]
From: Laszlo Ersek
Subject: Re: [Qemu-devel] virtio-net regression [was: syslinux vs. OVMF]
Date: Fri, 10 Apr 2015 16:36:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 04/10/15 13:04, Laszlo Ersek wrote:
> On 04/10/15 12:06, Laszlo Ersek wrote:
>> On 04/10/15 10:14, Gerd Hoffmann wrote:
>>> Hi,
>>>
>>>> In summary, please ask Gerd to rebuild the ipxe binaries that are
>>>> bundled with upstream qemu such that they include those two iPXE patches
>>>> of ours (see the last reference).
>>>
>>> https://www.kraxel.org/cgit/qemu/log/?h=rebase/roms-next
>>>
>>> Can you give this a try?
>>
>> Thank you for this update, I tested it.
>>
>> (1) I reproduced the issue, so that I could be sure that the fix wasn't
>> meaningless. Indeed the bug reproduces with the iPXE binaries bundled
>> with upstream qemu.
>>
>> I then checked out, built and installed your branch, and tried again,
>> with virtio-net and then e1000.
>>
>> (2) Virtio-net results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>>
>> However, while grubx64.efi loads initrd.img via the network, qemu
>> crashes the guest, with the following message:
>>
>> qemu-system-x86_64: Guest moved used index from 46499 to 65534
>>
>> This is a virtio protocol bug in the guest (efi-virtio.rom), *or* in
>> QEMU. I don't know.
>>
>> * e1000 results:
>> - OVMF loads shim.efi via network
>> - shim.efi loads grubx64.efi via network
>> - grubx64.efi loads grub.cfg via network
>> - grubx64.efi loads vmlinuz via network
>> - grubx64.efi loads initrd.img via network
>> - guest kernel boots
>>
>> So, I think the update is fine in general; but maybe there's a new
>> virtio-related bug in either "efi-virtio.rom" or in QEMU.
>>
>> (When I originally wrote the (earlier versions of the) patches, I tested
>> them with virtio-net using RHEL-7 qemu, so I guess this could be an
>> upstream QEMU regression. The machine type I used for testing was
>> pc-i440fx-2.3.)
>>
>> (3) ... Confirmed, this is a qemu regression. Namely, I checked your new
>> efi-virtio.rom with RHEL-7 qemu, and it works fine. CC'ing qemu-devel.
>
> Small update, before I start bisecting it: the bug does not reproduce
> with "-netdev bridge".
>
> It seems to be specific to "-netdev tap". Further, "vhost=on" seems to
> play no role, "-netdev tap" reproduces the error both with and without
> vhost=on.

This is creepy.

It was not easy to bisect, because machine type "pc-i440fx-2.3" is obviously
not available in e.g. v2.2.0.

Ultimately I realized that machine type pc-i440fx-2.0 does not reproduce the
error, even with current master.

So I picked machine type pc-i440fx-2.1, and bisected the interval between the
introduction of "pc-i440fx-2.1" (commit 3458b2b0) and current master (commit
6a460ed1). Log attached.

The result makes me question my sanity, or at least whether I issued the
correct "git bisect bad" and "git bisect good" commands. This is the culprit:

commit 18045fb9f457a0f0cba2bd113c748a2dcb4ed39e
Author: Paolo Bonzini <address@hidden>
Date:   Mon Jul 28 17:34:16 2014 +0200

    pc: future-proof migration-compatibility of ACPI tables

    This patch avoids that similar changes break QEMU again in the future.
    QEMU will now hard-code 64k as the maximum ACPI table size, which
    (despite being an order of magnitude smaller than 640k) should be enough
    for everyone.

    Reviewed-by: Laszlo Ersek <address@hidden>
    Tested-by: Igor Mammedov <address@hidden>
    Signed-off-by: Paolo Bonzini <address@hidden>
    Reviewed-by: Michael S. Tsirkin <address@hidden>
    Signed-off-by: Michael S. Tsirkin <address@hidden>

How?!

Anyway, then I patched qemu, on top of current master, still sticking with
machine type "pc-i440fx-2.1", as follows:
-----------
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1fe7bfb..6cb00a2 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -344,6 +344,7 @@ static void pc_compat_2_1(MachineState *machine)
     x86_cpu_compat_set_features("core2duo", FEAT_1_ECX, CPUID_EXT_VMX, 0);
     x86_cpu_compat_kvm_no_autodisable(FEAT_8000_0001_ECX, CPUID_EXT3_SVM);
     pcms->enforce_aligned_dimm = false;
+    legacy_acpi_table_size = 6652;
 }
 
 static void pc_compat_2_0(MachineState *machine)
-----------
Incredibly, this made the crash go away.

Without this patch (i.e. when it crashes), the fw_cfg file called
"etc/acpi/tables" has size 0x20000. With the patch (which happens to suppress
the crash for some reason), the same fw_cfg file has size 0x2000 (1/16th). This
is consistent with the branches in acpi_build(). (Note that the warning block
visible there, in the second branch, is never printed.)

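To make the two sizes concrete, here is a minimal sketch of the padding decision, not the actual acpi_build() code. The function and macro names are hypothetical, the 4 KB alignment is an assumption chosen because it matches the observed 0x2000, and the real computation may take further inputs into account. It only shows why a nonzero legacy size of 6652 bytes lands on 0x2000, while a zero legacy size falls through to the fixed 0x20000.

```c
#include <stddef.h>

/* 0x20000 is the file size observed above; the 4 KB alignment is an
 * assumption of this sketch, as are all names below. */
#define SKETCH_FIXED_BLOB_SIZE 0x20000u
#define SKETCH_BLOB_ALIGN      0x1000u

/* Round n up to the next multiple of align (align must be nonzero). */
size_t sketch_round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical model of the size chosen for "etc/acpi/tables":
 * - nonzero legacy size: pad to the aligned legacy size (first branch);
 * - zero legacy size: always pad to the fixed size (second branch).
 */
size_t sketch_padded_blob_size(size_t legacy_size)
{
    if (legacy_size != 0) {
        return sketch_round_up(legacy_size, SKETCH_BLOB_ALIGN);
    }
    return SKETCH_FIXED_BLOB_SIZE;
}
```

Calling sketch_padded_blob_size(6652) models the patched pc-i440fx-2.1 case, and sketch_padded_blob_size(0) models the unpatched one.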
It seems very unlikely that qemu is doing anything wrong. The difference in the
fw_cfg file size causes a differently sized memory allocation in OVMF, which
displaces further allocations by 1 page (4KB). For example, "1af41000.efi" (the
iPXE virtio-net driver) is also loaded 4KB higher than before. But that doesn't
directly explain why grub places garbage in the virtio-net ring while it
downloads "initrd.img".
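For reference, the "Guest moved used index from 46499 to 65534" message quoted earlier presumably comes from a sanity check on how far the guest advanced a ring index in one step. The following is a sketch of that kind of check, not QEMU's actual function: the helper name is hypothetical, and the ring size of 256 used in the example is an assumption, not something measured here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the step from the index the
 * device last saw to the index the guest now advertises could be
 * covered by a ring with ring_size entries.
 */
bool sketch_index_step_is_sane(uint16_t last_seen, uint16_t guest_idx,
                               uint16_t ring_size)
{
    /* Ring indices are free-running 16-bit counters, so the distance
     * is computed modulo 2^16 and wrap-around is handled naturally. */
    uint16_t pending = (uint16_t)(guest_idx - last_seen);

    return pending <= ring_size;
}
```

With the values from the error message, the step is 65534 - 46499 = 19035 entries, far more than an assumed 256-entry ring can hold, so sketch_index_step_is_sane(46499, 65534, 256) returns false. That is what one would expect if the index field in guest memory had been overwritten with unrelated data.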

Anyway, I think we can rule out any qemu regression at this point. It's a bug in
some other component that the different memory map (due to the larger, 0x20000
allocation) exposes.

Thanks,
Laszlo

Attachment: bisect.log (Description: Text Data)