[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Bug 1297651] [NEW] KVM create a win7 guest with Qemu,
From: |
Michael S. Tsirkin |
Subject: |
Re: [Qemu-devel] [Bug 1297651] [NEW] KVM create a win7 guest with Qemu, it boots up fail |
Date: |
Wed, 26 Mar 2014 16:50:01 +0200 |
On Wed, Mar 26, 2014 at 02:56:33PM +0100, Laszlo Ersek wrote:
> On 03/26/14 13:58, Michael S. Tsirkin wrote:
> > On Wed, Mar 26, 2014 at 01:28:02PM +0100, Laszlo Ersek wrote:
> >> On 03/26/14 11:31, Michael S. Tsirkin wrote:
> >>
> >>> On Wed, Mar 26, 2014 at 06:45:10AM -0000, Robert Hu wrote:
> >>
> >>>> Date: Mon Mar 17 17:05:16 2014 +0100
> >>>>
> >>>> i386/acpi-build: allow more than 255 elements in CPON
> >>>>
> >>>> The build_ssdt() function builds a number of AML objects that are
> >>>> related
> >>>> to CPU hotplug, and whose IDs form a contiguous sequence of APIC IDs.
> >>>> (APIC IDs are in fact discontiguous, but this is the traditional
> >>>> interface: build a contiguous sequence from zero up that covers all
> >>>> possible APIC IDs.) These objects are:
> >>>>
> >>>> - a Processor() object for each VCPU,
> >>>> - a NTFY method, with one branch for each VCPU,
> >>>> - a CPON package with one element (hotplug status byte) for each
> >>>> VCPU.
> >>>>
> >>>> The build_ssdt() function currently limits the *count* of processor
> >>>> objects, and NTFY branches, and CPON elements, in 0xFF (see the
> >>>> assignment
> >>>> to "acpi_cpus"). This allows for an inclusive APIC ID range of
> >>>> [0..254].
> >>>> This is incorrect, because the highest APIC ID that we otherwise
> >>>> allow a
> >>>> VCPU to take is 255.
> >>>>
> >>>> In order to extend the maximum count to 256, and the traversed APIC
> >>>> ID
> >>>> range correspondingly to [0..255]:
> >>>> - the Processor() objects need no change,
> >>>> - the NTFY method also needs no change,
> >>>> - the CPON package must be updated, because it is defined with a
> >>>> DefPackage, and the number of elements in such a package can be at
> >>>> most
> >>>> 255. We pick a DefVarPackage instead.
> >>>>
> >>>> We replace the Op byte, and the encoding of the number of elements.
> >>>> Compare:
> >>>>
> >>>> DefPackage := PackageOp PkgLength NumElements
> >>>> PackageElementList
> >>>> DefVarPackage := VarPackageOp PkgLength VarNumElements
> >>>> PackageElementList
> >>>>
> >>>> PackageOp := 0x12
> >>>> VarPackageOp := 0x13
> >>>
> >>>
> >>> I think I know what's going on here: the specification says:
> >>>
> >>> The ASL compiler can emit two different AML opcodes for a Package
> >>> declaration, either PackageOp or VarPackageOp. For small,
> >>> fixed-length packages, the PackageOp is used and this
> >>>
> >>> opcode is compatible with ACPI 1.0. A VarPackageOp will be emitted
> >>> if any of the following conditions are true:
> >>> *
> >>> The NumElements argument is a TermArg that can only be resolved at
> >>> runtime.
> >>> *
> >>> At compile time, NumElements resolves to a constant that is larger
> >>> than 55.
> >>> *
> >>> The PackageList contains more than 255 initializer elements.
> >>>
> >>>
> >>> So we clearly violate this rule.
> >>
> >> I did see this passage of the spec, but it is not relevant. It is
> >> about what the ASL compiler does. It comes from:
> >>
> >> 19 ACPI Source Language (ASL)Reference
> >> 19.5 ASL Operator Reference
> >> 19.5.98 Package (Declare Package Object)
> >>
> >> We do not have an ASL compiler at hand.
> >
> > True. But I think the spec and guests simply didn't envision writing
> > AML by hand :)
>
> I sort of disagree. The spec has an entire chapter dedicated to AML. If
> the restriction were mentioned in the AML chapter, I'd agree. (Of course
> it *could* be in fact mentioned there, just with me not knowing!)
Maybe. It's more a question of what guests implement than
what the spec says, though.
> >
> >> The specification nowhere restricts VarPackageOp to > 255.
> >>
> >> However, what I *did* mess up is compatibility with ACPI 1.0. Just
> >> below the quoted part, there's also this:
> >>
> >> Note: The ability to create variable-sized packages was first
> >> introduced in ACPI 2.0. ACPI 1.0 only allowed fixed-size
> >> packages with up to 255 elements.
> >>
> >> I forgot that the header of the containing table stated the ACPI
> >> version:
> >>
> >> /* Copy header and patch values in the S3_ / S4_ / S5_ packages */
> >> ssdt_ptr = acpi_data_push(table_data, sizeof(ssdp_misc_aml));
> >>
> >> and
> >>
> >> DefinitionBlock ("ssdt-misc.aml", "SSDT", 0x01, "BXPC", "BXSSDTSUSP",
> >> 0x1)
> >> ^^^^
> >> ComplianceRevision
> >>
> >> So my patch generates ACPI 2.0+ contents for an 1.0 table.
> >>
> >>> The following seems to fix the issue - still testing. Can you
> >>> confirm please?
> >>
> >> This patch only restricts the bug to a subset of cases, but it
> >> doesn't fix it.
> >>
> >>> However the question we should ask is whether
> >>> it's a good idea to allow hotplug ID values that might
> >>> make guests fail to boot.
> >>>
> >>> How about limiting ACPI_CPU_HOTPLUG_ID_LIMIT to 255?
> >>
> >> I think it's not the package size / APIC ID value per se that breaks
> >> the boot, but the incompatibility between the ACPI revision stated in
> >> the SSDT header, and the construct in the table.
> >
> >
> > It would be interesting to try tweaking the table version and seeing
> > what happens. Does it help any guests?
>
> Maybe Robert can try this patch:
This won't affect the tables exposed to guests.
> ------------[snip]------------
> diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> index 0a1e252..6294da5 100644
> --- a/hw/i386/acpi-dsdt.dsl
> +++ b/hw/i386/acpi-dsdt.dsl
> @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> DefinitionBlock (
> "acpi-dsdt.aml", // Output Filename
> "DSDT", // Signature
> - 0x01, // DSDT Compliance Revision
> + 0x02, // DSDT Compliance Revision
> "BXPC", // OEMID
> "BXDSDT", // TABLE ID
> 0x1 // OEM Revision
> diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
> index f4d2a2d..a728e7f 100644
> --- a/hw/i386/q35-acpi-dsdt.dsl
> +++ b/hw/i386/q35-acpi-dsdt.dsl
> @@ -27,7 +27,7 @@ ACPI_EXTRACT_ALL_CODE Q35AcpiDsdtAmlCode
> DefinitionBlock (
> "q35-acpi-dsdt.aml",// Output Filename
> "DSDT", // Signature
> - 0x01, // DSDT Compliance Revision
> + 0x02, // DSDT Compliance Revision
> "BXPC", // OEMID
> "BXDSDT", // TABLE ID
> 0x2 // OEM Revision
> diff --git a/hw/i386/ssdt-misc.dsl b/hw/i386/ssdt-misc.dsl
> index a4484b8..f284f34 100644
> --- a/hw/i386/ssdt-misc.dsl
> +++ b/hw/i386/ssdt-misc.dsl
> @@ -15,7 +15,7 @@
>
> ACPI_EXTRACT_ALL_CODE ssdp_misc_aml
>
> -DefinitionBlock ("ssdt-misc.aml", "SSDT", 0x01, "BXPC", "BXSSDTSUSP", 0x1)
> +DefinitionBlock ("ssdt-misc.aml", "SSDT", 0x02, "BXPC", "BXSSDTSUSP", 0x1)
> {
>
> /****************************************************************
> diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl
> index ac91c05..80d4b9a 100644
> --- a/hw/i386/ssdt-pcihp.dsl
> +++ b/hw/i386/ssdt-pcihp.dsl
> @@ -15,7 +15,7 @@
>
> ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml
>
> -DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1)
> +DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x02, "BXPC", "BXSSDTPCIHP", 0x1)
> {
>
> /****************************************************************
> diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
> index 8229bfd..e8a43d6 100644
> --- a/hw/i386/ssdt-proc.dsl
> +++ b/hw/i386/ssdt-proc.dsl
> @@ -32,7 +32,7 @@
>
> ACPI_EXTRACT_ALL_CODE ssdp_proc_aml
>
> -DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
> +DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x02, "BXPC", "BXSSDT", 0x1)
> {
> ACPI_EXTRACT_PROCESSOR_START ssdt_proc_start
> ACPI_EXTRACT_PROCESSOR_END ssdt_proc_end
> ------------[snip]------------
>
> >>>
> >>> We never allowed > 255 in the past, is it worth the
> >>> maintainance headaches?
> >>>
> >>>
> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>> index f1054dd..7597517 100644
> >>> --- a/hw/i386/acpi-build.c
> >>> +++ b/hw/i386/acpi-build.c
> >>> @@ -1055,9 +1055,21 @@ build_ssdt(GArray *table_data, GArray *linker,
> >>>
> >>> {
> >>> GArray *package = build_alloc_array();
> >>> - uint8_t op = 0x13; /* VarPackageOp */
> >>> + uint8_t op;
> >>> +
> >>> + /*
> >>> + * Note: The ability to create variable-sized packages was
> >>> first introduced in ACPI 2.0. ACPI 1.0 only
> >>> + * allowed fixed-size packages with up to 255 elements.
> >>> + * Windows guests up to win2k8 fail when VarPackageOp is
> >>> used.
> >>> + */
> >>> + if (acpi_cpus <= 255) {
> >>> + op = 0x12; /* PackageOp */
> >>> + build_append_byte(package, acpi_cpus); /* NumElements */
> >>> + } else {
> >>> + op = 0x13; /* VarPackageOp */
> >>> + build_append_int(package, acpi_cpus); /* VarNumElements
> >>> */
> >>> + }
> >>>
> >>> - build_append_int(package, acpi_cpus); /* VarNumElements */
> >>> for (i = 0; i < acpi_cpus; i++) {
> >>> uint8_t b = test_bit(i, cpu->found_cpus) ? 0x01 : 0x00;
> >>> build_append_byte(package, b);
> >>>
> >>
> >> The patch will mask the problem for most of the cases, but when
> >> VarPackageOp is used, it will be broken just the same (because the
> >> ACPI revision in the SSDT header will still mismatch).
> >
> > yes but modern guests don't seem to care,
>
> I disagree. I think this is exactly what causes the Windows boot
> problem.
Yes but
1. more modern guests do work fine
2. I tried changing revision to 2 and it doesn't help windows 7 guests
even though they are supposed to support ACPI 2.0.
> > and it was already broken in
> > 1.7, wasn't it?
>
> No, I don't think so. The ACPI revision in the SSDT table header stated
> ACPI 1.0 just the same, and the PackageOp + NumElements AML code was
> fully compliant with that ACPI spec revision.
>
> (Or else I'm not getting what you're getting at.)
Can we in fact create a setup where that package has more than 255
elements?
If yes, the code in 1.7 is broken since it will discard high bytes from
the number, result will be NumElements = 0.
If no - your patch wasn't needed and I could have reverted it
instead of adding more code like I did: the branch > 255 is simply
never reached.
> >
> >> Here's my proposal:
> >> - I can post a patch that changes the SSDT DSL files, *and* the DSDT
> >> files (q35-acpi-dsdt.dsl acpi-dsdt.dsl), to state an ACPI revision
> >> of 2.0. (The ACPI revision of the DSDT file determines integer
> >> sizes for SSDTs too, so we can't just bump the SSDTs to 2.0)
> >
> > It should not be a problem but I'm not sure I get this comment. Can
> > you explain?
>
> See
>
> 19.5.28 DefinitionBlock (Declare Definition Block)
>
> in the DSL chapter:
>
> Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> width of Integer objects is dependent on the ComplianceRevision of the
> DSDT. If the ComplianceRevision is less than 2, all integers are
> restricted to 32 bits. Otherwise, full 64-bit integers are used. The
> version of the DSDT sets the global integer width for all integers,
> including integers in SSDTs.
>
> So, the ACPI revision in the DefinitionBlock()s must be kept in sync
> between DSDT and SSDT.
That's not what it says. It just says you must not use
64 bit integers in your code if you set DSDT revision to 1.
> >
> >> - Or I suggest to revert my patches for 2.0.
> >>
> >> You probably won't like bumping the ACPI rev in DSDT/SSDT, for
> >> various compatibility reasons, so I think I suggest to revert these
> >> two patches of mine. It's now clear to me that this VCPU hotplug
> >> limit cannot be lifted without much more intrusive (and guest
> >> visible) changes. Sorry about missing this fact in my original
> >> submission.
> >>
> >> Thanks,
> >> Laszlo
> >
> > I have a problem with both approaches :)
> >
> > If we want to change ACPI rev, I think we should do this
> > conditionally when max_cpus > 255.
> > Would be worth it if this fixes some guests.
>
> Two problems, one small, one bigger:
> - small: you'd have to patch the table header dynamically
We do this anyway, for all tables.
> - bigger: ACPI revision stated in the DefinitionBlock() operator of the
> DSL (ie. human readable source) might have an effect on the AML
> generated by iasl. So if you compile the DSL for ACPI 1.0 with iasl,
> then hot-patch the ACPI revision to 2.0 in the AML, some ACPI parsers
> might find problems with the AML that has been compiled for 1.0, but
> now has to be interpreted for 2.0.
It's a theoretical problem: in practice I didn't notice any changes.
> >
> > As for reverting, I think it's a problem that we seem to
> > allow max_cpus = 256 but then it doesn't really work.
>
> It's not about max_cpus. Let me rephrase your statement:
>
> As for reverting, I think it's a problem that we seem to allow a VCPU
> with an APIC ID of 255 (which can occur dependent on VCPU topology,
> and is only indirectly related to max_vcpus), but then hotplugging the
> VCPU with APIC ID == 255 doesn't really work.
>
> And yes, this is exactly the bug that my patches tried to fix.
OK so I'd like to figure out how to trigger this configuration.
> >
> >
> >
> > I think the patch I posted might be good enough for 2.0.
> > It seems to make things work for new guests, and old guests
> > work as long as you don't specify max_cpus = 255.
> > The config with a high max_cpus value never really worked so
> > not a big deal.
> >
> >
> > Alternatively limit max_cpus to 255, to make it fail cleanly.
> >
>
> Again, it's not about max_cpus (it's more about topology). But, you
> could be right; your patch would work as a stop-gap.
>
> Thanks,
> Laszlo
Yes, we might need to revisit for 2.1.
--
MST