From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Date: Thu, 11 Feb 2016 18:30:19 +0200

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> On Tue, 9 Feb 2016 14:17:44 +0200
> "Michael S. Tsirkin" <address@hidden> wrote:
> 
> > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > So the linker interface solves this rather neatly:
> > > > bios allocates memory, bios passes memory map to guest.
> > > > Served us well for several years without need for extensions,
> > > > and it does solve the VM GEN ID problem, even though
> > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > > linker interface is fine for some read-only data, like ACPI tables,
> > > especially fixed ones; not so for AML tables, if one wants to patch them.
> > > 
> > > However, now that you want to use it for other purposes, you start
> > > adding extensions and other guest->QEMU channels to communicate
> > > patching info back.
> > > It also steals the guest's memory, which is not nice and doesn't scale well.
> > 
> > This is an argument I don't get. Memory is memory. Call it guest memory
> > or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course,
> > but much slower.
> > 
> > ...
> It matters for the user, however: he pays for a guest with XXX RAM but gets
> less than that. And that will keep getting worse as the number of such
> devices increases.
> 
> > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > How about reading it from ACPI then? Is it really
> > > > broken unless there's *also* a driver?  
> > > I don't get the question: the MS spec requires an address (ADDR method),
> > > and it's read by ACPI (AML).
> > 
> > You were unhappy about DMA into guest memory.
> > As a replacement for DMA, we could have AML read from
> > e.g. PCI and write into RAM.
> > This way we don't need to pass address to QEMU.
> That sounds better, as it saves us from allocating an IO port
> and QEMU doesn't need to write into guest memory; the only question is
> whether a PCI_Config opregion would work with a driver-less PCI device.

Or a PCI BAR, for that matter. I don't know for sure.
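
For concreteness, the AML could look roughly like the sketch below, built
with the aml_* helpers from hw/acpi/aml-build.h (helper signatures as in
current QEMU; the config-space offset 0x40 and the field names are made up
for illustration, and whether a driver-less device honours a PCI_Config
opregion is exactly the open question):

    /* Sketch: expose the GUID through the device's PCI config space via
     * an AML OperationRegion, so no address has to be passed back to QEMU.
     */
    Aml *dev = aml_device("VGEN");
    Aml *field;

    aml_append(dev, aml_operation_region("VGID", AML_PCI_CONFIG,
                                         aml_int(0x40), 16));
    field = aml_field("VGID", AML_DWORD_ACC, AML_NOLOCK, AML_PRESERVE);
    aml_append(field, aml_named_field("VGLO", 64)); /* GUID bits 0..63 */
    aml_append(field, aml_named_field("VGHI", 64)); /* GUID bits 64..127 */
    aml_append(dev, field);

AML methods could then read VGLO/VGHI and copy them wherever the OSPM needs
them, without QEMU ever learning a guest address.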

> 
> And it's still pretty much not testable, since it would require
> a fully running OSPM to execute the AML side.

AML is not testable, but that's nothing new.
You can test reading from PCI.

> > 
> > > As for a working PCI_Config OpRegion without a driver, I haven't tried,
> > > but I wouldn't be surprised if it doesn't work, taking into account that
> > > the MS-introduced _DSM doesn't.
> > >   
> > > > 
> > > >   
> > > > > > >    Just compare with a graphics card design, where on-device memory
> > > > > > >    is mapped directly at some GPA, not wasting RAM that the guest
> > > > > > >    could use for other tasks.
> > > > > > 
> > > > > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > > > > >   
> > > > > 
> > > > > Modern cards, with their own RAM, map their VRAM into the address
> > > > > space directly and allow users to use it (GEM API). So they do not
> > > > > waste conventional RAM.
> > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way as in
> > > > > this series (even the PCI class id is the same)
> > > > 
> > > > Don't know enough about graphics really, I'm not sure how these are
> > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > mostly use guest RAM, not on card RAM.
> > > >   
> > > > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > > >    instead of consuming guest's RAM they should be mapped at
> > > > > > >    some GPA and their memory accessed directly.      
> > > > > > 
> > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > address. This breaks the straight-forward approach of using a
> > > > > > rebalanceable PCI BAR.    
> > > > > 
> > > > > For PCI rebalancing to work on Windows, one has to provide a working
> > > > > PCI driver, otherwise the OS will ignore the device when rebalancing
> > > > > happens and might map something else over the ignored BAR.
> > > > 
> > > > Does it disable the BAR then? Or just move it elsewhere?  
> > > it doesn't; it just blindly ignores the BAR's existence and maps the BAR
> > > of another device (one with a driver) over it.
> > 
> > Interesting. On classical PCI this is a forbidden configuration.
> > Maybe we do something that confuses Windows?
> > Could you tell me how to reproduce this behaviour?
> #cat > t << EOF
> pci_update_mappings_del
> pci_update_mappings_add
> EOF
> 
> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> 
> wait till the OS boots, note the BARs programmed for ivshmem;
>  in my case it was
>    01:01.0 0,0xfe800000+0x100
> then execute the script and watch the pci_update_mappings* trace events
> 
> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" \
>     $i | nc -U /tmp/m; sleep 5; done
> 
> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> and then programs new BARs, where:
>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> creates a BAR overlapping with ivshmem


Thanks!
We need to figure this out, because currently this does not
work properly (or maybe it works, but merely by chance).
Marcel and I will play with this.

> 
> > 
> > > >   
> > > > > >     
> > > > > > >    In that case NVDIMM could even map the whole label area and
> > > > > > >    significantly simplify the QEMU<->OSPM protocol that currently
> > > > > > >    serializes that data through a 4K page.
> > > > > > >    There is also a performance issue with a buffer allocated in RAM,
> > > > > > >    because DMA adds an unnecessary copying step when data could
> > > > > > >    be read/written directly from the NVDIMM.
> > > > > > >    It might be not very important for the _DSM interface, but when
> > > > > > >    it comes to supporting block mode it can become an issue.
> > > > > > 
> > > > > > So for NVDIMM, presumably it will have code to access the PCI BAR
> > > > > > properly, so it's guaranteed to work across BAR rebalancing.
> > > > > > Would that address the performance issue?
> > > > > 
> > > > > it would, if rebalancing were to account for driver-less PCI device
> > > > > BARs; but it doesn't, hence such BARs need to be statically pinned
> > > > > at the place where the BIOS put them at startup.
> > > > > I'm also not sure that a PCI_Config operation region would work
> > > > > on Windows without a loaded driver (similar to the _DSM case).
> > > > > 
> > > > >     
> > > > > > > The above points make the ACPI patching approach not robust,
> > > > > > > fragile, and hard to maintain.
> > > > > > 
> > > > > > Wrt GEN ID these are all kind of subjective though.  I especially
> > > > > > don't get what appears to be your general dislike of the linker
> > > > > > host/guest interface.
> > > > > Besides the technical issues, the general dislike is just what I've
> > > > > written: the bios_linker_loader_add_pointer() interface is "not
> > > > > robust and fragile".
> > > > > 
> > > > > to make it less fragile:
> > > > >  1. it should be impossible to corrupt memory or patch a wrong address.
> > > > >     the current impl. silently relies on the value referenced by the
> > > > >     'pointer' argument, and to figure that out one has to read the
> > > > >     linker code on the BIOS side. That could easily be set wrong and
> > > > >     slip through review.
> > > > 
> > > > That's an API issue; it seemed like a good idea, but I guess
> > > > it confuses people. Would you be happier using an offset
> > > > instead of a pointer?
> > > an offset is better, and it would be better still if it said
> > > which offset it is (i.e. relative to what)
> > 
> > 
> > Start of table, right?
> not sure, to me it looks like start of a blob and not the table

Right that's what I meant.

> > 
> > > >   
> > > > >     The API shouldn't rely on the caller setting the value pointed
> > > > >     to by that argument.
> > > > 
> > > > I couldn't parse that one. Care to suggest a cleaner API for the linker?
> > > here is the current API signature:
> > > 
> > > bios_linker_loader_add_pointer(GArray *linker,
> > >                                     const char *dest_file,
> > >                                     const char *src_file,
> > >                                     GArray *table, void *pointer,
> > >                                     uint8_t pointer_size)
> > > 
> > > issue 1:
> > > 'pointer' is a real pointer pointing inside 'table', and the API
> > > calculates the offset under the hood:
> > >   offset = (gchar *)pointer - table->data;
> > > and puts it in the ADD_POINTER command.
> > > 
> > > it's easy to get a wrong offset if 'pointer' is not from 'table'.
> > 
> > OK, replace that with table_offset?
> blob_offset?
> 
> also s/table/blob/


OK.
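
So, roughly, the direction would be (a sketch of the signature only, not a
committed API; 'blob_offset' is the name we just agreed on):

    /* Take an explicit offset into the blob instead of a raw pointer,
     * so a pointer into the wrong blob can't silently produce a bogus
     * offset; the implementation can then also bounds-check it.
     */
    void bios_linker_loader_add_pointer(GArray *linker,
                                        const char *dest_file,
                                        const char *src_file,
                                        GArray *blob,
                                        unsigned blob_offset,
                                        uint8_t pointer_size);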

> > 
> > > issue 2:
> > > 'pointer' points to another offset of size 'pointer_size' inside the
> > > 'table' blob, which means that whoever composes the blob has to be
> > > aware of it and fill in the correct value there, which is possible to
> > > do right only if one looks inside the SeaBIOS part of the linker
> > > interface. That is easy to forget, and then one has to deal with the
> > > mess caused by random memory corruption.
> > > 
> > > bios_linker_loader_add_pointer() and the corresponding
> > > ADD_POINTER command should take this second offset as an argument
> > > and not require 'table' to be pre-filled with it; or,
> > > in the worst case, if extending the ADD_POINTER command is problematic,
> > > bios_linker_loader_add_pointer() should still take
> > > the second offset and patch 'table' itself, so that the 'table' composer
> > > doesn't have to worry about it.
> > 
> > This one I don't understand. What's the second pointer you
> > are talking about?
> ha, see, even the author already has absolutely no clue how the linker works
> and what the offsets are relative to.
> see SeaBIOS romfile_loader_add_pointer():
>     ...
>     memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
> here is the second offset       ^^^^^^^^^^^^^

It's the same offset in the entry.
        struct {
            char pointer_dest_file[ROMFILE_LOADER_FILESZ];
            char pointer_src_file[ROMFILE_LOADER_FILESZ];
            u32 pointer_offset;
            u8 pointer_size;
        };



> it should be a properly named field of the ADD_POINTER command and
> not part of the data blob.
> 
>     pointer = le64_to_cpu(pointer);
>     pointer += (unsigned long)src_file->data;
>     pointer = cpu_to_le64(pointer);
>     memcpy(dest_file->data + offset, &pointer, entry->pointer_size);
> 
> all this src|dst_file and the confusing offsets (whatever they might mean)
> give me a headache every time I need to remember how the linker works,
> and I have to read both the QEMU and SeaBIOS code to figure it out each
> time. That's what I'd call an unmaintainable and hard-to-use API.

Right, there's a lack of documentation. It's my fault, so let's fix it.
It's an API issue, nothing to do with the ABI.
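
For the record, one possible shape that makes both offsets explicit (a
sketch, with illustrative parameter names, not the current interface):

    /* Both offsets are named and explicit: dst_patched_offset says where
     * in the destination blob the ADD_POINTER patch is applied, and
     * src_offset says which location inside src_file is pointed to.  The
     * function itself writes src_offset into the destination blob, so the
     * blob's composer no longer has to pre-fill it by hand.
     */
    void bios_linker_loader_add_pointer(GArray *linker,
                                        const char *dest_file,
                                        uint32_t dst_patched_offset,
                                        uint8_t dst_patched_size,
                                        const char *src_file,
                                        uint32_t src_offset);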


> 
> > 
> > > issue 3:
> > > all patching obviously needs bounds checking on the QEMU side,
> > > so it would abort early if it could corrupt memory.
> > 
> > That's easy.
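
E.g. something like (a sketch, using the blob_offset naming from above):

    /* Abort early instead of patching outside the blob. */
    g_assert(pointer_size == 1 || pointer_size == 2 ||
             pointer_size == 4 || pointer_size == 8);
    g_assert(blob_offset + pointer_size <= blob->len);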
> > 
> > > >   
> > > > >  2. If it's going to be used for patching AML, it should assert,
> > > > >     when bios_linker_loader_add_pointer() is called, if the
> > > > >     to-be-patched AML object is wrong and patching would corrupt the
> > > > >     AML blob.
> > > > 
> > > > Hmm for example check that the patched data has
> > > > the expected pattern?  
> > > yep, nothing can be done for raw tables, but that should be possible
> > > for AML tables, and if the pattern is unsupported or the size doesn't
> > > match, it should abort QEMU early instead of corrupting the table.
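
For AML integers that check can be as simple as verifying the encoding
prefix right before the patched bytes (a sketch; the prefix values are the
ACPI spec's Byte/Word/DWord/QWordConst encodings):

    /* If the patched location should be an AML Integer, the byte before
     * it must be the matching prefix; abort early otherwise.
     */
    uint8_t prefix = (uint8_t)blob->data[blob_offset - 1];
    switch (pointer_size) {
    case 1: g_assert(prefix == 0x0A /* BytePrefix  */); break;
    case 2: g_assert(prefix == 0x0B /* WordPrefix  */); break;
    case 4: g_assert(prefix == 0x0C /* DWordPrefix */); break;
    case 8: g_assert(prefix == 0x0E /* QWordPrefix */); break;
    default: g_assert_not_reached();
    }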
> > 
> > Above all sounds reasonable. Would you like to take a stab
> > at it, or prefer me to?
> It would be better if it were you.
> 
> I wouldn't like to maintain it, ever, as it's too complex and hard-to-use
> an API, one which I'd use only as a last resort if there weren't any other
> way to implement the task at hand.

Sorry, I don't get it. You don't like the API? Write a better one
for the existing ABI. If you prefer waiting for me to fix it,
that's fine too, but no guarantees that you will like the new one
or when it will happen.

Look, there has been one change (a bugfix for alignment) in the several
years since we added the linker. We don't maintain any compatibility flags
around it *at all*. It might have a hard-to-use API, but that is the
definition of easy to maintain.  You are pushing allocating memory host
side as an alternative; what happened there is the reverse: a ton of
changes and pain all the way, and we get to maintain a bag of compat
hacks for old machine types. You say we finally know what we are
doing and won't have to change it any more. I'm not convinced.

> > > >   
> > > > >     
> > > > > > It's there and we are not moving away from it, so why not
> > > > > > use it in more places?  Or if you think it's wrong, why don't you 
> > > > > > build
> > > > > > something better then?  We could then maybe use it for these things 
> > > > > > as
> > > > > > well.    
> > > > > 
> > > > > Yep, I think for vmgenid, and even more so for nvdimm,
> > > > > it would be better to allocate GPAs in QEMU and map the backing
> > > > > MemoryRegions directly in QEMU.
> > > > > For nvdimm (the main data region)
> > > > > we already do it using pc-dimm's GPA allocation algorithm; we could
> > > > > also use a similar approach for nvdimm's label area and vmgenid.
> > > > > 
> > > > > Here is a simple attempt to add a limited GPA allocator in high memory:
> > > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > > But it hasn't got any comments from you and was ignored.
> > > > > Let's consider it, and perhaps we can come up with a GPA allocator
> > > > > that could be used for other things as well.
> > > > 
> > > > For the nvdimm label area, I agree that passing things through
> > > > a 4K buffer seems inefficient.
> > > > 
> > > > I'm not sure what's a better way though.
> > > > 
> > > > Use 64 bit memory? Setting aside old guests such as XP,
> > > > does it break 32 bit guests?  
> > > it might not work with 32-bit guests, the same way mem hotplug
> > > doesn't work for them unless they are PAE-enabled.
> > 
> > Right, I mean with PAE.
> I've tested it with 32-bit XP and Windows 10; they boot fine, and the
> vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
> So at least it doesn't crash the guest.
> I can't test more than that for 32-bit guests, since the utility
> to read vmgenid works only on Windows Server and there isn't
> a 32-bit version of it.
> 
> > 
> > > but well, that's a limitation of the implementation, and considering
> > > that the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
> > >    
> > > > I'm really afraid of adding yet another allocator. I think you
> > > > underestimate the maintenance headache: it's not theoretical and is
> > > > already felt.
> > > The current maintenance headache is due to the fixed, handpicked
> > > memory layout; we can't do much about it for legacy machine
> > > types, but with a QEMU-side GPA allocator we can try to switch
> > > to a flexible memory layout that would allocate GPAs
> > > depending on the QEMU config in a stable manner.
> > 
> > So far, we didn't manage to. It seems to go in the reverse
> > direction, where we add more and more control to let management
> > influence the layout. Things like alignment requirements
> > also tend to surface later and wreak havoc on whatever
> > we do.
> Was there even an attempt to try it before? Could you point to it?

Look at the mess we have with the existing allocator.
As a way to fix this unmaintainable mess, what I see is suggestions
to drop old machine types so people have to reinstall guests.
This does not inspire confidence.

> The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
> but it hasn't got any technical comments from you,
> except for 'I'm afraid that it won't work' on IRC.
> 
> QEMU already has a GPA allocator limited to the memory hotplug AS,
> and it has passed through its 'growing' issues. What the above patch
> proposes is to reuse the already existing memory hotplug AS and
> maybe make its GPA allocator more generic (i.e. not tied
> only to pc-dimm) on top of it.
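
(For concreteness, reusing the hotplug area could look roughly like this -
a sketch only, built on pc-dimm's existing allocator helper; hotplug_base,
hotplug_len and hotplug_mr stand for the machine's hotplug window, and
whether this generalizes cleanly is exactly what's being debated:)

    /* Allocate a GPA for the vmgenid buffer out of the memory-hotplug
     * address space, the way pc-dimm does for its MemoryRegion.
     */
    Error *local_err = NULL;
    uint64_t addr = pc_dimm_get_free_addr(hotplug_base, hotplug_len,
                                          NULL /* no hint */, 4096,
                                          VMGENID_VMGID_BUF_SIZE, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
    memory_region_add_subregion(hotplug_mr, addr - hotplug_base, &s->iomem);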

You say we finally know what we are doing and won't have to change it
any more. I'm not convinced.

> 
> It's sufficient for the vmgenid use case, and definitely
> much more suitable for nvdimm, which already uses it for mapping the
> main storage MemoryRegion.
> 
> > > well, there is a maintenance headache with bios_linker as well,
> > > due to its complexity (multiple layers of indirection), and
> > > it will grow as more places try to use it.
> > > Yep, we could use it as a hack, stealing RAM and trying to implement
> > > backwards DMA, or we could be less afraid and consider
> > > yet another allocator which will do the job without hacks,
> > > which should benefit QEMU in the long run (it might not be easy
> > > to implement it right, but if we don't even try, we will be buried
> > > in complex hacks that 'work' for now)
> > >   
> > > > > >     
> > > > > > >       
> > > > > > > >       
> > > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > > address from guest to host, instead of reserving
> > > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > > automatically.
> > > > > > > > > Could you elaborate more on this suggestion?        
> > > > > > > > 
> > > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > > If you wish, I'll try to post a prototype next week.      
> > > > > > > I don't know much about PCI, but it would be interesting;
> > > > > > > perhaps we could use it somewhere else.
> > > > > > > 
> > > > > > > However it should be checked whether it works with Windows:
> > > > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > > > >       
> > > > > > > >       
> > > > > > > > > > 
> > > > > > > > > >         
> > > > > > > > > > > ---
> > > > > > > > > > > changes since 17:
> > > > > > > > > > >   - small fixups suggested in v14 review by "Michael S. Tsirkin" <address@hidden>
> > > > > > > > > > >   - make BAR prefetchable to make the region cached as per MS spec
> > > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > > changes since 14:
> > > > > > > > > > >   - reserve BAR resources so that Windows won't touch them
> > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <address@hidden>
> > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > >     and place it at PCI0 scope, so there's no need to trace its
> > > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <address@hidden>
> > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > >   - enable the BAR to be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > > > > ---
> > > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > > >  
> > > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > >  
> > > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > > >  }
> > > > > > > > > > >  
> > > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > > +
> > > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > +
> > > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > > +
> > > > > > > > > > > +    /*
> > > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV,
> > > > > > > > > > > +     * so Windows ignores the VMGEN device completely and doesn't check
> > > > > > > > > > > +     * for resource conflicts, which during PCI rebalancing can lead
> > > > > > > > > > > +     * to another PCI device claiming the ignored BARs. To prevent this,
> > > > > > > > > > > +     * statically reserve the resources used by VM_Gen_Counter.
> > > > > > > > > > > +     * For more verbose comment see this commit message.
> > > > > > > > > > 
> > > > > > > > > > What does "this commit message" mean?
> > > > > > > > > above commit message. Should I reword it to just 'see commit message'?
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > +     */
> > > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > > +     return dev;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > >  /*
> > > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >              }
> > > > > > > > > > >  
> > > > > > > > > > >              if (bus) {
> > > > > > > > > > > +                Object *vmgen;
> > > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > > >                  }
> > > > > > > > > > >  
> > > > > > > > > > > +                vmgen = find_vmgenid_dev(NULL);
> > > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > > +
> > > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > > +
> > > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > > +                    }
> > > > > > > > > > > +                }
> > > > > > > > > > > +
> > > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > > >              }
> > > > > > > > > > >          }
> > > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >      {
> > > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > > >  
> > > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > > -
> > > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > > >              aml_append(method,
> > > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > > >  
> > > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > > +/*
> > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Authors: Gal Hammer <address@hidden>
> > > > > > > > > > > + *           Igor Mammedov <address@hidden>
> > > > > > > > > > > + *
> > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > + *
> > > > > > > > > > > + */
> > > > > > > > > > > +
> > > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > > +
> > > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > > +
> > > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > > +    union {
> > > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > > +    };
> > > > > > > > > > > +    bool guid_set;
> > > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > > +
> > > > > > > > > > > +Object *find_vmgenid_dev(Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > > +    if (!obj) {
> > > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > > +    }
> > > > > > > > > > > +    return obj;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > > +
> > > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > > +
> > > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > > +
> > > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > > +    }
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > +
> > > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > > +                   value);
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > > +
> > > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > +
> > > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > > +                           &error_abort);
> > > > > > > > > > > +
> > > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > > +
> > > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > > +        &s->iomem);
> > > > > > > > > > > +    return;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > > +{
> > > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > > +
> > > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > > +{
> > > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > > +/*
> > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Authors: Gal Hammer <address@hidden>
> > > > > > > > > > > + *           Igor Mammedov <address@hidden>
> > > > > > > > > > > + *
> > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > + *
> > > > > > > > > > > + */
> > > > > > > > > > > +
> > > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > > +
> > > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > > +
> > > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > > +
> > > > > > > > > > > +Object *find_vmgenid_dev(Error **errp);
> > > > > > > > > > > +
> > > > > > > > > > > +#endif
> > > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > > >  
> > > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > > -- 
> > > > > > > > > > > 1.8.3.1          
> > > > > >     
> > 


