Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
From: Michael S. Tsirkin
Date: Tue, 16 Feb 2016 16:53:25 +0200
On Tue, Feb 16, 2016 at 02:51:25PM +0100, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 14:36:49 +0200
> Marcel Apfelbaum <address@hidden> wrote:
>
> > On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > > On Tue, 16 Feb 2016 12:05:33 +0200
> > > Marcel Apfelbaum <address@hidden> wrote:
> > >
> > >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> > >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > >>>> On Tue, 9 Feb 2016 14:17:44 +0200
> > >>>> "Michael S. Tsirkin" <address@hidden> wrote:
> > >>>>
> > >>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > >>>>>>> So the linker interface solves this rather neatly:
> > >>>>>>> bios allocates memory, bios passes memory map to guest.
> > >>>>>>> Served us well for several years without need for extensions,
> > >>>>>>> and it does solve the VM GEN ID problem, even though
> > >>>>>>> 1. it was never designed for huge areas like nvdimm seems to want
> > >>>>>>> to use
> > >>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low
> > >>>>>>> memory
> > >>>>>> The linker interface is fine for some read-only data, like ACPI
> > >>>>>> tables (especially fixed tables), but not so for AML ones if one
> > >>>>>> wants to patch them.
> > >>>>>>
> > >>>>>> However, now that you want to use it for other purposes, you start
> > >>>>>> adding extensions and other guest->QEMU channels to communicate
> > >>>>>> patching info back.
> > >>>>>> It also steals the guest's memory, which is not nice and doesn't
> > >>>>>> scale well.
> > >>>>>
> > >>>>> This is an argument I don't get. Memory is memory. Call it guest
> > >>>>> memory or a RAM-backed PCI BAR - same thing. MMIO is cheaper of
> > >>>>> course, but much slower.
> > >>>>>
> > >>>>> ...
> > >>>> It matters for the user, however: he pays for a guest with XXX RAM
> > >>>> but gets less than that. And that will keep getting worse as the
> > >>>> number of such devices increases.
> > >>>>
> > >>>>>>> OK fine, but returning PCI BAR address to guest is wrong.
> > >>>>>>> How about reading it from ACPI then? Is it really
> > >>>>>>> broken unless there's *also* a driver?
> > >>>>>> I don't get the question. The MS spec requires an address (the ADDR
> > >>>>>> method), and it's read by ACPI (AML).
> > >>>>>
> > >>>>> You were unhappy about DMA into guest memory.
> > >>>>> As a replacement for DMA, we could have AML read from
> > >>>>> e.g. PCI and write into RAM.
> > >>>>> This way we don't need to pass address to QEMU.
> > >>>> That sounds better, as it saves us from allocating an IO port and
> > >>>> QEMU doesn't need to write into guest memory; the only question is
> > >>>> whether a PCI_Config OpRegion would work with a driver-less PCI
> > >>>> device.
> > >>>
> > >>> Or a PCI BAR, for that matter. I don't know for sure.
> > >>>
> > >>>>
> > >>>> And it's still pretty much not testable, since it would require a
> > >>>> fully running OSPM to execute the AML side.
> > >>>
> > >>> AML is not testable, but that's nothing new.
> > >>> You can test reading from PCI.
> > >>>
> > >>>>>
> > >>>>>> As for a PCI_Config OpRegion working without a driver, I haven't
> > >>>>>> tried, but I wouldn't be surprised if it didn't, taking into
> > >>>>>> account that the MS-introduced _DSM doesn't.
> > >>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>>> Just compare with a graphics card design, where on-device
> > >>>>>>>>>> memory is mapped directly at some GPA, not wasting RAM that
> > >>>>>>>>>> the guest could use for other tasks.
> > >>>>>>>>>
> > >>>>>>>>> This might have been true 20 years ago. Most modern cards do
> > >>>>>>>>> DMA.
> > >>>>>>>>
> > >>>>>>>> Modern cards with their own RAM map their VRAM into the address
> > >>>>>>>> space directly and allow users to use it (the GEM API), so they
> > >>>>>>>> do not waste conventional RAM.
> > >>>>>>>> For example, NVIDIA VRAM is mapped as PCI BARs the same way as
> > >>>>>>>> in this series (even the PCI class id is the same).
> > >>>>>>>
> > >>>>>>> I don't know enough about graphics, really; I'm not sure how
> > >>>>>>> these are relevant. NICs and disks certainly do DMA. And
> > >>>>>>> virtio-gl seems to mostly use guest RAM, not on-card RAM.
> > >>>>>>>
> > >>>>>>>>>> The VMGENID and NVDIMM use-cases look exactly the same to me,
> > >>>>>>>>>> i.e. instead of consuming the guest's RAM they should be
> > >>>>>>>>>> mapped at some GPA and their memory accessed directly.
> > >>>>>>>>>
> > >>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > >>>>>>>>> address. This breaks the straight-forward approach of using a
> > >>>>>>>>> rebalanceable PCI BAR.
> > >>>>>>>>
> > >>>>>>>> For PCI rebalancing to work on Windows, one has to provide a
> > >>>>>>>> working PCI driver; otherwise the OS will ignore the device when
> > >>>>>>>> rebalancing happens and might map something else over the
> > >>>>>>>> ignored BAR.
> > >>>>>>>
> > >>>>>>> Does it disable the BAR then? Or just move it elsewhere?
> > >>>>>> It doesn't; it just blindly ignores the BAR's existence and maps a
> > >>>>>> BAR of another device (one that has a driver) over it.
> > >>>>>
> > >>>>> Interesting. On classical PCI this is a forbidden configuration.
> > >>>>> Maybe we do something that confuses windows?
> > >>>>> Could you tell me how to reproduce this behaviour?
> > >>>> #cat > t << EOF
> > >>>> pci_update_mappings_del
> > >>>> pci_update_mappings_add
> > >>>> EOF
> > >>>>
> > >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> > >>>> -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> > >>>> -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> > >>>> -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > >>>>
> > >>>> Wait till the OS boots and note the BARs programmed for ivshmem;
> > >>>> in my case it was:
> > >>>> 01:01.0 0,0xfe800000+0x100
> > >>>> Then execute the script below and watch the pci_update_mappings*
> > >>>> trace events:
> > >>>>
> > >>>> # for i in $(seq 3 18); do \
> > >>>>     printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | \
> > >>>>     nc -U /tmp/m; sleep 5; done
> > >>>>
> > >>>> Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> > >>>> Windows unmaps all BARs of the NICs on the bridge but doesn't touch
> > >>>> ivshmem, and then programs new BARs, where:
> > >>>> pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > >>>> creates a BAR overlapping with ivshmem.
> > >>>
> > >>>
> > >>> Thanks!
> > >>> We need to figure this out, because currently this does not work
> > >>> properly (or maybe it works, but merely by chance).
> > >>> Marcel and I will play with this.
> > >>>
> > >>
> > >> I checked, and indeed we have 2 separate problems:
> > >>
> > >> 1. ivshmem is declared as a PCI RAM controller, and Windows *does*
> > >> have a driver for it; however, it is not remapped on re-balancing.
> > > Does it really have a driver, i.e. an ivshmem-specific one?
> > > It should have its own driver; otherwise userspace won't be able to
> > > access/work with it, and it would be pointless to add such a device to
> > > the machine.
> >
> > No, it does not.
> So it's a "PCI RAM controller", which is marked as NODRV in the INF file.
> They use NODRV as a stub to prevent Windows from asking for a driver,
> assuming that the HW owns/manages the device.
> And when rebalancing happens, Windows completely ignores NODRV
> BARs, which causes overlapping with devices that do have PCI drivers.
But that can't work for classic PCI: if BARs overlap,
behaviour is undefined.
We do something that Windows does not expect, something that makes
it create this setup.
Is it something enabling BARs in firmware? Something else?
> >
> > >
> > >> You can see in Device Manager 2 working devices with the same MMIO
> > >> region - strange!
> > >> This may be because PCI RAM controllers can't be re-mapped? Even
> > >> then, it should not be overridden.
> > >> Maybe we need to give the OS a hint in ACPI regarding this range?
> > >>
> > >> 2. PCI devices with no driver installed are not re-mapped. This can
> > >> be OK from the Windows point of view, because the Resources window
> > >> does not show the MMIO range for such a device.
> > >>
> > >> If the other (re-mapped) device is working, it is pure luck: both
> > >> Memory Regions occupy the same range and have the same priority.
> > >>
> > >> We need to think about how to solve this.
> > >> One way would be to defer the BAR activation to the guest OS, but I am
> > >> not sure of the consequences.
> > > Deferring won't solve the problem, as rebalancing could happen later
> > > and make BARs overlap.
> >
> > Why not? If we do not activate the BAR in firmware and Windows does not
> > have a driver for it, it will not activate it at all, right?
> > Why would Windows activate the device's BAR if it can't use it? At
> > least that is what I hope.
> > Any other idea would be appreciated.
> >
> >
> > > I've noticed that at startup Windows unmaps and then maps BARs
> > > at the same addresses where the BIOS had put them before.
> >
> > Including devices without a working driver?
> I've just tried, it does so for ivshmem.
>
> >
> >
> > Thanks,
> > Marcel
> >
> > >
> > >> And this does not solve the ivshmem problem.
> > > So far the only way to avoid overlapping BARs caused by Windows
> > > rebalancing driver-less devices is to pin such BARs statically with
> > > _CRS in the ACPI tables; but, as Michael said, that fragments the PCI
> > > address space.
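
[Such _CRS pinning could look roughly like the sketch below. This is a
hypothetical illustration, not code from the series: the _HID, base
address, and length are placeholders.]

```asl
// Hypothetical sketch: statically reserve a fixed MMIO window via _CRS,
// so the OS treats the range as claimed and rebalancing leaves it alone.
// The _HID, address and size are placeholders, not values from the series.
Device (VGEN)
{
    Name (_HID, "QEMU0003")  // assumed ID, for illustration only
    Name (_CRS, ResourceTemplate ()
    {
        Memory32Fixed (ReadOnly,
            0xFE800000,   // fixed base address (placeholder)
            0x00001000)   // length (placeholder)
    })
}
```

The downside, as noted above, is that every such pinned range fragments
the PCI address space.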
> > >
> > >>
> > >> Thanks,
> > >> Marcel
> > >>
> > >
> >
> >