Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide


From: Alex Williamson
Subject: Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
Date: Thu, 09 Jan 2014 12:03:26 -0700

On Thu, 2014-01-09 at 11:47 -0700, Alex Williamson wrote:
> On Thu, 2014-01-09 at 20:00 +0200, Michael S. Tsirkin wrote:
> > On Thu, Jan 09, 2014 at 10:24:47AM -0700, Alex Williamson wrote:
> > > On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
> > > > From: Paolo Bonzini <address@hidden>
> > > > 
> > > > As an alternative to commit 818f86b (exec: limit system memory
> > > > size, 2013-11-04) let's just make all address spaces 64-bit wide.
> > > > This eliminates problems with phys_page_find ignoring bits above
> > > > TARGET_PHYS_ADDR_SPACE_BITS and address_space_translate_internal
> > > > consequently messing up the computations.
> > > > 
> > > > In Luiz's reported crash, at startup gdb attempts to read from address
> > > > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive.  The region it gets
> > > > is the newly introduced master abort region, which is as big as the PCI
> > > > address space (see pci_bus_init).  Due to a typo that's only 2^63-1,
> > > > not 2^64.  But we get it anyway because phys_page_find ignores the upper
> > > > bits of the physical address.  In address_space_translate_internal then
> > > > 
> > > >     diff = int128_sub(section->mr->size, int128_make64(addr));
> > > >     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> > > > 
> > > > diff becomes negative, and int128_get64 booms.
> > > > 
> > > > The size of the PCI address space region should be fixed anyway.
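
[To make the underflow concrete: the master abort region's size is 2^63-1 while gdb's read starts at 0xffffffffffffffe6, just below 2^64, so the subtraction goes negative. A standalone sketch using GCC's __int128 in place of QEMU's Int128 helpers (an assumption; only the helper API differs, the arithmetic is the same):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Master abort region size: 2^63-1 due to the typo, not 2^64. */
        __int128 section_size = ((__int128)1 << 63) - 1;
        /* First address of gdb's startup read. */
        uint64_t addr = 0xffffffffffffffe6ULL;

        /* Mirrors: diff = int128_sub(section->mr->size,
         *                            int128_make64(addr));           */
        __int128 diff = section_size - (__int128)addr;

        /* int128_get64() asserts the value fits in a uint64_t, so a
         * negative diff trips the assertion -- the "boom" above.     */
        printf("diff is %s\n", diff < 0 ? "negative" : "non-negative");
        return 0;
    }
]
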
> > > > 
> > > > Reported-by: Luiz Capitulino <address@hidden>
> > > > Signed-off-by: Paolo Bonzini <address@hidden>
> > > > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > > > ---
> > > >  exec.c | 8 ++------
> > > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/exec.c b/exec.c
> > > > index 7e5ce93..f907f5f 100644
> > > > --- a/exec.c
> > > > +++ b/exec.c
> > > > @@ -94,7 +94,7 @@ struct PhysPageEntry {
> > > >  #define PHYS_MAP_NODE_NIL (((uint32_t)~0) >> 6)
> > > >  
> > > >  /* Size of the L2 (and L3, etc) page tables.  */
> > > > -#define ADDR_SPACE_BITS TARGET_PHYS_ADDR_SPACE_BITS
> > > > +#define ADDR_SPACE_BITS 64
> > > >  
> > > >  #define P_L2_BITS 10
> > > >  #define P_L2_SIZE (1 << P_L2_BITS)
> > > > @@ -1861,11 +1861,7 @@ static void memory_map_init(void)
> > > >  {
> > > >      system_memory = g_malloc(sizeof(*system_memory));
> > > >  
> > > > -    assert(ADDR_SPACE_BITS <= 64);
> > > > -
> > > > -    memory_region_init(system_memory, NULL, "system",
> > > > -                       ADDR_SPACE_BITS == 64 ?
> > > > -                       UINT64_MAX : (0x1ULL << ADDR_SPACE_BITS));
> > > > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> > > >      address_space_init(&address_space_memory, system_memory, "memory");
> > > >  
> > > >      system_io = g_malloc(sizeof(*system_io));
> > > 
> > > This seems to have some unexpected consequences around sizing 64bit PCI
> > > BARs that I'm not sure how to handle.
> > 
> > BARs are often disabled during sizing.  Maybe you
> > don't detect the BAR being disabled?
> 
> See the trace below, the BARs are not disabled.  QEMU pci-core is doing
> the sizing and memory region updates for the BARs; vfio is just a
> pass-through here.

Sorry, not in the trace below, but yes, the sizing seems to be happening
while I/O & memory are enabled in the command register.  Thanks,

Alex
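
[The sequence Michael alludes to would clear the memory/IO enable bits around the all-ones write, so the transient BAR value never decodes on the bus. A rough sketch, where the pci_cfg_* config-space accessors are hypothetical placeholders, not a real QEMU or vfio API:

    #include <stdint.h>

    #define PCI_COMMAND        0x04
    #define PCI_COMMAND_IO     0x1
    #define PCI_COMMAND_MEMORY 0x2

    /* Hypothetical config-space accessors, for illustration only. */
    uint16_t pci_cfg_read16(int bdf, int off);
    void     pci_cfg_write16(int bdf, int off, uint16_t val);
    uint32_t pci_cfg_read32(int bdf, int off);
    void     pci_cfg_write32(int bdf, int off, uint32_t val);

    uint32_t size_mem_bar32(int bdf, int bar_off)
    {
        uint16_t cmd = pci_cfg_read16(bdf, PCI_COMMAND);
        uint32_t saved, mask;

        /* Disable decode so the transient all-ones value never hits
         * the bus -- the step missing from the trace below.         */
        pci_cfg_write16(bdf, PCI_COMMAND,
                        cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));

        saved = pci_cfg_read32(bdf, bar_off);
        pci_cfg_write32(bdf, bar_off, 0xffffffff);
        mask = pci_cfg_read32(bdf, bar_off);
        pci_cfg_write32(bdf, bar_off, saved);   /* restore the BAR   */

        pci_cfg_write16(bdf, PCI_COMMAND, cmd); /* re-enable decode  */

        /* Bits [3:0] of a memory BAR are type bits, not address.    */
        return ~(mask & ~0xfU) + 1;
    }
]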

> > >  After this patch I get vfio
> > > traces like this:
> > > 
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) febe0004
> > > (save lower 32bits of BAR)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xffffffff, len=0x4)
> > > (write mask to BAR)
> > > vfio: region_del febe0000 - febe3fff
> > > (memory region gets unmapped)
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) ffffc004
> > > (read size mask)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xfebe0004, len=0x4)
> > > (restore BAR)
> > > vfio: region_add febe0000 - febe3fff [0x7fcf3654d000]
> > > (memory region re-mapped)
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x14, len=0x4) 0
> > > (save upper 32bits of BAR)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x14, 0xffffffff, len=0x4)
> > > (write mask to BAR)
> > > vfio: region_del febe0000 - febe3fff
> > > (memory region gets unmapped)
> > > vfio: region_add fffffffffebe0000 - fffffffffebe3fff [0x7fcf3654d000]
> > > (memory region gets re-mapped with new address)
> > > qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfffffffffebe0000, 
> > > 0x4000, 0x7fcf3654d000) = -14 (Bad address)
> > > (iommu barfs because it can only handle 48bit physical addresses)
> > > 
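
[Decoding the trace values: the size mask 0xffffc004 gives a 16 KB, 64-bit memory BAR, and while the upper dword still holds the all-ones sizing pattern the BAR transiently sits at 0xfffffffffebe0000, the address vfio then tries to map. A quick arithmetic check in plain C, with the constants taken from the trace:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t lo_mask = 0xffffc004;  /* readback after all-ones   */
        uint32_t lo_bar  = 0xfebe0004;  /* restored lower dword      */
        uint32_t hi_bar  = 0xffffffff;  /* upper dword while sizing  */

        /* Type bits 0x4 => 64-bit memory BAR; mask off bits [3:0].  */
        printf("size: 0x%x\n", ~(lo_mask & ~0xfU) + 1);  /* 0x4000   */

        /* The transient 64-bit address the memory core now accepts,
         * since the system address space covers the full 2^64:      */
        uint64_t iova = ((uint64_t)hi_bar << 32) | (lo_bar & ~0xfU);
        printf("iova: 0x%" PRIx64 "\n", iova); /* 0xfffffffffebe0000 */
        return 0;
    }
]
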
> > 
> > Why are you trying to program BAR addresses for dma in the iommu?
> 
> Two reasons: first, I can't tell the difference between RAM and MMIO.
> Second, it enables peer-to-peer DMA between devices, which is something
> that we might be able to take advantage of with GPU passthrough.
> 
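
[For context, the mapping that fails above goes through vfio's type1 IOMMU backend; stripped to its core it is roughly the following ioctl. This is a sketch only: container setup and error handling are omitted, and container_fd is assumed to be an open, configured VFIO container.

    #include <linux/vfio.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Map a BAR's mmap'd pages at its guest-physical address so other
     * devices can DMA to it (peer-to-peer). */
    static int map_bar_for_p2p(int container_fd, uint64_t iova,
                               uint64_t size, void *vaddr)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)vaddr,
            .iova  = iova,
            .size  = size,
        };

        /* With iova = 0xfffffffffebe0000 this fails with EFAULT on an
         * IOMMU that only implements 48 address bits -- the
         * "-14 (Bad address)" seen in the trace. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }
]
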
> > > Prior to this change, there was no re-map with the fffffffffebe0000
> > > address, presumably because it was beyond the address space of the PCI
> > > window.  This address is clearly not in a PCI MMIO space, so why are we
> > > allowing it to be realized in the system address space at this location?
> > > Thanks,
> > > 
> > > Alex
> > 
> > Why do you think it is not in PCI MMIO space?
> > True, CPU can't access this address but other pci devices can.
> 
> What happens on real hardware when an address like this is programmed into
> a device?  The CPU doesn't have the physical bits to access it.  I have
> serious doubts that another PCI device would be able to access it
> either.  Maybe in some limited scenario where the devices are on the
> same conventional PCI bus.  In the typical case, PCI addresses are
> always limited by some kind of aperture, whether that's explicit in
> bridge windows or implicit in hardware design (and perhaps made explicit
> in ACPI).  Even if I wanted to filter these out as noise in vfio, how
> would I do it in a way that still allows real 64bit MMIO to be
> programmed?  PCI has this knowledge, I hope.  VFIO doesn't.  Thanks,
> 
> Alex
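
[The filtering Alex says vfio cannot do correctly would look something like the check below. The hardcoded address width is precisely the problem: vfio has no generic way to learn the IOMMU's reach here, nor to distinguish a transient sizing value from a legitimate 64-bit BAR placement. A hypothetical sketch:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical: reject IOVAs beyond the IOMMU's address width
     * (e.g. 48 bits on the VT-d hardware above; assumes the width is
     * below 64).  The width would have to come from hardware
     * capabilities vfio does not expose here, which is the crux of
     * Alex's objection. */
    static bool iova_mappable(uint64_t iova, uint64_t size,
                              unsigned int iommu_addr_bits)
    {
        uint64_t limit = 1ULL << iommu_addr_bits;

        /* size <= limit - iova avoids overflow in iova + size. */
        return iova < limit && size <= limit - iova;
    }
]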
