qemu-devel
From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v3] vfio/common: Check iova with limit not with size
Date: Fri, 22 Jan 2016 15:19:16 -0700

On Fri, 2016-01-22 at 15:14 -0700, Alex Williamson wrote:
> On Thu, 2016-01-21 at 14:15 +0100, Pierre Morel wrote:
> > 
> > On 01/20/2016 04:46 PM, Alex Williamson wrote:
> > > On Wed, 2016-01-20 at 16:14 +0100, Pierre Morel wrote:
> > > > On 01/12/2016 07:16 PM, Alex Williamson wrote:
> > > > > On Tue, 2016-01-12 at 16:11 +0100, Pierre Morel wrote:
> > > > > > In vfio_listener_region_add(), we try to validate that the region
> > > > > > is not zero sized and hasn't overflowed the address space.
> > > > > > 
> > > > > > But the calculation uses the size of the region instead of
> > > > > > using the region's limit (size - 1).
> > > > > > 
> > > > > > This leads to an Int128 overflow when the region has
> > > > > > been initialized to UINT64_MAX, because in this case
> > > > > > memory_region_init() transforms the size from UINT64_MAX
> > > > > > to int128_2_64().
> > > > > > 
> > > > > > Let's really use the limit by subtracting one from the size,
> > > > > > and take care to pass the limit to functions expecting a limit
> > > > > > and the size to functions expecting a size.
> > > > > > 
> > > > > > Signed-off-by: Pierre Morel <address@hidden>
> > > > > > ---
> > > > > > 
> > > > > > Changes from v2:
> > > > > >       - all, just ignore v2, sorry about this;
> > > > > >         this is built on top of v1
> > > > > > 
> > > > > > Changes from v1:
> > > > > >       - adjust the tests knowing we already subtracted one from the end.
> > > > > > 
> > > > > >    hw/vfio/common.c |   14 +++++++-------
> > > > > >    1 files changed, 7 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > > index 6797208..a5f6643 100644
> > > > > > --- a/hw/vfio/common.c
> > > > > > +++ b/hw/vfio/common.c
> > > > > > @@ -348,12 +348,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > > >        if (int128_ge(int128_make64(iova), llend)) {
> > > > > >            return;
> > > > > >        }
> > > > > > -    end = int128_get64(llend);
> > > > > > +    end = int128_get64(int128_sub(llend, int128_one()));
> > > > > >    
> > > > > > -    if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
> > > > > > +    if ((iova < container->min_iova) || (end  > container->max_iova)) {
> > > > > >            error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > > > > >                         " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > > > > > -                     container, iova, end - 1);
> > > > > > +                     container, iova, end);
> > > > > >            ret = -EFAULT;
> > > > > >            goto fail;
> > > > > >        }
> > > > > > @@ -363,7 +363,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > > >        if (memory_region_is_iommu(section->mr)) {
> > > > > >            VFIOGuestIOMMU *giommu;
> > > > > >    
> > > > > > -        trace_vfio_listener_region_add_iommu(iova, end - 1);
> > > > > > +        trace_vfio_listener_region_add_iommu(iova, end);
> > > > > >            /*
> > > > > >             * FIXME: We should do some checking to see if the
> > > > > >             * capabilities of the host VFIO IOMMU are adequate to model
> > > > > > @@ -394,13 +394,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > > >                section->offset_within_region +
> > > > > >                (iova - section->offset_within_address_space);
> > > > > >    
> > > > > > -    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
> > > > > > +    trace_vfio_listener_region_add_ram(iova, end, vaddr);
> > > > > >    
> > > > > > -    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> > > > > > +    ret = vfio_dma_map(container, iova, end - iova + 1, vaddr, section->readonly);
> > > > > >        if (ret) {
> > > > > >            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > > > >                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > > > > -                     container, iova, end - iova, vaddr, ret);
> > > > > > +                     container, iova, end - iova + 1, vaddr, ret);
> > > > > >            goto fail;
> > > > > >        }
> > > > > >    
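A minimal standalone sketch of the arithmetic the patch is working around, using GCC/Clang's unsigned __int128 rather than QEMU's Int128 helpers (int128_get64() would assert rather than silently truncate, so plain-C truncation stands in for it here); the values are illustrative only:

/*
 * Sketch, not QEMU code: a region covering the whole 64-bit address space
 * has size 2^64, which cannot be held in a 64-bit hwaddr, while its limit
 * (2^64 - 1) can.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned __int128 llend = (unsigned __int128)1 << 64;  /* int128_2_64() */

    uint64_t size  = (uint64_t)llend;        /* truncates to 0              */
    uint64_t limit = (uint64_t)(llend - 1);  /* 0xffffffffffffffff, fits    */

    printf("size (truncated) = 0x%016" PRIx64 "\n", size);
    printf("limit            = 0x%016" PRIx64 "\n", limit);
    return 0;
}
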
> > > > > Hmm, did we just push the overflow from one place to another?  If we're
> > > > > mapping a full region of size int128_2_64() starting at iova zero, then
> > > > > this becomes (0xffff_ffff_ffff_ffff - 0 + 1) = 0.  So I think we need
> > > > > to calculate size with 128bit arithmetic too and let it assert if we
> > > > > overflow, ie:
> > > > > 
> > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > index a5f6643..13ad90b 100644
> > > > > --- a/hw/vfio/common.c
> > > > > +++ b/hw/vfio/common.c
> > > > > @@ -321,7 +321,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > >                                         MemoryRegionSection *section)
> > > > >    {
> > > > >        VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> > > > > -    hwaddr iova, end;
> > > > > +    hwaddr iova, end, size;
> > > > >        Int128 llend;
> > > > >        void *vaddr;
> > > > >        int ret;
> > > > > @@ -348,7 +348,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > >        if (int128_ge(int128_make64(iova), llend)) {
> > > > >            return;
> > > > >        }
> > > > > +
> > > > >        end = int128_get64(int128_sub(llend, int128_one()));
> > > > > +    size = int128_get64(int128_sub(llend, int128_make64(iova)));
> > > > here again, if iova is zero, since llend is section->size (2^64), int128_get64() will still overflow (and assert) ...
> > > > 
> > > > >    
> > > > >        if ((iova < container->min_iova) || (end  > container->max_iova)) {
> > > > >            error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > > > > @@ -396,11 +398,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > >    
> > > > >        trace_vfio_listener_region_add_ram(iova, end, vaddr);
> > > > >    
> > > > > -    ret = vfio_dma_map(container, iova, end - iova + 1, vaddr, section->readonly);
> > > > > +    ret = vfio_dma_map(container, iova, size, vaddr, section->readonly);
> > > > >        if (ret) {
> > > > >            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > > >                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > > > -                     container, iova, end - iova + 1, vaddr, ret);
> > > > > +                     container, iova, size, vaddr, ret);
> > > > >            goto fail;
> > > > >        }
> > > > >    
> > > > > 
> > > > > Does that still solve your scenario?  Perhaps vfio-iommu-type1 should
> > > > > have used first/last rather than start/size for mapping since we seem
> > > > > to have an off-by-one for mapping a full 64bit space.  Seems like we
> > > > > could do it with two calls to vfio_dma_map if we really wanted to.
> > > > > Thanks,
> > > > > 
> > > > > Alex
> > > > > 
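A quick sketch of the wraparound described above, again in plain C with unsigned __int128 standing in for QEMU's Int128 (illustrative values only): in 64-bit arithmetic the size of a full region computed as end - iova + 1 wraps to 0, while the 128-bit result keeps the true value of 2^64, which is exactly what int128_get64() would then refuse to narrow.

/* Sketch only: the size of a full 64-bit region does not fit in 64 bits,
 * whether it is computed as end - iova + 1 or as llend - iova. */
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t iova = 0;
    uint64_t end  = UINT64_MAX;                  /* limit of the full region */

    assert(end - iova + 1 == 0);                 /* 64-bit math wraps to 0   */

    unsigned __int128 size = (unsigned __int128)end - iova + 1;
    assert(size == (unsigned __int128)1 << 64);  /* the true size is 2^64    */
    assert(size > UINT64_MAX);                   /* too big for a hwaddr     */
    return 0;
}
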
> > > > You are right, every attempt to solve this will push the overflow
> > > > somewhere else.
> > > > 
> > > > There is just no way to express 2^64 with 64 bits. We have the Int128
> > > > solution, but even if we solve it here, we hit the same limit in the
> > > > Linux ioctl call anyway.
> > > > 
> > > > Intuitively, making two calls does not seem right to me.
> > > > 
> > > > But, what do you think of something like:
> > > > 
> > > > - creating a new VFIO extension
> > > > 
> > > > - and in ioctl(), since we have a flags field in
> > > > vfio_iommu_type1_dma_map, maybe adding a new flag meaning "map all
> > > > virtual memory", or meaning "use first/last"?
> > > > I think this would break existing code unless we add a new VFIO
> > > > extension.
> > > Back up, is there ever a case where we actually need to map the entire
> > > 64bit address space?  This is fairly well impossible on x86.  I'm
> > > pointing out an issue, but I don't know that we need to solve it with
> > > more than an assert since it's never likely to happen.  Thanks,
> > > 
> > > Alex
> > > 
> > 
> > If I understood right, the IOVA is the IO virtual address;
> > it is then possible to map the virtual address page 0xffff_ffff_ffff_f000
> > to something reasonable inside real memory.
> 
> It is.
> 
> > Eventually we may not need to map the last virtual page, but
> > I think that in the general case all of the virtual memory, as viewed by the
> > device through the IOMMU, should be mapped to avoid any uninitialized
> > virtual memory access.
> 
> When using vfio, a device only has access to the IOVA space which has
> been explicitly mapped.  This would be a security issue otherwise since
> kernel vfio can't rely on userspace to wipe the device IOVA space.
> 
> > It is the same reason that makes us map all of the virtual memory for the
> > CPU MMU.
> 
> We don't really do that either, CPU mapping works based on page tables
> and non-existent entries simply don't exist.  We don't fully populate
> the page tables in advance, this would be a horrible waste of memory.
> 
> > Maybe I missed something, or maybe I worry too much,
> > but I see this as a restriction on the supported hardware
> > when comparing host and guest hardware compatibility.
> 
> I don't see the issue, there's arguably a bug in the API that doesn't
> allow us to map the full 64bit IOVA space of a device in a single
> mapping, but we can do it in two.  Besides, there's really no case
> where a device needs a fully populated IOTLB unless you're actually
> giving the device access to 16 EMB of memory.

s/EMB/EB/  Or I suppose technically EiB

> > We can live with it, because in fact you are right: today
> > I am not aware of any hardware wanting to access this page. But a
> > hardware designer, knowing there is an IOMMU, may want to access exactly
> > this kind of strange virtual page for special features, and that would work
> > on the host but not inside the guest.
> 
> The API issue is not that we can't map 0xffff_ffff_ffff_f000, it's that
> we can't map 0x0 through 0xffff_ffff_ffff_ffff in a single mapping
> because we pass the size instead of the end address (where size here
> would be 2^64).  We can map 0x0 through 0xffff_ffff_ffff_efff, followed
> by 0xffff_ffff_ffff_f000 through 0xffff_ffff_ffff_ffff, but again, why
> would you ever need to do this?  Thanks,
> 
> Alex
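
To make the two-call suggestion above concrete, here is a hedged sketch (not QEMU code; the container_fd, vaddr and 4 KiB page size are assumptions for illustration) of splitting a full 64-bit IOVA range into two VFIO_IOMMU_MAP_DMA calls, since the ioctl's 64-bit size field cannot express 2^64:

/* Hypothetical sketch: map the whole 64-bit IOVA space in two chunks.
 * Error handling is minimal; the caller provides an open VFIO container
 * file descriptor and a backing virtual address. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_full_iova_space(int container_fd, void *vaddr)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
    };

    /* First chunk: 0x0 .. 0xffff_ffff_ffff_efff */
    map.vaddr = (uint64_t)(uintptr_t)vaddr;
    map.iova  = 0;
    map.size  = 0xfffffffffffff000ULL;      /* 2^64 minus one 4 KiB page */
    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map)) {
        return -1;
    }

    /* Second chunk: the last page, 0xffff_ffff_ffff_f000 .. 0xffff_ffff_ffff_ffff */
    map.vaddr += map.size;
    map.iova   = 0xfffffffffffff000ULL;
    map.size   = 0x1000;
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}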



