
From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 16:51:29 -0200
User-agent: Mutt/1.9.1 (2017-09-22)

On Thu, Feb 01, 2018 at 08:31:09PM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 20:21, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 15:53, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
> >>>> On 01/02/2018 14:10, Eduardo Habkost wrote:
> >>>>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> >>>>>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> >>>>>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >>>>> [...]
> >>>>>>>> BTW, what's the root cause for requiring HVAs in the buffer?
> >>>>>>>
> >>>>>>> It's a side effect of the kernel/userspace API which always wants
> >>>>>>> a single HVA/len pair to map memory for the application.
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Hi Eduardo and Michael,
> >>>>>>
> >>>>>>>>  Can
> >>>>>>>> this be fixed?
> >>>>>>>
> >>>>>>> I think yes.  It'd need to be a kernel patch for the RDMA
> >>>>>>> subsystem mapping an s/g list with actual memory. The HVA/len
> >>>>>>> pair would then just be used to refer to the region, without
> >>>>>>> creating the two mappings.
> >>>>>>>
> >>>>>>> Something like splitting the register mr into
> >>>>>>>
> >>>>>>> mr = create mr (va/len) - allocate a handle and record the va/len
> >>>>>>>
> >>>>>>> addmemory(mr, offset, hva, len) - pin memory
> >>>>>>>
> >>>>>>> register mr - pass it to HW
> >>>>>>>
> >>>>>>> As a nice side effect we won't burn so much virtual address space.
> >>>>>>>
> >>>>>>
> >>>>>> We would still need a contiguous virtual address space range
> >>>>>> (for post-send), which we don't have, since a guest-contiguous
> >>>>>> virtual address space will always end up as a non-contiguous
> >>>>>> host virtual address space.
> >>>>>>
> >>>>>> I am not sure the RDMA HW can handle a large VA with holes.
> >>>>>
> >>>>> I'm confused.  Why would the hardware see and care about virtual
> >>>>> addresses? 
> >>>>
> >>>> The post-send operation bypasses the kernel; the process puts
> >>>> guest virtual addresses (GVAs) directly into the work requests.
> >>>>
> >>>>> How exactly does the hardware translates VAs to
> >>>>> PAs? 
> >>>>
> >>>> The HW maintains its own page-directory-like structure, separate
> >>>> from the MMU, mapping VA -> phys pages.
> >>>>
> >>>>> What if the process page tables change?
> >>>>>
> >>>>
> >>>> Since the page tables the HW uses are its own, we just need the
> >>>> phys pages to be pinned.
> >>>
> >>> So there's no hardware-imposed requirement that the hardware VAs
> >>> (mapped by the HW page directory) match the VAs in QEMU
> >>> address-space, right? 
> >>
> >> Actually there is. Today it works exactly as you described.
> > 
> > Are you sure there's such hardware-imposed requirement?
> > 
> 
> Yes.
> 
> > Why would the hardware require VAs to match the ones in the
> > userspace address-space, if it doesn't use the CPU MMU at all?
> > 
> 
> It works like that:
> 
> 1. We register a buffer from the process address space
>    giving its base address and length.
>    This call goes to the kernel, which in turn pins the phys pages
>    and registers them with the device *together* with the base
>    address (a virtual address!).
> 2. The device builds its own page tables to be able to translate
>    the virtual addresses to actual phys pages.

How would the device be able to do that?  It would require the
device to look at the process page tables, wouldn't it?  Isn't
the HW IOVA->PA translation table built by the OS?


> 3. The process executes post-send requests directly to the hw,
>    bypassing the kernel, putting process virtual addresses in the
>    work requests.
> 4. The device uses its own page tables to translate those virtual
>    addresses to phys pages and sends the data.
> 
> Theoretically it is possible to use any contiguous IOVA range instead
> of the process's VAs, but that is not how it works today.
> 
> Makes sense?

-- 
Eduardo


