

From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 20:31:09 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 01/02/2018 20:21, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 15:53, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
>>>> On 01/02/2018 14:10, Eduardo Habkost wrote:
>>>>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>>>>>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>>>>> [...]
>>>>>>>> BTW, what's the root cause for requiring HVAs in the buffer?
>>>>>>>
>>>>>>> It's a side effect of the kernel/userspace API which always wants
>>>>>>> a single HVA/len pair to map memory for the application.
>>>>>>>
>>>>>>>
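(For reference, the existing libibverbs registration call indeed takes
a single contiguous addr/len pair; this is the real prototype:

    struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
                              size_t length, int access);
)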
>>>>>>
>>>>>> Hi Eduardo and Michael,
>>>>>>
>>>>>>>>  Can
>>>>>>>> this be fixed?
>>>>>>>
>>>>>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
>>>>>>> mapping an s/g list with actual memory. The HVA/len pair would then just
>>>>>>> be used to refer to the region, without creating the two mappings.
>>>>>>>
>>>>>>> Something like splitting the register mr into
>>>>>>>
>>>>>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>>>>>
>>>>>>> addmemory(mr, offset, hva, len) - pin memory
>>>>>>>
>>>>>>> register mr - pass it to HW
>>>>>>>
>>>>>>> As a nice side effect we won't burn so much virtual address space.
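(A rough sketch of what the proposed split could look like; the names
and the opaque struct mr_handle are placeholders, not an existing
verbs API:

    /* allocate a handle and record the va/len of the region */
    struct mr_handle *create_mr(struct ibv_pd *pd,
                                uint64_t va, size_t len);

    /* pin one chunk of real memory and attach it at an offset */
    int mr_add_memory(struct mr_handle *mr, size_t offset,
                      void *hva, size_t len);

    /* pass the completed mapping to the HW */
    int mr_register(struct mr_handle *mr);
)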
>>>>>>>
>>>>>>
>>>>>> We would still need a contiguous virtual address range (for
>>>>>> post-send), which we don't have: a contiguous guest virtual
>>>>>> address range will always end up as a non-contiguous host
>>>>>> virtual address range.
>>>>>>
>>>>>> I am not sure the RDMA HW can handle a large VA with holes.
>>>>>
>>>>> I'm confused.  Why would the hardware see and care about virtual
>>>>> addresses? 
>>>>
>>>> The post-send operation bypasses the kernel, and the process
>>>> puts GVAs (guest virtual addresses) in the work request.
>>>>
>>>>> How exactly does the hardware translate VAs to
>>>>> PAs?
>>>>
>>>> The HW maintains a page-directory-like structure, separate from
>>>> the MMU's, that maps VAs -> phys pages.
>>>>
>>>>> What if the process page tables change?
>>>>>
>>>>
>>>> Since the page tables the HW uses are their own, we just need the phys
>>>> page to be pinned.
>>>
>>> So there's no hardware-imposed requirement that the hardware VAs
>>> (mapped by the HW page directory) match the VAs in QEMU
>>> address-space, right? 
>>
>> Actually there is. Today it works exactly as you described.
> 
> Are you sure there's such hardware-imposed requirement?
> 

Yes.

> Why would the hardware require VAs to match the ones in the
> userspace address-space, if it doesn't use the CPU MMU at all?
> 

It works like this:

1. We register a buffer from the process address space,
   giving its base address and length.
   This call goes to the kernel, which in turn pins the phys pages
   and registers them with the device *together* with the base
   address (a virtual address!).
2. The device builds its own page tables so it can translate
   those virtual addresses to actual phys pages.
3. The process executes post-send requests directly to HW, bypassing
   the kernel, giving process virtual addresses in the work requests.
4. The device uses its own page tables to translate the virtual
   addresses to phys pages and sends the data.
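
To make steps 1 and 3 concrete, here is a minimal sketch using the
libibverbs API; the pd/qp setup and the exchange of remote_addr/rkey
are assumed to have happened elsewhere:

    #include <infiniband/verbs.h>
    #include <stdint.h>

    /* Step 1: register buf by its process VA; the kernel pins the
     * pages and registers VA+len with the device.  Step 3: post a
     * work request directly to HW, naming the buffer by the same
     * process VA.  Steps 2 and 4 happen inside the device, which
     * builds and walks its own VA -> phys page tables. */
    static int rdma_write_by_va(struct ibv_pd *pd, struct ibv_qp *qp,
                                void *buf, uint32_t len,
                                uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr)
            return -1;

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,   /* process virtual address */
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .opcode     = IBV_WR_RDMA_WRITE,
            .sg_list    = &sge,
            .num_sge    = 1,
            .send_flags = IBV_SEND_SIGNALED,
        };
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        struct ibv_send_wr *bad_wr;
        return ibv_post_send(qp, &wr, &bad_wr);
    }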

Theoretically it is possible to use any contiguous IOVA instead of
the process's VA, but that is not how it works today.

Makes sense?

Thanks,
Marcel



