
From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 14:29:25 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 01/02/2018 14:10, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> [...]
>>>> BTW, what's the root cause for requiring HVAs in the buffer?
>>>
>>> It's a side effect of the kernel/userspace API which always wants
>>> a single HVA/len pair to map memory for the application.
>>>
>>>
>>
>> Hi Eduardo and Michael,
>>
>>>>  Can
>>>> this be fixed?
>>>
>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
>>> mapping an s/g list with actual memory. The HVA/len pair would then just
>>> be used to refer to the region, without creating the two mappings.
>>>
>>> Something like splitting the register mr into
>>>
>>> mr = create mr (va/len) - allocate a handle and record the va/len
>>>
>>> addmemory(mr, offset, hva, len) - pin memory
>>>
>>> register mr - pass it to HW
>>>
>>> As a nice side effect we won't burn so much virtual address space.
>>>
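
To make the three steps above concrete, here is a rough sketch of
what such a split verbs API could look like. All function names
below are hypothetical; nothing like this exists in the kernel today:

    #include <infiniband/verbs.h>
    #include <stdint.h>

    /* 1. Allocate a handle and record the (va, len) the MR will
     *    cover. No memory is pinned yet. */
    struct ibv_mr *ibv_create_mr(struct ibv_pd *pd,
                                 uint64_t va, size_t len, int access);

    /* 2. Pin one chunk of host memory and attach it at an offset
     *    inside the MR's VA range; called once per s/g element,
     *    so the backing HVAs do not have to be contiguous. */
    int ibv_mr_add_memory(struct ibv_mr *mr, uint64_t offset,
                          void *hva, size_t len);

    /* 3. Hand the completed translation table to the HW. */
    int ibv_mr_register(struct ibv_mr *mr);
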
>>
>> We would still need a contiguous virtual address space range (for post-send),
>> which we don't have, since a contiguous guest virtual address range
>> will always end up as a non-contiguous host virtual address range.
>>
>> I am not sure the RDMA HW can handle a large VA with holes.
> 
> I'm confused.  Why would the hardware see and care about virtual
> addresses? 

The post-send operation bypasses the kernel, and the process
puts GVAs directly into the work requests.
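
For illustration, this is where the guest-supplied address ends up:
a minimal libibverbs post-send, with the VA placed directly in the
scatter/gather entry (error handling omitted):

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* 'qp' is a connected QP, 'lkey' comes from MR registration,
     * 'gva' is the virtual address the (guest) application uses. */
    static int post_send_gva(struct ibv_qp *qp, uint64_t gva,
                             uint32_t len, uint32_t lkey)
    {
        struct ibv_sge sge = {
            .addr   = gva,   /* the VA goes straight to the HW */
            .length = len,
            .lkey   = lkey,
        };
        struct ibv_send_wr wr, *bad_wr;

        memset(&wr, 0, sizeof(wr));
        wr.opcode     = IBV_WR_SEND;
        wr.sg_list    = &sge;
        wr.num_sge    = 1;
        wr.send_flags = IBV_SEND_SIGNALED;

        /* No syscall here: the doorbell write goes to the device. */
        return ibv_post_send(qp, &wr, &bad_wr);
    }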

> How exactly does the hardware translate VAs to
> PAs? 

The HW maintains its own page-directory-like structure, separate from
the MMU, mapping VAs to physical pages.

> What if the process page tables change?
> 

Since the HW uses its own page tables, we just need the physical
pages to be pinned.
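
Roughly what happens at registration time, as a simplified
illustration (not actual kernel code: get_user_pages_fast() is the
real mechanism, the surrounding structure is made up, and error
unwinding of partially pinned pages is omitted):

    #include <linux/mm.h>

    struct hw_mr {
        u64           iova;    /* VA the HW will see in work requests */
        int           npages;
        struct page **pages;   /* pinned pages the HW tables point at */
    };

    static int hw_mr_pin(struct hw_mr *mr, unsigned long uaddr)
    {
        /* Pin the pages so they cannot move or be swapped out. */
        long got = get_user_pages_fast(uaddr, mr->npages,
                                       FOLL_WRITE, mr->pages);
        if (got != mr->npages)
            return -EFAULT;
        /* The driver now programs the HW's own VA -> phys table;
         * the process MMU page tables are no longer involved. */
        return 0;
    }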

>>
>> An alternative would be a 0-based MR: QEMU intercepts the post-send
>> operations and can subtract the guest VA base address.
>> However, I didn't see a kernel implementation for 0-based MRs,
>> and the RDMA maintainer also said it would work for local keys
>> but not for remote keys.
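
The arithmetic QEMU would have to do for a 0-based MR is trivial;
just a sketch, assuming QEMU can intercept the work request before
it reaches the HW:

    #include <stdint.h>

    /* With a 0-based MR, the HW expects offsets into the MR rather
     * than absolute VAs, so each s/g entry would be rewritten: */
    static inline uint64_t gva_to_mr_offset(uint64_t gva,
                                            uint64_t guest_va_base)
    {
        return gva - guest_va_base;   /* offset 0 = start of the MR */
    }
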
> 
> This is also unexpected: are GVAs visible to the virtual RDMA
> hardware? 

Yes, as explained above.

> Where does the QEMU pvrdma code translate GVAs to
> GPAs?
> 

During reg_mr (the memory registration command).
QEMU then registers the same addresses (as host virtual addresses)
with the real HW.
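
In rough terms, a simplified sketch (not the actual hw/rdma code;
error paths and unmapping are omitted):

    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"        /* pci_dma_map() */
    #include <infiniband/verbs.h>  /* ibv_reg_mr() */

    /* At reg_mr time, resolve the guest address to a host virtual
     * address, then register that HVA with the real HW. */
    static struct ibv_mr *reg_mr_sketch(PCIDevice *dev,
                                        struct ibv_pd *pd,
                                        dma_addr_t guest_addr,
                                        size_t len)
    {
        dma_addr_t plen = len;
        void *hva = pci_dma_map(dev, guest_addr, &plen,
                                DMA_DIRECTION_TO_DEVICE);

        if (!hva || plen < len) {
            return NULL;
        }
        /* The real HW sees only the HVA from here on. */
        return ibv_reg_mr(pd, hva, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ);
    }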

Thanks,
Marcel

>>
>>> This will fix rdma with hugetlbfs as well, which is currently broken.
>>>
>>>
>>
>> There is already a discussion on the linux-rdma list:
>>     https://www.spinics.net/lists/linux-rdma/msg60079.html
>> But it will take some (actually a lot of) time; we are currently
>> discussing a possible API. And it does not solve the re-mapping...
>>
>> Thanks,
>> Marcel
>>
>>>> -- 
>>>> Eduardo
>>
> 



