From: Andrea Arcangeli
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Date: Wed, 21 May 2008 22:13:35 +0200

On Wed, May 21, 2008 at 01:25:59PM -0500, Anthony Liguori wrote:
> I think we're talking about different things.  What I'm talking about is 
> the following:
>
> Guest issues DMA read from disk at offset N of size M to physical address 
> X.   Today, we essentially read from the backing disk image from offset N 
> into a temporary buffer of size M, and then memcpy() to physical address X.
>
> What I would like to do, if N and M are multiples of PAGE_SIZE, is replace 
> the memory at guest physical address X, with the host's page cache for N, 
> M.  The guest is unaware of this though and it may decide to reclaim that 
> memory for something else.  When this happens, we need to unmap guest 
> physical address X and replace it with normal memory (essentially, 
> CoW'ing).
>
> The effect of this would be that if multiple guests are using the same disk 
> image, they would end up sharing memory transparently.
>
> With MMU notifiers, this is possible by just using mmap(MAP_PRIVATE | 
> MAP_FIXED), assuming we fix gfn_to_pfn() to take a 'write' parameter; right 
> now we always write-fault CoW mappings because we unconditionally call 
> get_user_pages with write=1.
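
In code, the two paths being contrasted look roughly like this (a
sketch only; guest_x, disk_fd, n and m are illustrative names, not
QEMU's actual internals):

#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Today's path: pread() into a bounce buffer, then memcpy() into
 * guest physical memory at X. */
static ssize_t dma_read_bounce(int disk_fd, off_t n, size_t m, void *guest_x)
{
    void *tmp = malloc(m);
    ssize_t r;

    if (!tmp)
        return -1;
    r = pread(disk_fd, tmp, m, n);
    if (r > 0)
        memcpy(guest_x, tmp, r);
    free(tmp);
    return r;
}

/* Proposed path: if n, m and guest_x are page aligned, map the host
 * page cache for [n, n+m) directly over guest address X.  MAP_PRIVATE
 * makes any later guest write CoW into anonymous memory instead of
 * dirtying the image. */
static int dma_read_map(int disk_fd, off_t n, size_t m, void *guest_x)
{
    void *p = mmap(guest_x, m, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, disk_fd, n);
    return p == MAP_FAILED ? -1 : 0;
}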

Ok, now I see exactly what you're going after. So it'd save memory,
yes, but only with -snapshot... And it'd be zerocopy, yes, but it'd
need to flush the tlb of all cpus (both the regular ptes and the
sptes) with ipis for every pte overwritten, because the old pte could
still be cached in the tlb even though the new mapping requires no
further writes to the cache. The ipis are likely more costly than a
local memcpy of a 4k region. It's one thing to call get_user_pages in
O_DIRECT only to learn which physical page the DMA should be directed
to (in our case the anonymous page pointed to by the gpa); it's quite
another to mangle ptes and have to update the tlbs for each emulated
DMA operation.
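
The copy side of that comparison is easy to measure from user space
(the shootdown ipi side is not); a throwaway benchmark of the 4k
memcpy it competes against, purely illustrative:

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    static char src[4096], dst[4096];
    struct timespec t0, t1;
    const int iters = 1000000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, sizeof src);
        __asm__ __volatile__("" ::: "memory"); /* don't let the copy be elided */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("4k memcpy: %.1f ns per copy\n", ns / iters);
    return 0;
}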

> As has been pointed out, this is probably not ideal since it would cause 
> heavy vma fragmentation.  We may be able to simulate this using the slots 
> API, although slots are quite similar to vmas in that we optimize for a 
> small number of them.

I'm quite sure remap_file_pages could be extended to work on
MAP_PRIVATE mappings. But I don't see a big benefit in sharing the ram
between host and guest when having it in the guest is enough; this
only works for reads anyway, and it can only share ram among different
guests with -snapshot.
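
For reference, remap_file_pages today only rebinds pages inside an
existing MAP_SHARED mapping, without splitting the vma the way a
second mmap would; a minimal sketch of the current interface (the
MAP_PRIVATE variant discussed here does not exist):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(argv[1], O_RDWR);

    if (fd < 0)
        return 1;

    /* One vma covering the first two pages of the file. */
    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    /* Rebind the first virtual page to file page 1 instead of file
     * page 0, in place: the vma count stays at one. */
    if (remap_file_pages(map, page, 0, 1, 0) != 0) {
        perror("remap_file_pages");
        return 1;
    }
    return 0;
}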

So while it sounds like a clever trick, I doubt it's a worthwhile
optimization; it has downsides, and the worst is that I don't see how
we could extend this logic to work for writes, because the guest's
pagecache can't be written to disk before the dma is explicitly
started by the guest.



