qemu-devel

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64


From: Ard Biesheuvel
Subject: Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
Date: Wed, 31 Jan 2018 18:00:10 +0000

On 31 January 2018 at 17:39, Christoffer Dall
<address@hidden> wrote:
> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
> <address@hidden> wrote:
>> On 31 January 2018 at 16:53, Christoffer Dall
>> <address@hidden> wrote:
>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>>> <address@hidden> wrote:
>>>> On 31 January 2018 at 09:53, Christoffer Dall
>>>> <address@hidden> wrote:
>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <address@hidden> wrote:
>>>>>> >> * Peter Maydell (address@hidden) wrote:
>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert <address@hidden> wrote:
>>>>>> >>>> * Peter Maydell (address@hidden) wrote:
>>>>>> >>>>> I think the correct fix here is that your test code should turn
>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video
>>>>>> >>>>> memory doesn't work). If it's RAM your guest has to arrange to map
>>>>>> >>>>> it as Normal Cacheable, and then everything should work fine.
>>>>>> >>>>
>>>>>> >>>> Does this cause problems with migrating at just the wrong point
>>>>>> >>>> during a VM boot?
>>>>>> >>>
>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>>> >>> tried to provoke that problem...
>>>>>> >>
>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to
>>>>>> >> fail the migration if you can detect the cache is disabled in the
>>>>>> >> guest.
>>>>>> >
>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
>>>>>> > in the guest's system registers, and refuse migration if it's off...
>>>>>> >
>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>>> > of the stick about how thin the ice is in the period before the
>>>>>> > guest turns on its MMU...)
>>>>>>
>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>>>>> to have a consistent view of the memory. The trick is to prevent the
>>>>>> vcpus from changing that. A guest could perfectly well turn off its MMU at
>>>>>> any given time if it needs to (and it is actually required on some HW if
>>>>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>>>>
>>>>>
>>>>> (Clarification: KVM can detect this if it bothers to check the VCPU's
>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>>>> caches, right?)
>>>>>
>>>>>> You may have to pause the vcpus before starting the migration, or
>>>>>> introduce a new KVM feature that would automatically pause a vcpu that
>>>>>> is trying to disable its MMU while the migration is on. This would
>>>>>> involve trapping all the virtual memory related system registers, with
>>>>>> an obvious cost. But that cost would be limited to the time it takes to
>>>>>> migrate the memory, so maybe that's acceptable.
>>>>>>
>>>>> Is that even sufficient?
>>>>>
>>>>> What if the following happened: (1) guest turns off MMU, (2) guest
>>>>> writes some data directly to RAM, (3) QEMU stops the vcpu, (4) QEMU
>>>>> reads guest RAM.  QEMU's view of guest RAM is now incorrect (stale,
>>>>> incoherent, ...).
>>>>>
>>>>> I'm also not really sure if pausing one VCPU because it turned off its
>>>>> MMU will go very well when trying to migrate a large VM (wouldn't this
>>>>> ask for all the other VCPUs beginning to complain that the stopped VCPU
>>>>> appears to be dead?).  As a short-term 'fix' it's probably better to
>>>>> refuse migration if you detect that a VCPU had begun turning off its
>>>>> MMU.
>>>>>
>>>>> On the larger scale of things, this appears to me to be another case of
>>>>> us really needing some way to coherently access memory between QEMU and
>>>>> the VM, but in the case of the VCPU turning off the MMU prior to
>>>>> migration, we don't even know where it may have written data, and I'm
>>>>> therefore not really sure what the 'proper' solution would be.
>>>>>
>>>>> (cc'ing Ard who has thought about this problem before in the context
>>>>> of UEFI and VGA.)
>>>>>
>>>>
>>>> Actually, the VGA case is much simpler because the host is not
>>>> expected to write to the framebuffer, only read from it, and the guest
>>>> is not expected to create a cacheable mapping for it, so any
>>>> incoherency can be trivially solved by cache invalidation on the host
>>>> side. (Note that this has nothing to do with DMA coherency, but only
>>>> with PCI MMIO BARs that are backed by DRAM in the host)
>>>
>>> In case of the running guest, the host will also only read from the
>>> cached mapping.  Of course, at restore, the host will also write
>>> through a cached mapping, but shouldn't the latter case be solvable by
>>> having KVM clean the cache lines when faulting in any page?
>>>
>>
>> We are still talking about the contents of the framebuffer, right? In
>> that case, yes, afaict
>>
>
> I was talking about normal RAM actually... not sure if that changes anything?
>

The main difference is that with a framebuffer BAR, it is pointless
for the guest to map it cacheable, given that the purpose of a
framebuffer is its side effects, which are not guaranteed to occur
in a timely manner if the mapping is cacheable.

If we are talking about normal RAM, then why are we discussing it here
and not down there?

vvv


>>>>
>>>> In the migration case, it is much more complicated, and I think
>>>> capturing the state of the VM in a way that takes incoherency between
>>>> caches and main memory into account is simply infeasible (i.e., the
>>>> act of recording the state of guest RAM via a cached mapping may evict
>>>> clean cachelines that are out of sync, and so it is impossible to
>>>> record both the cached *and* the delta with the uncached state)
>>>
>>> This may be an incredibly stupid question (and I may have asked it
>>> before), but why can't we clean+invalidate the guest page before
>>> reading it and thereby obtain a coherent view of a page?
>>>
>>
>> Because cleaning from the host will clobber whatever the guest wrote
>> directly to memory with the MMU off, if there is a dirty cacheline
>> shadowing that memory.
>
> If the host never wrote anything to that memory (it shouldn't mess
> with the guest's memory) there will only be clean cache lines (even if
> they contain content shadowing the memory) and cleaning them would be
> equivalent to an invalidate.  Am I misremembering how this works?
>

Cleaning doesn't actually invalidate, but it should be a no-op for
clean cachelines.
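
To make the clean vs. invalidate distinction concrete, here is a minimal
toy model of a single cache line shadowing one byte of guest RAM. It is
only a sketch: the struct and every function name are made up for
illustration, and it models neither real Arm cache maintenance
instructions nor any QEMU/KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cache line shadowing one byte of guest RAM.
 * Every name here is hypothetical; this is not QEMU or KVM code. */
struct toy {
    uint8_t ram;    /* what is actually in DRAM */
    bool    valid;  /* line is present in the cache */
    bool    dirty;  /* line is modified relative to DRAM */
    uint8_t line;   /* cached copy of the byte */
};

/* Cacheable read: allocate the line from RAM on a miss. */
uint8_t cached_read(struct toy *t)
{
    if (!t->valid) {
        t->line = t->ram;
        t->valid = true;
        t->dirty = false;
    }
    return t->line;
}

/* Cacheable write: the line becomes dirty, RAM is not updated yet. */
void cached_write(struct toy *t, uint8_t v)
{
    t->line = v;
    t->valid = true;
    t->dirty = true;
}

/* Write with the MMU off: bypasses the cache and hits RAM directly. */
void uncached_write(struct toy *t, uint8_t v)
{
    t->ram = v;
}

/* Clean: write back a dirty line and keep it valid.
 * On a clean (or absent) line this does nothing. */
void cache_clean(struct toy *t)
{
    if (t->valid && t->dirty) {
        t->ram = t->line;
        t->dirty = false;
    }
}

/* Invalidate: discard the line without writing it back. */
void cache_invalidate(struct toy *t)
{
    t->valid = false;
    t->dirty = false;
}
```

In this model, if a clean line holds 1 and the guest then does
uncached_write(2), cache_clean() leaves RAM at 2, but cached_read() still
returns the stale 1 until cache_invalidate() is called. If the line is
dirty instead (cached_write(3) followed by uncached_write(2)),
cache_clean() writes 3 back over the guest's 2, which is the clobbering
case.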

>> However, that same cacheline could be dirty
>> because the guest itself wrote to memory with the MMU on.
>
> Yes, but the guest would have no control over when such a cache line
> gets flushed to main memory by the hardware, and can have no
> reasonable expectation that the cache lines don't get cleaned behind
> its back.  The fact that a migration triggers this, is reasonable.  A
> guest that wants hand-off from main memory that it's accessing with the
> MMU off, must invalidate the appropriate cache lines or ensure they're
> clean.  There's very likely some subtle aspect to all of this that I'm
> forgetting.
>

OK, so if the only way cachelines covering guest memory could be dirty
is after the guest wrote to that memory itself via a cacheable
mapping, I guess it would be reasonable to do clean+invalidate before
reading the memory. Then, the only way for the guest to lose anything
is in cases where it could not reasonably expect it to be retained
anyway.

However, that does leave a window, between the invalidate and the
read, where the guest could modify memory without it being visible to
the host.
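
The short-term fallback suggested earlier in the thread (refuse migration
when a VCPU has its MMU off) would sidestep that window rather than close
it. A minimal sketch of such a check follows. The bit positions are the
architectural SCTLR_EL1.M (bit 0) and SCTLR_EL1.C (bit 2); the helper
names and the policy are hypothetical, and the sketch assumes the
SCTLR_EL1 values have already been read from each VCPU by some other
means, which is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural SCTLR_EL1 bits: M (bit 0) enables the stage 1 MMU,
 * C (bit 2) enables data cacheability. */
#define SCTLR_EL1_M (UINT64_C(1) << 0)
#define SCTLR_EL1_C (UINT64_C(1) << 2)

/* Hypothetical helper: true if a VCPU with this SCTLR_EL1 value has
 * both its MMU and its data cache enabled. */
bool vcpu_caching_enabled(uint64_t sctlr_el1)
{
    const uint64_t need = SCTLR_EL1_M | SCTLR_EL1_C;

    return (sctlr_el1 & need) == need;
}

/* Hypothetical policy: allow migration only if every VCPU passes the
 * check above. The caller supplies the sampled register values. */
bool migration_allowed(const uint64_t *sctlr_el1, size_t nr_vcpus)
{
    for (size_t i = 0; i < nr_vcpus; i++) {
        if (!vcpu_caching_enabled(sctlr_el1[i])) {
            return false;
        }
    }
    return true;
}
```

This only samples the registers at one point in time. As noted above in
the thread, a guest can turn its MMU off again afterwards without KVM
noticing, so the result is only meaningful while the VCPUs are stopped.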


