From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty...
Date: Tue, 18 Nov 2014 14:53:18 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

On Sun, Nov 16, 2014 at 06:11:48PM +0000, Peter Maydell wrote:
> I'm trying to track down a bug in ARM TCG where we:
>  * boot a guest
>  * run 'shutdown -r now' to trigger a reboot
>  * on reboot, crash when running userspace because the contents
>    of physical RAM have changed but the translated code from
>    before the shutdown was never invalidated
> 
> This is with a virtio-mmio block device as the disk.
> 
> Debugging indicates that when the post-reboot guest reloads
> binaries from disk into ram we fail to invalidate the cached
> translations. For the specific case I looked at, we have a
> translation of code at ramaddr_t 0x806e000. The disk load
> pulls 0x16000 bytes of data off disk to address 0x806a000.
> Virtio correctly calls address_space_unmap(), which is supposed
> to be what marks the ram range as dirty. It in turn calls
> invalidate_and_set_dirty(). However, invalidate_and_set_dirty()
> just does this:
> 
>     if (cpu_physical_memory_is_clean(addr)) {
>         /* invalidate code */
>         tb_invalidate_phys_page_range(addr, addr + length, 0);
>         /* set dirty bit */
>         cpu_physical_memory_set_dirty_range_nocode(addr, length);
>     }
> 
> So if the first page in the range (here 0x806a000) happens
> to be dirty then we won't do anything, even if later pages
> in the range do need to be invalidated. Also, we'll call
> tb_invalidate_phys_page_range() with a start/end which may
> be in different physical pages, which is forbidden by that
> function's API.
> 
> I guess invalidate_and_set_dirty() really needs to be
> fixed to loop through each page in the range; does anybody
> know how this is supposed to work (or why nobody's noticed
> this bug before :-)) ?

Not directly but I don't like this code because it's not atomic.  I'll
send patches soon for atomic test-and-set and test-and-clear.  Hopefully
it won't impact performance too much.
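
For illustration only, atomic test-and-set and test-and-clear on a
per-page dirty bitmap could look roughly like the sketch below. It is
not the patch series mentioned above. It assumes C11 <stdatomic.h>, and
the names dirty_bitmap, test_and_set_dirty() and test_and_clear_dirty()
are hypothetical, not QEMU APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NUM_PAGES 256

/* Hypothetical dirty bitmap: one bit per page, zero-initialized (all clean). */
static atomic_ulong dirty_bitmap[NUM_PAGES / BITS_PER_WORD];

/* Atomically set the page's dirty bit; return whether it was already set. */
static bool test_and_set_dirty(size_t page)
{
    assert(page < NUM_PAGES);
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&dirty_bitmap[page / BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically clear the page's dirty bit; return whether it was set before. */
static bool test_and_clear_dirty(size_t page)
{
    assert(page < NUM_PAGES);
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&dirty_bitmap[page / BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}
```

Because the read and the update happen in a single atomic operation,
two threads racing on the same page cannot both observe "clean" and both
decide they are the one responsible for invalidating it.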

What you've discovered seems like a plain old bug.  It needs a loop.
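
A minimal sketch of such a loop, written against a toy model and not
against exec.c itself: page_dirty[], page_has_tb[] and the model_*
functions are hypothetical stand-ins, and a 4 KiB page size is assumed.
Each iteration checks one page on its own and never hands the
invalidation helper a range that crosses a page boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_SIZE ((uint64_t)1 << PAGE_BITS)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define NUM_PAGES 64

/* Toy per-page state: dirty flag, and whether translated code exists. */
static bool page_dirty[NUM_PAGES];
static bool page_has_tb[NUM_PAGES];

/* Stand-in for the per-page invalidation call; [start, end) must lie
 * within a single page. */
static void model_invalidate_page(uint64_t start, uint64_t end)
{
    assert(((start ^ (end - 1)) & PAGE_MASK) == 0);
    page_has_tb[start >> PAGE_BITS] = false;
}

/* Walk the range page by page instead of testing only the first page. */
static void model_invalidate_and_set_dirty(uint64_t addr, uint64_t length)
{
    uint64_t end = addr + length;

    assert(length > 0 && end <= NUM_PAGES * PAGE_SIZE);
    while (addr < end) {
        uint64_t page_end = (addr & PAGE_MASK) + PAGE_SIZE;
        uint64_t chunk_end = page_end < end ? page_end : end;
        size_t idx = (size_t)(addr >> PAGE_BITS);

        if (!page_dirty[idx]) {
            model_invalidate_page(addr, chunk_end); /* invalidate code */
            page_dirty[idx] = true;                 /* set dirty bit */
        }
        addr = chunk_end;
    }
}
```

With the low bits of the addresses from the report (a 0x16000-byte
write to 0xa000, translated code in the page at 0xe000, and the first
page already dirty), this version still invalidates the page at 0xe000,
whereas a check on the first page alone would skip the whole range.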

Stefan
