Re: [Qemu-devel] qemu-2.8-rc4 is broken


From: Alex Bennée
Subject: Re: [Qemu-devel] qemu-2.8-rc4 is broken
Date: Tue, 20 Dec 2016 16:02:54 +0000

So having had a quick review on #qemu with Stefan, I think this is an
existing race condition that may have been made more likely by
recent changes. AFAICT what is happening is that mouse movement in the
GUI is translated into PS2 events which are injected as interrupts into
the system by the GUI thread. For whatever weird x86 reason the IRQs are
triggered by writing to a ROM area of the system. As a result,
invalidation of translated ROM code is triggered, and that clashes with
a tb_flush which is occurring at the same time.

I suspect you could make the race more likely to hit by starting the
system with --tb-size 1, which will make the tb_flush() calls more
frequent, while wiggling the mouse a lot at the same time. If you could
test this on 2.7 I suspect you'll find it's just as crash-able.

Once the rest of MTTCG gets merged the tb_lock() will become active
for the SoftMMU, which will prevent the race (the lock is needed there
because one vCPU will be able to trigger invalidations in another
vCPU's code).
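
To make the locking point concrete, here is a rough standalone model of
it, not QEMU code. Every name in it (page_model, model_tb_lock,
model_invalidate_range, model_flush) is made up for illustration. It
only assumes what the backtraces suggest: both the invalidate path and
the flush path end up freeing the same per-page bitmap, and the free
and the pointer reset are separate steps. Taking one lock around both
paths makes the pair atomic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-page descriptor; not QEMU's PageDesc. */
struct page_model {
    unsigned char *code_bitmap;
};

/* Hypothetical stand-in for tb_lock: one lock shared by both paths. */
static pthread_mutex_t model_tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free-then-reset is two steps. Without a lock, two threads can both
 * read the same non-NULL pointer and both free it. The caller must
 * hold model_tb_lock. Returns 1 if a bitmap was actually freed. */
static int invalidate_bitmap_locked(struct page_model *p)
{
    assert(p != NULL);
    int freed = p->code_bitmap != NULL;
    free(p->code_bitmap);
    p->code_bitmap = NULL;
    return freed;
}

/* Path taken when a write hits translated code in one page. */
static int model_invalidate_range(struct page_model *p)
{
    pthread_mutex_lock(&model_tb_lock);
    int freed = invalidate_bitmap_locked(p);
    pthread_mutex_unlock(&model_tb_lock);
    return freed;
}

/* Path taken by a full flush: walks every page under the same lock.
 * Returns how many bitmaps were freed. */
static int model_flush(struct page_model *pages, size_t n)
{
    int freed = 0;
    pthread_mutex_lock(&model_tb_lock);
    for (size_t i = 0; i < n; i++) {
        freed += invalidate_bitmap_locked(&pages[i]);
    }
    pthread_mutex_unlock(&model_tb_lock);
    return freed;
}
```

With the lock held, whichever path gets there second sees a NULL
pointer and frees nothing, so each bitmap is freed exactly once.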

As far as the problem of injecting events into the vCPU is concerned, I
think the correct solution would be to queue the work up using the
async_run_on_cpu functions, which in the current case would ensure only
one thing was happening at a time. Certainly the design of QEMU isn't
expecting non-vCPU threads to invalidate TBs. The gdbstub does this as
well but currently gets away with it, as the vCPUs are generally in a
halted state when changes are made.
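
As a sketch of the queueing idea only: the model below is standalone C
and deliberately does not use QEMU's real async_run_on_cpu signature or
types. All names (vcpu_model, queue_work_on_vcpu, process_queued_work,
model_invalidate, model_flush_work) are hypothetical. The point it
illustrates is that a non-vCPU thread only enqueues a request, and the
vCPU thread drains the queue itself, so an invalidate and a flush run
one after the other rather than concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void (*work_fn)(void *opaque);

struct work_item {
    work_fn fn;
    void *opaque;
    struct work_item *next;
};

/* Hypothetical model of a vCPU with a FIFO of pending work. */
struct vcpu_model {
    pthread_mutex_t lock;           /* protects the queue only */
    struct work_item *head, *tail;
};

/* Records the order in which work actually ran. */
struct model_state {
    int log[8];
    int n;
};

static void vcpu_model_init(struct vcpu_model *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    cpu->head = cpu->tail = NULL;
}

/* Safe from any thread (e.g. a GUI thread): it only enqueues and never
 * touches translation state. Returns 0 on success, -1 if out of memory. */
static int queue_work_on_vcpu(struct vcpu_model *cpu, work_fn fn, void *opaque)
{
    struct work_item *wi = malloc(sizeof(*wi));
    if (!wi) {
        return -1;
    }
    wi->fn = fn;
    wi->opaque = opaque;
    wi->next = NULL;
    pthread_mutex_lock(&cpu->lock);
    if (cpu->tail) {
        cpu->tail->next = wi;
    } else {
        cpu->head = wi;
    }
    cpu->tail = wi;
    pthread_mutex_unlock(&cpu->lock);
    return 0;
}

/* Called only from the vCPU thread. Runs items in FIFO order, one at a
 * time, outside the queue lock. Returns the number of items run. */
static int process_queued_work(struct vcpu_model *cpu)
{
    int ran = 0;
    for (;;) {
        pthread_mutex_lock(&cpu->lock);
        struct work_item *wi = cpu->head;
        if (wi) {
            cpu->head = wi->next;
            if (!cpu->head) {
                cpu->tail = NULL;
            }
        }
        pthread_mutex_unlock(&cpu->lock);
        if (!wi) {
            break;
        }
        wi->fn(wi->opaque);
        free(wi);
        ran++;
    }
    return ran;
}

/* Example work items: 1 stands for an invalidate, 2 for a flush. */
static void model_invalidate(void *opaque)
{
    struct model_state *st = opaque;
    assert(st->n < 8);
    st->log[st->n++] = 1;
}

static void model_flush_work(void *opaque)
{
    struct model_state *st = opaque;
    assert(st->n < 8);
    st->log[st->n++] = 2;
}
```

In this model the GUI thread would call queue_work_on_vcpu(&cpu,
model_invalidate, &st) instead of invalidating directly, and the vCPU
loop would call process_queued_work(&cpu) at a safe point.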

I'm currently on holiday but I'll keep an eye out for this thread if
anyone has any follow up questions. It would be nice if one of the x86
guys could explain exactly what this weird ROM/IRQ setup is and why it
exists.

On 20 December 2016 at 14:02, Stefan Hajnoczi <address@hidden> wrote:
> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <address@hidden> wrote:
>>> From: Stefan Hajnoczi [mailto:address@hidden
>>> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>>> > It also fails much earlier when I enable logs with "-d int -D log".
>>> >
>>> > Here is backtrace for this failure:
>>> >
>>> > #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> > #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> > #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> > #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> > #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>>>
>>> Looks like a heap corruption bug since free() is failing.
>>
>> Seems to be a race condition.
>> When I add logs into invalidate_page_bitmap, the bug disappears.
>> It seems that someone tries to free the same page bitmap twice and 
>> simultaneously.
>
> Does the following workaround prevent the crashes?
>
> -global apic-common.vapic=off
>
>> Here is the stack trace for two threads at the moment when qemu fails:
>> Failed thread:
>> #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>> #5  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
>>     p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:881
>> #6  tb_invalidate_phys_page_range (start=826113, address@hidden,
>>     address@hidden)
>>     at D:/Projects/QEMU/qemu/translate-all.c:1527
>> #7  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
>>     start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1414
>> #8  tb_invalidate_phys_range (address@hidden, address@hidden)
>>     at D:/Projects/QEMU/qemu/translate-all.c:1424
>> #9  0x0000000000402e5f in invalidate_and_set_dirty (address@hidden,
>>     addr=<optimized out>, length=<optimized out>)
>>     at D:/Projects/QEMU/qemu/exec.c:2511
>> #10 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
>>     type=WRITE_DATA, len=3, buf=0x22f411 "", addr=826113,
>>     as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
>> <and more>
>>
>> Another thread:
>> #0  0x00000000007411b0 in g_free ()
>> #1  0x000000000040b6b4 in invalidate_page_bitmap (p=0x1143f738, p=0x1143f738)
>>     at D:/Projects/QEMU/qemu/translate-all.c:881
>> #2  page_flush_tb_1 (address@hidden, lp=0x5715058)
>>     at D:/Projects/QEMU/qemu/translate-all.c:900
>> #3  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
>>     at D:/Projects/QEMU/qemu/translate-all.c:906
>> #4  0x000000000040b7b3 in page_flush_tb ()
>>     at D:/Projects/QEMU/qemu/translate-all.c:916
>> #5  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
>>     at D:/Projects/QEMU/qemu/translate-all.c:954
>> #6  0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5632fd0)
>>     at cpus-common.c:338
>> #7  0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5632fd0)
>>     at D:/Projects/QEMU/qemu/cpus.c:942
>> #8  qemu_tcg_wait_io_event (cpu=<optimized out>)
>>     at D:/Projects/QEMU/qemu/cpus.c:957
>> #9  qemu_tcg_cpu_thread_fn (address@hidden)
>>     at D:/Projects/QEMU/qemu/cpus.c:1216
>> #10 0x000000000072c285 in win32_start_routine (arg=0x565ba40)
>>     at util/qemu-thread-win32.c:406
>> #11 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
>> #12 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
>> <and more>
>>
>>>
>>> QEMU 2.8.0 is scheduled for release today.  I have checked that
>>> qemu-system-i386.exe works but without playing an MP3 file in Windows
>>> XP.
>>>
>>> I plan to go ahead with the release unless information becomes available
>>> that suggests it affects more than just this one scenario.
>>
>>> >
>>> > I encountered the following bug with the latest version of QEMU.
>>> >
>>> > I use a Windows host and start qemu with the following command line:
>>> >
>>> > qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
>>> >
>>> > The guest system is Windows XP 32-bit. It finds new hardware (including
>>> > the audio controller) and I start playing an mp3 file.
>>> >
>>> > After a few seconds of playing, qemu fails with an exception.
>>> >
>>> > I tried to bisect between 2.7 and 2.8, but the bug is not stable.
>>> >
>>> > It manifested itself at commits
>>> > "68701de1362b29fd6941a2021e9393ddbe60edd8" and
>>> > "6a928d25b6d8bc3729c3d28326c6db13b9481059".
>>
>>
>> Pavel Dovgalyuk
>>
>



-- 
Alex Bennée
KVM/QEMU Hacker for Linaro


