qemu-devel

Re: [Qemu-devel] qemu-2.8-rc4 is broken


From: Alex Bennée
Subject: Re: [Qemu-devel] qemu-2.8-rc4 is broken
Date: Fri, 20 Jan 2017 17:33:11 +0000
User-agent: mu4e 0.9.19; emacs 25.1.91.2

Pavel Dovgalyuk <address@hidden> writes:

>> From: Alex Bennée [mailto:address@hidden
>> >> From: Stefan Hajnoczi [mailto:address@hidden
>> >> >
>> >> > Yes, this option helps.
>> >> > Thank you.
>> >>
>> >> Good news.  This can be fixed in 2.8.1 once someone finds a solution.
>> >
>> > It seems that something still goes wrong.
>> > I'm using this workaround, but there is a kind of deadlock in translation:
>> > at some point call_rcu_thread hangs in qemu_event_wait.
>> >
>> > As far as I understand, it is used by QHT in translate-all.c.
>> > I can't get more information yet, because logging makes everything too slow.
>>
>> There are a number of users of RCU, but for QHT I think it only gets
>> activated when it needs to resize its hash table on insertion of new
>> TranslationBlocks.
>>
>> Can you get a backtrace of all threads when it deadlocks?
>
> Sorry, this is another problem which occurs only in icount replay mode:
> 1. cpu_handle_exception tries to force an exception when it cannot occur,
>    because all the planned instructions have already been used up:
>     } else if (replay_has_exception()
>                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>         /* try to cause an exception pending in the log */
>         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>         *ret = -1;
>         return true;
>
> 2. tb_find calls tb_gen_code, which cannot allocate a new translation block,
>    so it calls tb_flush (which only queues the flush) and then cpu_loop_exit.
> 3. cpu_loop_exit returns to the loop in cpu_exec, where the condition
>             if (cpu_handle_exception(cpu, &ret)) {
>                 break;
>             }
>    is checked again, causing an infinite loop.
>
> The TB cache is never flushed because we never reach that break, and the
> real work of tb_flush is done outside this loop.
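The livelock described above can be reduced to a toy model. This is not QEMU code: ModelCPU and every model_* function are hypothetical stand-ins, and the whole flow is collapsed into three flags. A full cache makes the stand-in for tb_find() queue a flush and bail out. The stand-in for cpu_handle_exception() then fails, and control returns to the same check. The deferred flush would only run after the loop exits, so the cache never empties. An iteration cap keeps the sketch from spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the replay livelock. ModelCPU and every model_* function are
 * hypothetical stand-ins; they are not QEMU code. */
typedef struct {
    bool tb_cache_full; /* no room for a new TranslationBlock */
    bool flush_queued;  /* tb_flush() has only queued the flush */
    bool exit_request;  /* stands in for cpu->exit_request */
} ModelCPU;

/* Stands in for tb_find() -> tb_gen_code(): with a full cache it can only
 * queue a flush, request an exit and bail out (the cpu_loop_exit() path).
 * Returns false on bail-out. */
static bool model_tb_find(ModelCPU *cpu)
{
    if (cpu->tb_cache_full) {
        cpu->flush_queued = true;
        cpu->exit_request = true;
        return false;
    }
    return true;
}

/* Stands in for cpu_handle_exception() when a replayed exception is pending
 * and the icount budget is zero: it needs one more TB to force the
 * exception. Returns true when the caller should break out of the loop. */
static bool model_handle_exception(ModelCPU *cpu)
{
    return model_tb_find(cpu);
}

/* Stands in for the loop in cpu_exec(). The deferred flush would only run
 * after this function returns, so nothing here ever clears tb_cache_full.
 * max_iters caps what would otherwise spin forever; returns the number of
 * iterations executed. */
static int model_cpu_exec(ModelCPU *cpu, int max_iters)
{
    assert(max_iters > 0);
    int iters = 0;
    while (iters < max_iters) {
        iters++;
        if (model_handle_exception(cpu)) {
            break;
        }
        /* A failed model_tb_find() lands back here, much as cpu_loop_exit()
         * longjmps back into cpu_exec(), and the same check repeats. */
    }
    return iters;
}
```

Starting from a full cache, model_cpu_exec() uses up its whole iteration budget and returns with the flush still queued but never performed.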

I think what we need is something like:

  if (cpu->exit_request) {
      break;
  }

before the cpu_handle_exception() call, to ensure any queued work gets
processed first. Can you give me your current command line so I can
reproduce this and check that the fix works?
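The proposed ordering can be checked with a toy model. This is not the actual QEMU patch: ModelCPU and the model_* functions are hypothetical, and the outer loop that performs deferred work is an assumed simplification. In the model, the exit_request check runs before the exception check, so the inner loop returns as soon as a flush has been queued. The outer loop then performs the flush, and the next pass can translate the block and deliver the pending exception.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed fix. ModelCPU and every model_* function are
 * hypothetical stand-ins; they are not QEMU code. */
typedef struct {
    bool tb_cache_full;       /* no room for a new TranslationBlock */
    bool flush_queued;        /* tb_flush() has only queued the flush */
    bool exit_request;        /* stands in for cpu->exit_request */
    bool exception_delivered; /* the replayed exception finally fired */
} ModelCPU;

/* Stands in for tb_find() -> tb_gen_code(): a full cache queues a flush,
 * requests an exit and bails out. Returns false on bail-out. */
static bool model_tb_find(ModelCPU *cpu)
{
    if (cpu->tb_cache_full) {
        cpu->flush_queued = true;
        cpu->exit_request = true;
        return false;
    }
    return true;
}

/* Stands in for cpu_handle_exception() with a pending replayed exception
 * and no icount budget left. Returns true once the exception is delivered. */
static bool model_handle_exception(ModelCPU *cpu)
{
    if (!model_tb_find(cpu)) {
        return false;
    }
    cpu->exception_delivered = true;
    return true;
}

/* Stands in for the loop in cpu_exec() with the exit_request check placed
 * first. It ends after at most two passes: either the exception is
 * delivered, or a failed tb_find sets exit_request and the next pass exits. */
static void model_cpu_exec(ModelCPU *cpu)
{
    for (;;) {
        if (cpu->exit_request) {
            break; /* let the caller process queued work first */
        }
        if (model_handle_exception(cpu)) {
            break;
        }
    }
}

/* Assumed outer loop: performs the deferred flush between model_cpu_exec()
 * calls. Returns how many calls were needed; max_rounds is a safety cap. */
static int model_run(ModelCPU *cpu, int max_rounds)
{
    assert(max_rounds > 0);
    int rounds = 0;
    while (!cpu->exception_delivered && rounds < max_rounds) {
        rounds++;
        model_cpu_exec(cpu);
        if (cpu->flush_queued) { /* the real work of the deferred flush */
            cpu->tb_cache_full = false;
            cpu->flush_queued = false;
        }
        cpu->exit_request = false;
    }
    return rounds;
}
```

Starting from a full cache, model_run() needs two passes. The first pass only queues the flush and exits. The second pass translates the block and delivers the exception.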

--
Alex Bennée
