qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC DEBUG PATCH 3/3] translate-a64: fix lookup_tb_ptr


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC DEBUG PATCH 3/3] translate-a64: fix lookup_tb_ptr hang (DEBUG!)
Date: Sat, 10 Jun 2017 09:51:26 +0100
User-agent: mu4e 0.9.19; emacs 25.2.50.3

Richard Henderson <address@hidden> writes:

> On 06/09/2017 10:01 AM, Alex Bennée wrote:
>> THIS IS A DEBUG PATCH DO NOT MERGE
>>
>> I include all the comments to show my working. I was trying to
>> isolate which instructions cause the problem. It turns out it is the
>> RET instruction. I don't understand why because AFAICT it is a
>> pretty much a BR instruction.
>
> Yeah, same thing for Alpha.
>
> It has been my guess that not chaining through RET means that we get
> back to the main loop regularly and often, letting interrupts be
> recognized in a timely manner.
>
> I can't figure out why that would be, however, since interrupts
> *ought* to be setting icount_decr, and the TB to which we chain *is*
> checking that to return to the main loop.

Indeed - if that was broken a lot more stuff wouldn't work.

> Since changing the timing affects the outcome (e.g. -d exec), it
> follows that this *must* be some sort of race condition.  But since
> this still happens with single-threaded mode, I can't imagine what
> sort of race condition it might be.

Apart from timer expiry I can't think what other interactions the other
threads have on the main TCG thread. I guess there is IO but my test
hangs way before the kernel starts poking the disk. Is there an
interaction between IRQs and QEMU's serial driver?

>
> More data points.  I removed the tb_htable_lookup, and that by itself
> is enough to fix Alpha booting.  But it doesn't help the aarch64
> kernel+image that I have.  Which does still boot with -d nochain
> (which, along with disabling goto_tb chaining, also disables all
> goto_ptr).

I wonder what is different about your aarch64 image and mine then?
Because mine works just with suppressing the chaining for RET.

>
> Not really sure where to go from here.

I would agree with Emilio that we revert but I can't quite shake the
feeling we are missing an underlying problem. Would just skipping the
htable lookup (but keeping the tb_jmp_cache) be an OK fix for now? Have
we just been lucky that whatever mechanism causes the "hang" wasn't due
to?

>
>
> r~


--
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]