Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-


From: Yeongkyoon Lee
Subject: Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)
Date: Thu, 21 Mar 2013 16:04:44 +0900
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
Hi,

On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
This assertion occurred with the latest git master:

qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
  tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx)) == 0' failed.
Aborted

QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
root).
The assertion happened when running "apt-get update" in the guest.
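
The assertion text suggests a debug-only bit mask that records which goto_tb exit index has already been emitted for the current TB, so that emitting the same index twice is caught. A minimal sketch of that idea follows. The struct and function names are hypothetical, the limit of two exit slots is an assumption of this sketch, and the function returns false where a debug build would abort:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-TB debug state; not QEMU's TCGContext. */
struct goto_tb_tracker {
    unsigned issue_mask;   /* bit i is set once exit slot i has been emitted */
};

/*
 * Record that exit slot idx is being emitted.
 * Returns false if the slot was already used in this TB (the case the
 * assertion in the report guards against) or if idx is out of range.
 */
static bool issue_goto_tb(struct goto_tb_tracker *t, unsigned idx)
{
    if (idx > 1) {
        return false;      /* assumption: only slots 0 and 1 exist */
    }
    if (t->issue_mask & (1u << idx)) {
        return false;      /* same slot emitted twice */
    }
    t->issue_mask |= 1u << idx;
    return true;
}
```

With a fresh tracker, emitting slot 0 and then slot 1 succeeds, while a second attempt on slot 0 is rejected.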

Is it reproducible, or more or less random? Have you Cc:ed
Richard, since it's related to the latest patches?

On my side I am experiencing random segfaults in various guests (at
least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
though each step takes quite long (building Perl + the testsuite).
Currently I know that 1.3 is affected, while 1.2 is not.

I have found that the issue comes from the following commits, which
unfortunately are not bisectable one by one (though it won't change the
results a lot):

     commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
     Author: Yeongkyoon Lee <address@hidden>
     Date:   Wed Oct 31 16:04:25 2012 +0900
         tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
         Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
         cases at the end of a block after generating the other IRs.
         Currently, this optimization supports only i386 and x86_64 hosts.
         Signed-off-by: Yeongkyoon Lee <address@hidden>
         Signed-off-by: Blue Swirl <address@hidden>
     commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
     Author: Yeongkyoon Lee <address@hidden>
     Date:   Wed Oct 31 16:04:24 2012 +0900
         tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
         Add GETPC_EXT which is used by MMU helpers to selectively calculate the code
         address of accessing guest memory when called from a qemu_ld/st optimized code
         or a C function. Currently, it supports only i386 and x86-64 hosts.
         Signed-off-by: Yeongkyoon Lee <address@hidden>
         Signed-off-by: Blue Swirl <address@hidden>
     commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
     Author: Yeongkyoon Lee <address@hidden>
     Date:   Wed Oct 31 16:04:23 2012 +0900
         configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
         Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
         a host is i386 or x86_64.
         Signed-off-by: Yeongkyoon Lee <address@hidden>
         Signed-off-by: Blue Swirl <address@hidden>

I will try to understand why.


Hi Aurélien,
Do you mean that those random segfaults occurred only when
configured with "--enable-debug"?
Although I cannot see at a glance how my commits would affect a debug
build, I'll double-check.
Thanks.
The problem is there even without configuring QEMU with --enable-debug.
It just doesn't happen very often, and only very randomly. The only way to
reproduce it each time is to launch a big task in the guest (for me
building Perl) and see if it completes or not. It can take up to one
hour until it happens.

I should clarify that the segfault is on the guest side.

I have tried to look at your patches, and so far I haven't found the
issue. It seems the first two patches are fine, i.e. I have verified the
return address is always correctly computed.

I still haven't found the issue, but on the other hand I can't find any
problem in your code, after reading it dozens of times. I also tried to
modify it as little as possible while emitting the slow path back inside
the TB, and that fixes the problem. So it really looks to be due to
the slow path being at the end of the TB, and not to a bug in the code
generating it. After adding various checks, I am also convinced the
address computed in GETPC_EXT() is always correct. I have to say I am
running out of ideas.

One way to reproduce the issue more easily is to reduce the size of the
generated code buffer, for example by setting it to 512kB for both
MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
translate-all.c. That way booting an ARM guest triggers plenty of
segmentation faults or other strange issues with your patch but not
without.

OTOH, increasing this size makes the issue almost disappear, even when
building Perl including the testsuite (for that it has to be at least
512MB).
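
The recipe above, pinning both limits to the same value, can be sketched as follows. This is only an illustration under the assumption that the requested buffer size is clamped between the two macros; the helper name is hypothetical and not taken from translate-all.c:

```c
#include <assert.h>
#include <stddef.h>

/* Values from the reproduction recipe: both limits set to 512kB. */
#define MIN_CODE_GEN_BUFFER_SIZE (512u * 1024u)
#define MAX_CODE_GEN_BUFFER_SIZE (512u * 1024u)

/*
 * Hypothetical helper: clamp a requested size into [MIN, MAX].
 * With MIN == MAX every request ends up at exactly 512kB, which is
 * what makes the failure easier to trigger.
 */
static size_t clamp_code_gen_buffer_size(size_t requested)
{
    if (requested < MIN_CODE_GEN_BUFFER_SIZE) {
        return MIN_CODE_GEN_BUFFER_SIZE;
    }
    if (requested > MAX_CODE_GEN_BUFFER_SIZE) {
        return MAX_CODE_GEN_BUFFER_SIZE;
    }
    return requested;
}
```

Whatever size is requested, small or large, the result is 524288 bytes.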


Although I've not managed to reproduce the problem, I've found a suspicious piece of code in the boundary check for generated code (is_tcg_gen_code() in translate-all.c).

The code should be changed as follows.
Before:
    return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
                tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
                tcg_ctx.code_gen_buffer_max_size));
After:
    return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
                tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
                tcg_ctx.code_gen_buffer_size));

The reason is that the current check can miss generated code located in the last "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes of the buffer.
See code_gen_alloc() in translate-all.c:
tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size - (TCG_MAX_OP_SIZE * OPC_BUF_SIZE)
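
To make the difference between the two checks concrete, here is a small stand-alone model. It is a sketch, not the QEMU code: the struct, the helper names and the reserve size used in the usage note are hypothetical, and only the two field names are taken from the snippets above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the fields named above. */
struct code_buf {
    uint8_t *code_gen_buffer;
    size_t code_gen_buffer_size;      /* full allocation */
    size_t code_gen_buffer_max_size;  /* full size minus the reserved tail */
};

/* Mirrors the relation quoted from code_gen_alloc(): max_size = size - reserve. */
static void code_buf_init(struct code_buf *b, uint8_t *mem,
                          size_t size, size_t reserve)
{
    b->code_gen_buffer = mem;
    b->code_gen_buffer_size = size;
    b->code_gen_buffer_max_size = size - reserve;
}

/* Half-open range test: start <= p < start + limit. */
static bool in_range(const struct code_buf *b, uintptr_t p, size_t limit)
{
    uintptr_t start = (uintptr_t)b->code_gen_buffer;
    return p >= start && p < start + limit;
}

/* "Before": bounded by max_size, so the reserved tail is excluded. */
static bool is_gen_code_before(const struct code_buf *b, uintptr_t p)
{
    return in_range(b, p, b->code_gen_buffer_max_size);
}

/* "After": bounded by the full buffer size. */
static bool is_gen_code_after(const struct code_buf *b, uintptr_t p)
{
    return in_range(b, p, b->code_gen_buffer_size);
}
```

For a 4096-byte buffer with a 1024-byte reserve, an address at offset 3500 is rejected by the "before" check and accepted by the "after" check, while an address at offset 100 is accepted by both. The fixed reserve is also a much larger fraction of a small buffer than of a large one.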

Aurélien and Stefan,
Could you please test this and report the result?
I'm not able to reproduce the problem myself, even though I followed Aurélien's reproduction steps.
