[Qemu-devel] Re: sparc64 lazy conditional codes evaluation

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Re: sparc64 lazy conditional codes evaluation

From:	Blue Swirl
Subject:	[Qemu-devel] Re: sparc64 lazy conditional codes evaluation
Date:	Thu, 6 May 2010 21:51:35 +0300

On 5/5/10, Igor Kovalenko <address@hidden> wrote:
> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <address@hidden> wrote:
>  > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <address@hidden> wrote:
>  >>  > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <address@hidden> wrote:
>  >>  >>  > On 5/3/10, Igor Kovalenko <address@hidden> wrote:
>  >>  >>  >> Hi!
>  >>  >>  >>
>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  >>  >>  we return from trap handler with mismatching conditionals.
>  >>  >>  >>
>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  >>  >>  machine is working through silo initialization. I use gentoo 
> minimal cd
>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with 
> silo boot
>  >>  >>  >>  would experience the same. Once in a while it would report crc 
> error,
>  >>  >>  >>  unable to open cd partition or it would fail to decompress image.
>  >>  >>  >
>  >>  >>  > I think I've also seen this.
>  >>  >>  >
>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  >>  >>  possibly followed by a few instructions which do not touch 
> conditionals,
>  >>  >>  >>  then conditional branch insn. If it happens that we trap while 
> processing
>  >>  >>  >>  conditional branch insn so it is restarted after return from 
> trap then
>  >>  >>  >>  seldom conditional codes are calculated incorrectly.
>  >>  >>  >>
>  >>  >>  >>  I cannot point to exact cause but it appears that after trap 
> return
>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  >>  >>  since adding more cond evaluation flushes over the code helps.
>  >>  >>  >>
>  >>  >>  >>  We already tried doing flush more frequently and it is still not
>  >>  >>  >>  complete, so the question is how to finally do this once and 
> right :)
>  >>  >>  >>
>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  >>  >>  the following list appears to be good start. Plan is to prepare
>  >>  >>  >>  a change to qemu and find a way to test it.
>  >>  >>  >>
>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>  >>  >>    use DisasContext->cc_op to predict if flags should be not 
> evaluated
>  >>  >>  >>    due to overriding insn. Instead we can drop cc_op from 
> disassembler
>  >>  >>  >>    context and simplify code to only use cc_op from env.
>  >>  >>  >
>  >>  >>  > Not currently, but in the future we may use that to do even lazier
>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  >>  >>  > could be much more optimal by changing the branch to do the
>  >>  >>  > comparison. Here's an old unfinished patch to do some of this.
>
>
> I wonder if it buys anything. Sparc RISC architecture means optimizing 
> compiler
>  would prevent any extra flags computation, right? So it is basically 1-to-1
>  conditional computation and use. Or even worse, if we delay computation
>  until there are two or more consumers, correct?

For x86 target, that is the other part of savings from using lazy
condition computation. It's true that it will not benefit RISC targets
much.

But I think you are missing the other part, where the actual flags are
not calculated but instead the original comparison can be used.

For example, consider this Sparc64 code:
IN:
0x000001fff000be58:  cmp  %g2, 0x51

OUT: [size=82]
0x40df16b0:  mov    0x10(%r14),%rbp
0x40df16b4:  mov    %rbp,%rbx
0x40df16b7:  sub    $0x51,%rbx
0x40df16bb:  mov    %rbx,%r12
0x40df16be:  mov    %r12,0x10a60(%r14)
0x40df16c5:  mov    %rbp,0x60(%r14)
0x40df16c9:  mov    $0x51,%ebp
0x40df16ce:  mov    %rbp,0x68(%r14)
0x40df16d2:  mov    %rbx,0x70(%r14)
0x40df16d6:  mov    $0x7,%ebp
0x40df16db:  mov    %ebp,0x78(%r14)
0x40df16df:  mov    $0x1fff000be5c,%rbp
0x40df16e9:  mov    %rbp,0x48(%r14)
0x40df16ed:  mov    $0x1fff000be60,%rbp
0x40df16f7:  mov    %rbp,0x50(%r14)
0x40df16fb:  xor    %eax,%eax
0x40df16fd:  jmpq   0xbfface

--------------
IN:
0x000001fff000be5c:  bne  0x1fff000c268

OUT: [size=95]
0x40df1710:  callq  0x518260
0x40df1715:  mov    0x98(%r14),%ebp
0x40df171c:  mov    %ebp,%ebx
0x40df171e:  shr    $0x16,%rbx
0x40df1722:  and    $0x1,%ebx
0x40df1725:  xor    $0x1,%rbx
0x40df1729:  mov    %rbx,0x90(%r14)
0x40df1730:  mov    $0x1fff000be60,%rbp
0x40df173a:  mov    %rbp,0x48(%r14)
0x40df173e:  test   %rbx,%rbx
0x40df1741:  je     0x40df175a
0x40df1747:  mov    $0x1fff000c268,%rbp
0x40df1751:  mov    %rbp,0x50(%r14)
0x40df1755:  jmpq   0x40df1768
0x40df175a:  mov    $0x1fff000be64,%rbp
0x40df1764:  mov    %rbp,0x50(%r14)
0x40df1768:  xor    %eax,%eax
0x40df176a:  jmpq   0xbfface

Instead of calculating any flags, we should generate for combined
'cmp/btest + branch' sequences something like:
mov    0x10(%r14),%rbp
mov    %rbp,%rbx
cmp    $0x51,%rbx
je     0x40ac275a
mov    $0x1fff000c268,%rbp
mov    %rbp,0x50(%r14)
jmpq   0x40ac2768
mov    $0x1fff000be64,%rbp
mov    %rbp,0x50(%r14)
xor    %eax,%eax
jmpq   0xbfface

But I fully agree that correct code generation is the most important issue.

>  >>  >>  >
>  >>  >>  >>    Another point is that we always write to env->cc_op when
>  >>  >>  >>  translating *cc insns
>  >>  >>  >>    This should solve any issue with dc->cc_op prediction going
>  >>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>  >>  >>  >
>  >>  >>  > I think this is what is happening now.
>  >>  >>  >
>  >>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few 
> cases when
>  >>  >>  >>    a. conditional code is required by insn (like addc, cond 
> branch etc.)
>  >>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>  >>  >>  >>       - not sure if it works in case we have two cond consuming 
> insns,
>  >>  >>  >>         where first needs carry another needs the rest of flags
>  >>  >>  >
>  >>  >>  > Here's another patch to optimize C flag handling. It doesn't pass 
> my
>  >>  >>  > tests though.
>  >>  >>  >
>  >>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>    c. trap occurs and we prepare trap level context (saving 
> pstate)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct 
> value from env)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >
>  >>  >>  > Fully agree.
>  >>  >>
>  >>  >>
>  >>  >> Cool
>  >>  >>
>  >>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>  >>  >>  and then add it back possibly with more optimizations.
>  >>  >>  I'm lost in the code up to the point where I believe we need to
>  >>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>  >>  >
>  >>  > I'd think this should do the trick:
>  >>  >
>  >>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  >>  > index b27778b..94921cd 100644
>  >>  > --- a/target-sparc/op_helper.c
>  >>  > +++ b/target-sparc/op_helper.c
>  >>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>  >>  >     }
>  >>  >     tsptr = cpu_tsptr(env);
>  >>  >
>  >>  > +    helper_compute_psr();
>  >>  > +
>  >>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>  >>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>  >>  >         GET_CWP64(env);
>  >>  >
>  >>
>  >>
>  >> Thanks, this change seems to work here for silo issue.
>  >>
>  >>  Another change would be to flush for gdbstub use of GET_CCR and for
>  >>  helper_rdccr.
>  >>  I tried to embed flush into GET_CCR but the code looks ugly since we
>  >>  need to proxy a call to helper_compute_psr from gdbstub passing
>  >>  available env pointer.
>  >>
>  >>  Not really tested with your changes, but still what is the breakage you 
> see?
>  >
>  > Aurora 2.0 
> (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
>  > breaks.
>  >
>  > This is what I get with git HEAD, having pressed enter key twice:
>  > Welcome to Aurora SPARC Linux
>  >
>  >
>  >
>  >
>  >                   +--------------+ CD Found +--------------+
>  >                   |                                        |
>  >                   | To begin testing the CD media before   |
>  >                   | installation press OK.                 |
>  >                   |                                        |
>  >                   | Choose Skip to skip the media test     |
>  >                   | and start the installation.            |
>  >                   |                                        |
>  >                   |      +----+             +------+       |
>  >                   |      | OK |             | Skip |       |
>  >                   |      +----+             +------+       |
>  >                   |                                        |
>  >                   |                                        |
>  >                   +----------------------------------------+
>  >
>  >
>  >
>  >
>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >
>  > This is what I get with the C flag patch applied:
>  > Welcome to Aurora SPARC Linux
>  >
>  >
>  >
>  >
>  >
>  >                    +--------------+ Error +---------------+
>  >                    |                                      |
>  >                    | failed to read keymap information:   |
>  >                    | Success                              |
>  >                    |                                      |
>  >                    |               +----+                 |
>  >                    |               | OK |                 |
>  >                    |               +----+                 |
>  >                    |                                      |
>  >                    |                                      |
>  >                    +--------------------------------------+
>  >
>  >
>  >
>  >
>  >
>  >
>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >
>
>
> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch

The patch seems harmless, so maybe it uncovers some hideous bug.

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
- [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/03
  - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/03
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/03
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/04
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/05
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl <=
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Igor Kovalenko, 2010/05/08
    - [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/09
    - Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Mark Cave-Ayland, 2010/05/10
    - Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Blue Swirl, 2010/05/10
    - Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation, Mark Cave-Ayland, 2010/05/15

Prev by Date: [Qemu-devel] [PATCH 4/5] tcg-ia64: Load GUEST_BASE into a register.
Next by Date: [Qemu-devel] [PATCH] block/vdi: Allow disk images of size 0
Previous by thread: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
Next by thread: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
Index(es):
- Date
- Thread