qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.
Date: Fri, 9 Apr 2010 01:01:49 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Thu, Apr 08, 2010 at 09:32:41AM -0700, Richard Henderson wrote:
> On 04/08/2010 02:56 AM, Aurelien Jarno wrote:
> > I have applied the patch. I have some comments though, it would be nice
> > if you can address them with additional patches.
> 
> Sure.
> 
> >> +static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong 
> >> m)
> >> +{
> >> +    if (m == 0) {
> >> +        tcg_out_mov(s, ret, arg);
> >> +    } else if (m == -1) {
> >> +        tcg_out_movi(s, TCG_TYPE_I32, ret, -1);
> > 
> > Those cases are already eliminated in tcg/tcg-op.h. This code looks
> > redundant.
> 
> The cases eliminated in tcg-op.h are with immediate constants.
> There is no generic code in tcg.c to eliminate these cases 
> after constant propagation.  However, I can remove them with...

Ok, fine.

> >> +    } else {
> >> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
> >> +        tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_OR);
> > 
> > Do we really want a movi here? It would be better to leave the tcg code
> > load the constant itself, so that if the same constant is used twice, it
> > is only loaded once.
> 
> I've never caught TCG properly re-using constants, but I take your
> point -- there's no reason why tcg.c can't be improved, and this 
> port would miss out on that improvement.  I'll invent a constraint
> that matches or_mask_p.
> 
> >> +static void tcg_out_andi(TCGContext *s, int ret, int arg, 
> >> tcg_target_ulong m)
> ...
> >> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
> >> +        tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_AND);
> > 
> > Same.
> 
> ANDI slightly different case.  This function is used by tcg_out_tlb_read
> with constants that may or may not satisfy and_mask_p.  I think it's cleaner
> to handle the arbitrary case here, rather than open code the same test in
> the tlb read function.
> 
> I will of course add a constraint to match and_mask_p, for ANDs that 
> originate within the opcode stream.

Ok, fine.

> >> +        tcg_out_reloc(s, s->code_ptr, R_PARISC_PCREL17F, label_index, 0);
> >> +        tcg_out32(s, op);
> > 
> > This breaks partial retranslation. The bits corresponding to the offset
> > should be preserved.
> 
> I don't recall ever hearing about re-translation.  Can you point me
> at the bits that do it, so I can figure out what's going on?  This
> sounds like something that ought to be documented properly...
> 
> I rather assumed that the "addend" parameter to patch_reloc would
> hold whatever is really needed to be preserved.  What else is that
> field for, anyway? 

The problem is that in case of an exception, the code is retranslated to
get the address (on the guest side) of the exception. The retranslation
is done using the same buffer, and stops as soon as the address is
found.

It means that the branch instruction is rewritten with a new address,
why the relocation is not retranslated. In short the jump address is
then pointing to the wrong address, which causes either an endless loop
or a crash.

This is something visible in system mode, usually it starts to appear
when the guest switches to userland.

To prevent that, the code should change only the bits defining the
jump instruction, leaving the others defining the address unchanged.

> >> -    if (opc == 3)
> >> -        data_reg2 = *args++;
> >> -    else
> >> -        data_reg2 = 0; /* suppress warning */
> >> +    data_reg2 = (opc == 3 ? *args++ : TCG_REG_R0);
> > 
> > I am not sure TCG_REG_R0 is really correct here, and I find it confusing.
> > While it's value is zero, the assignment there is just to make GCC
> > happy, it won't be used after
> 
> Correct.  I don't see what else I can really do though.  I think it's
> worse to mix types: integer-as-register-number (i.e. *args) and 
> integer-as-filler (i.e. 0).  Better to at least have them be the same
> type as it clarifies that *args must be a register number.
> 
> Perhaps just a comment here?

I think the old code was actually pretty fine, that is the '0' value,
plus a comment. I don't really see why it was necessary to change this
code. 

> >>  #if defined(CONFIG_SOFTMMU)
> >> -    tcg_out_mov(s, r1, addr_reg);
> >> +    lab1 = gen_new_label();
> >> +    lab2 = gen_new_label();
> > 
> > Do you really want to use label here? load/store are the most common
> > instructions, I am not really sure of the resulting performance.
> 
> I think the code is *so* much more readable re-using the usual branch
> and relocate code.  I'd almost rather spend the time speeding up the
> use of temporary labels than uglifying the code here.
> 
> >> +    /* These three correspond exactly to the fallback implementation.
> >> +       But by including them we reduce the number of TCG ops that 
> >> +       need to be generated, and these opcodes are fairly common.  */
> > 
> > Are you sure it really makes a difference?
> 
> Not quantifiably, but the reasoning is sound.  I can remove them if you 
> insist.

My point is on one side you seems to look for performance, while on the
others (labels just above), you don't really care about performance.

> >> +    { INDEX_op_add_i32, { "r", "rZ", "ri" } },
> >> +    { INDEX_op_sub_i32, { "r", "rI", "ri" } },
> >> +    { INDEX_op_and_i32, { "r", "rZ", "ri" } },
> >> +    { INDEX_op_or_i32, { "r", "rZ", "ri" } },
> > 
> > Already commented for "and" and "or", but the same apply for add and 
> > sub. Do we really need a "i" contraints here if the constant is going 
> > to be loaded with a movi.
> 
> ADD and SUB are not going to use movi.  They will use one or both of
> ADDIL (21-bit constant << 11) and LDO (14-bit constant).  As a pair
> these insns can perform a full 32-bit constant addition.
> 
> I suppose technically there's a subset of 32-bit constants that could
> benefit from generic code loading constants into registers.  The only
> valid output register for ADDIL is R1.  So at the moment for
> 
>       R3 = R4 + 0x10000;
> 
> we generate
> 
>       addil   0x10000, r4, r1
>       copy    r1, r3
> 
> where we could equivalently generate
> 
>       ldil    0x10000, r5
>       add     r4, r5, r3
> 
> However I don't think this is worth worrying about in the short term.

Ok.

> >> +    if (GUEST_BASE != 0) {
> >> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
> >> +    }
> > 
> > The final GUEST_BASE value is computed after the prologue has been
> > generated. The value is modified in two cases:
> > - The user specify a non-aligned base address.
> > - /proc/sys/vm/mmap_min_addr is different than 0, which is now the
> >   in default configuration for more than one year.
> > 
> > When it happens, the guest crashes almost immediately.
> 
> To be fair, mmap_min_addr only affects GUEST_BASE if the executable
> image we've loaded overlaps.  Which is uncommon, but certainly possible.

When I found the problem on the tcg ia64, it was crashing for all
binaries I tried.

> Hmm.  I wonder which is better: one extra instruction needed per qemu_ld
> vs having one more call-saved register available.  At the moment we don't
> even come close to using all of the call-saved registers, and it would be
> easy enough to have the prologue read the actual guest_base variable rather
> than embed the constant.
> 

The other option is to reorganize the order in which the prologue is
generated and the guest base value computed. The work is probably more
important though.


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net




reply via email to

[Prev in Thread] Current Thread [Next in Thread]