qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 15:30:17 +0300

On Sun, May 15, 2011 at 2:36 PM, Aurelien Jarno <address@hidden> wrote:
> On Sun, May 15, 2011 at 02:12:27PM +0300, Blue Swirl wrote:
>> On Sun, May 15, 2011 at 1:19 PM, Aurelien Jarno <address@hidden> wrote:
>> > I still don't get where are this list of possible changes? Did I miss
>> > something in another thread?
>>
>> I'm referring to the patches I sent.
>
> Ok, patches 1, 2 and 4 to 7 looks ok at a first glance, though I think
> patches 6 and 7 should be done for all hosts or none of them.

The changes can be done in steps, but of course removing temp_buf from
CPUState would need all targets to be converted first.

>> > On the TCG generated code, the env structure is used almost for every
>> > op, so it really makes sense to keep it in a register instead of having to
>> > reload the address of env regularly from memory. Given it only affects
>> > TCG generated code, I don't see the point of portability here.
>>
>> For example, maybe the bugs in Sparc glibc could be avoided by using
>> one of %i set of registers (not accessible from helpers) for AREG0
>> within generated code instead of %g registers which seem to be
>> fragile.
>
> First of all, but it's a different subject, I am not sure there are
> sparc glibc bugs, I'd rather says QEMU mis-uses some register. For
> example the following code is probably wrong:
>
> /* Note: must be synced with dyngen-exec.h */
> #ifdef CONFIG_SOLARIS
> #define TCG_AREG0 TCG_REG_G2
> #elif defined(__sparc_v9__)
> #define TCG_AREG0 TCG_REG_G5
> #else
> #define TCG_AREG0 TCG_REG_G6
> #endif
>
> __sparc_v9__ can set on the 32-bit ABI, when the compiler targets V8+,
> so the condition is probably wrong there. Secondly the SPARC ABI [1] on
> page 23 that %g5 to %g7 are reserved for system. I don't think QEMU has
> the right to use this registers.

Yes, but the situation is not so nice. Please see this post for status
as of 2010:
http://article.gmane.org/gmane.comp.emulators.qemu/63610

This is from Debian glibc 2.11.2-10:
$ file /lib/libc-2.11.2.so
/lib/libc-2.11.2.so: ELF 32-bit MSB shared object, SPARC32PLUS,
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux
2.6.18, stripped
$ objdump -d /lib/libc.so.6 |grep %g1|wc -l
69648
$ objdump -d /lib/libc.so.6 |grep %g2|wc -l
37299
$ objdump -d /lib/libc.so.6 |grep %g3|wc -l
20635
$ objdump -d /lib/libc.so.6 |grep %g4|wc -l
11603
$ objdump -d /lib/libc.so.6 |grep %g5|wc -l
448
$ objdump -d /lib/libc.so.6 |grep %g6|wc -l
150
$ objdump -d /lib/libc.so.6 |grep %g7|wc -l
3052

Glibc is compiled for Sparc32plus, so it should only use %g6 and %g7,
or %g1 and %g5 for scratch purposes. However, it is the application
registers %g2 to %g4 that are used heaviest. Looking inside the
objdump it's easy to see that the uses are not for example saving or
restoring, but actually using them without saving the previous value
first:
000211e0 <__divdi3>:
   211e0:       9d e3 bf a0     save  %sp, -96, %sp
   211e4:       90 10 00 18     mov  %i0, %o0
   211e8:       92 10 00 19     mov  %i1, %o1
   211ec:       94 10 00 1a     mov  %i2, %o2
   211f0:       96 10 00 1b     mov  %i3, %o3
   211f4:       80 a6 20 00     cmp  %i0, 0
   211f8:       06 40 00 10     bl,pn   %icc, 21238 <__divdi3+0x58>
   211fc:       a0 10 20 00     clr  %l0
   21200:       80 a2 a0 00     cmp  %o2, 0
   21204:       26 40 00 13     bl,a,pn   %icc, 21250 <__divdi3+0x70>
   21208:       a0 38 00 10     xnor  %g0, %l0, %l0
   2120c:       7f ff fe ed     call  20dc0 <__ashldi3+0x40>
   21210:       98 10 20 00     clr  %o4
   21214:       84 10 00 08     mov  %o0, %g2

...whoops...

   21218:       80 a4 20 00     cmp  %l0, 0
   2121c:       02 40 00 04     be,pn   %icc, 2122c <__divdi3+0x4c>
   21220:       86 10 00 09     mov  %o1, %g3

...whoops...

   21224:       86 a0 00 09     subcc  %g0, %o1, %g3
   21228:       84 60 00 02     subc  %g0, %g2, %g2
   2122c:       b2 10 00 03     mov  %g3, %i1
   21230:       81 cf e0 08     rett  %i7 + 8
   21234:       90 10 00 02     mov  %g2, %o0
   21238:       92 a0 00 19     subcc  %g0, %i1, %o1
   2123c:       90 60 00 18     subc  %g0, %i0, %o0
   21240:       80 a2 a0 00     cmp  %o2, 0
   21244:       16 4f ff f2     bge  %icc, 2120c <__divdi3+0x2c>
   21248:       a0 10 3f ff     mov  -1, %l0
   2124c:       a0 38 00 10     xnor  %g0, %l0, %l0
   21250:       96 a0 00 0b     subcc  %g0, %o3, %o3
   21254:       10 6f ff ee     b  %xcc, 2120c <__divdi3+0x2c>
   21258:       94 60 00 0a     subc  %g0, %o2, %o2
   2125c:       01 00 00 00     nop

This is libc from OpenBSD/Sparc64 4.9:
$ objdump -d /usr/lib/libc.so.58.0 |grep %g1|wc -l
   40562
$ objdump -d /usr/lib/libc.so.58.0 |grep %g2|wc -l
   20384
$ objdump -d /usr/lib/libc.so.58.0 |grep %g3|wc -l
   10240
$ objdump -d /usr/lib/libc.so.58.0 |grep %g4|wc -l
    6606
$ objdump -d /usr/lib/libc.so.58.0 |grep %g5|wc -l
    3811
$ objdump -d /usr/lib/libc.so.58.0 |grep %g6|wc -l
       4
$ objdump -d /usr/lib/libc.so.58.0 |grep %g7|wc -l
      20

Not so great there either.

> Anyway, I don't see why keeping TCG_AREG0 inside the TCG generated code
> would prevent you to use a register from the %i set for it.

The helpers currently use global env register, but %i registers can't
be accessed from the next level of function call nesting hierarchy so
they can't be used for global env.

>> >> > On the other hand, I have objections to remove uses of TCG_AREG0 from
>> >> > the TCG part. This register is part of the TCG design and thus used very
>> >> > often. In any case we have to keep it for accesses to the globals, so we
>> >> > can keep it for softmmu load/stores too.
>> >>
>> >> Right, any changes would need to be backed by performance figures.
>> >>
>> >> > If we go for the removal of
>> >> > TCG_AREG0 from the GCC side, it probably means loading it in the 
>> >> > prologue
>> >> > instead.
>> >>
>> >> Do you mean cpu-exec.c side with this? I think this is a very similar
>> >> register tradeoff case to helpers.
>> >
>> > I mean calling tcg_qemu_tb_exec(tb_ptr, env) instead of
>> > tcg_qemu_tb_exec(tb_ptr), and modification in the prologue to save the
>> > env pointer into the TCG_AREG0 register (which is kept internal to TCG
>> > generated code).
>>
>> Maybe it would be easy to test and benchmark this change, it doesn't
>> affect generated code or helpers.
>>
>
> Yes it's something that can be benchmarked. From experience it won't
> make any significant performance change, it's basically two register
> moves (one in the caller, one in the prologue), for each TB which is
> not chained. In any case, it will introduce a slight overhead that has
> to be compensated by allowing the helpers to use an additional
> register.

But like in the helper case, cpu-exec.c can then be compiled without
HELPER_CFLAGS or global env register, which would make that register
available for GCC. This may or may not compensate for the extra moves.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]