From: Artyom Tarasenko
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Sat, 22 Aug 2015 18:45:09 +0200

On Thu, Aug 20, 2015 at 7:19 PM, Richard Henderson <address@hidden> wrote:
> On 08/19/2015 07:41 AM, Artyom Tarasenko wrote:
>>
>> Without the patch:
>>
>>   time g++ -DHAVE_CONFIG_H -I. -I../binutils-gdb/gold
>> -I../binutils-gdb/gold -I../binutils-gdb/gold/../include
>> -I../binutils-gdb/gold/../elfcpp
>> -DLOCALEDIR="\"/usr/local/share/locale\""
>> -DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
>> -DTOOLLIBDIR="\"/usr/local//lib\""   -W -Wall    -Werror
>> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
>> -I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
>> .deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc
>>
>> real    18m31.407s
>> user    18m23.661s
>> sys     0m6.784s
>>
>> The patch surely improves the situation, tcg_optimize in the perf top
>> takes ~7% (instead of~12%), and the only function marked red by
>> perf-top is init_temp_info(). So with the patch:
>>
>> real    17m46.380s
>> user    17m37.522s
>> sys     0m7.120s
>>
>>
>> And if I completely disable optimizer (// #define
>> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>>
>> real    14m17.668s
>> user    14m10.241s
>> sys     0m6.060s
>
>
> This isn't surprising, because at the moment tcg optimizations are almost
> completely ineffective for sparc.  The way the register windows are
> implemented means that there are very few proper tcg temporaries to
> optimize.
>
> I've just updated an old branch that attempts to cure this.  It creates
> proper tcg temporaries for the windowed registers, and uses a bit of
> recursion to find the place at which they should be stored.
>
>   git://github.com/rth7680/qemu.git tcg-indirect
>
> With a few quick unscientific tests, it appears to help.  It would be nice
> to put that branch side-by-side with your tests above.

Sorry for the delay with testing.

For my test case, tcg-indirect yields a larger performance gain than it did for Dennis:

git master: 18m31s
tcg-indirect: 16m50s
#undef  USE_TCG_OPTIMIZATIONS: 14m18s
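
To put the three timings on a common scale, here is a small sketch that converts them to seconds and reports the reduction relative to git master. The helper name `to_seconds` and the dictionary layout are only illustrative; the numbers are the ones listed above.

```python
def to_seconds(stamp):
    """Convert an 'XmYs' wall-clock string such as '18m31s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Wall-clock times from the list above (rounded to whole seconds).
timings = {
    "git master": "18m31s",
    "tcg-indirect": "16m50s",
    "#undef USE_TCG_OPTIMIZATIONS": "14m18s",
}

baseline = to_seconds(timings["git master"])
reductions = {}
for name, stamp in timings.items():
    secs = to_seconds(stamp)
    # Percentage of the master build time saved by this configuration.
    reductions[name] = 100.0 * (baseline - secs) / baseline
    print(f"{name:30s} {secs:6.0f}s  {reductions[name]:5.1f}% less than master")
```

This is a single run per configuration, so the percentages are indicative rather than statistically solid.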


JIT statistics before starting the test:
(qemu) info jit
Translation buffer state:
gen code size       31851136/314448896
TB count            128224/2457592
TB avg target size  18 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 13.4)
cross page TB count 0 (0%)
direct jump count   83840 (65%) (2 jumps=64730 50%)

Statistics:
TB flush count      5
TB invalidate count 317160
TLB flush count     1180769
[TCG profiler not compiled]

After the test:
(qemu) info jit
Translation buffer state:
gen code size       282903344/314448896
TB count            1139744/2457592
TB avg target size  17 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 14.0)
cross page TB count 0 (0%)
direct jump count   739828 (64%) (2 jumps=569074 49%)

Statistics:
TB flush count      5
TB invalidate count 324362
TLB flush count     2050744

So the TB invalidate count grew by only ~7200 (317160 -> 324362), and the
TB flush count did not change at all. Yet tcg_optimize is at ~7% in perf
top, and tcg_liveness_analysis at ~3%. Why do we translate so much?
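
For reference, the differences between the two `info jit` snapshots above can be computed with a few lines of Python. This is only a sketch: the dictionary keys are made-up labels, not QEMU identifiers, and reading the TB count difference as "TBs translated during the test" assumes that the counter is only reset by a TB flush (the flush count stayed at 5 in both snapshots).

```python
# Values copied from the two "info jit" outputs above.
before = {
    "gen_code_size": 31851136,
    "tb_count": 128224,
    "tb_flush": 5,
    "tb_invalidate": 317160,
    "tlb_flush": 1180769,
}
after = {
    "gen_code_size": 282903344,
    "tb_count": 1139744,
    "tb_flush": 5,
    "tb_invalidate": 324362,
    "tlb_flush": 2050744,
}

# Per-counter growth over the duration of the test.
delta = {key: after[key] - before[key] for key in before}

for key, value in delta.items():
    print(f"{key:15s} +{value}")

# Average host code size of the TBs added during the test.
avg_host_size = delta["gen_code_size"] / delta["tb_count"]
print(f"avg host bytes per new TB: {avg_host_size:.1f}")
```

The point of the comparison is the ratio between newly added TBs and invalidations, which is what makes the amount of translation look surprising.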


Artyom

-- 
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu


