Re: TCG performance on PPC64
From: Matheus K. Ferst
Subject: Re: TCG performance on PPC64
Date: Thu, 19 May 2022 17:31:54 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0
On 18/05/2022 11:44, Richard Henderson wrote:
On 5/18/22 06:16, Matheus K. Ferst wrote:
As a final test, I changed the images to have a normal user account
already created and unlocked, disabled Cloud-Init, downloaded bc-1.07
sources[4][5], installed its build dependencies[6], and changed the test
script to log in, extract, configure, build, and shut down the guest. I
also added an aarch64 compatible machine (Apple M1 w/ 10 cores) to our
test setup. Running 100 iterations gave us the following results:
+---------+----------------------------------------------------+
|         |                        Host                        |
|  Guest  +-----------------+-----------------+----------------+
|         |      PPC64      |     x86_64      |    aarch64     |
+---------+-----------------+-----------------+----------------+
|  PPC64  |  429.82 ± 11.57 |  352.34 ±  8.51 | 180.78 ± 42.02 |
| aarch64 | 1029.78 ± 46.01 | 1207.98 ± 80.49 | 487.50 ±  7.54 |
|  s390x  |  589.97 ± 86.67 |  411.83 ± 41.88 | 221.86 ± 79.85 |
+---------+-----------------+-----------------+----------------+
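For reference, a minimal sketch of how one "mean ± deviation" cell could be
produced from the per-iteration timings. It assumes the ± figure is a sample
standard deviation, which the message does not state; the helper names and the
sample values are hypothetical and not taken from the actual test scripts.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) of per-iteration timings."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return statistics.mean(samples), statistics.stdev(samples)


def format_cell(samples):
    """Format timings as 'mean ± stdev' with two decimals, like a table cell."""
    mean, stdev = summarize(samples)
    return f"{mean:.2f} ± {stdev:.2f}"


# Hypothetical timings for illustration only, not measured data.
example = [10.0, 12.0, 14.0]
print(format_cell(example))
```

With 100 iterations per cell, the same helper would be applied once per
guest/host pair.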
These are some weird results. Particularly the aarch64 host ones -- I'm
really surprised that it's that much faster than the x86_64 at anything.
Oh, the E5-2687W was discontinued 7 years ago. So I'll just put that
down to age.
Right, this Xeon was discontinued even before POWER9 was launched. It's
slower in other tasks but still outperforms PPC64 in TCG emulation.
What would be different in aarch64 emulation that yields better
performance on our POWER9?
That is a very good question.
- I suppose that aarch64 has more instructions with GVec
implementations than PPC64 and s390x, so maybe aarch64 guests can better
use host-vector instructions?
No, there's very little gvec in a kernel boot cycle. Not none, but very
little.
- Looking at the flame graphs of each test (attached), I can see that
tb_gen_code takes proportionally less time in aarch64 emulation than in
PPC64 and s390x, so it might be that decodetree is faster?
No. (1) aarch64 base instructions aren't using decodetree, (2) the
existing ppc and s390 decode is pretty well architected; decodetree is
not particularly optimized, it's simply meant to be more readable.
Looking at the aarch64-on-ppc64 graph, I see that PAC encryption is
taking up a huge proportion of your runtime. Probably gcc has done a
better job with those routines for ppc64 host. You may want to run the
aarch64 guest tests again with -cpu max,pauth=off.
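A minimal sketch of how a test script might toggle this option when
assembling the guest command line. Only "-cpu max,pauth=off" comes from the
thread; the helper name, the "-accel tcg" and "-M virt" arguments, and the
overall structure are assumptions for illustration, since the actual test
command line is not shown here.

```python
import shlex


def qemu_aarch64_args(cpu="max", pauth=True, extra=()):
    """Build an argv list for qemu-system-aarch64 (hypothetical helper).

    When pauth is False, ",pauth=off" is appended to the CPU model so that
    pointer authentication is disabled in the emulated CPU.
    """
    cpu_opt = cpu if pauth else f"{cpu},pauth=off"
    return ["qemu-system-aarch64", "-accel", "tcg", "-cpu", cpu_opt, *extra]


# Assumed machine type; the real runs would add disks, memory, and so on.
args = qemu_aarch64_args(pauth=False, extra=["-M", "virt"])
print(shlex.join(args))
```

Keeping the flag as a parameter makes it easy to run the same benchmark with
and without pauth and compare the two sets of timings.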
You are right, with pauth=off:
+---------+------------------------------------------------+
|         |                      Host                      |
|  Guest  +----------------+---------------+---------------+
|         |     PPC64      |    x86_64     |    aarch64    |
+---------+----------------+---------------+---------------+
| aarch64 | 395.02 ± 12.22 | 339.13 ± 6.34 | 148.88 ± 8.32 |
+---------+----------------+---------------+---------------+
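To put the two aarch64-guest rows side by side, the ratio of the mean times
can be computed per host. This is only a sketch using the means reported in
this thread; it ignores the ± spread, and the variable and function names
are illustrative.

```python
# Mean times for the aarch64 guest as reported in the thread, per host.
default_run = {"PPC64": 1029.78, "x86_64": 1207.98, "aarch64": 487.50}
pauth_off = {"PPC64": 395.02, "x86_64": 339.13, "aarch64": 148.88}


def speedups(before, after):
    """Return before/after ratio of mean times for each host."""
    return {host: before[host] / after[host] for host in before}


for host, ratio in speedups(default_run, pauth_off).items():
    print(f"{host}: {ratio:.2f}x faster with pauth=off")
```

By these means, disabling pauth shortens the aarch64 guest run by roughly
2.6x on the PPC64 host, 3.6x on x86_64, and 3.3x on aarch64.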
I wonder if the s390x command line also needs some cpu/machine options
to be more representative of "normal" TCG uses.
Otherwise, the flame graph columns are too narrow to actually read, for me.
If your SVG viewer knows JS/CSS/etc., you can click a block to "zoom in"
on a particular call stack; the function name and number of samples are
shown on mouse hover, and there is a search tool with Ctrl+F.
The results are also on a GitHub Wiki page now:
https://github.com/PPC64/qemu/wiki/TCG-Performance-on-PPC64
Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Software Analyst
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>