qemu-devel

Re: [Qemu-devel] [PATCH v7 0/3] tcg: enhance code generation quality for


From: Yeongkyoon Lee
Subject: Re: [Qemu-devel] [PATCH v7 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
Date: Tue, 30 Oct 2012 13:45:48 +0900
User-agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1

I've found that the git state of my local repo is somewhat messed up.
Sorry for the inconvenience. I'll send a new patch series after cleaning up my repo.

On 2012-10-29 23:32, Yeongkyoon Lee wrote:
Here is the 7th version of the series optimizing TCG qemu_ld/st code generation.

v7:
   - Rebase and fix typos

v6:
   - Remove the extra return address argument from the MMU helpers
     Instead, embed the fast path address in the slow path for the helpers to use
   - Change some bitwise operations to structure bitfields
   - Rename the function that handles finalization of TB code generation

v5:
   - Remove RFC tag

v4:
   - Remove CONFIG_SOFTMMU pre-condition from configure
   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
   - Remove some unnecessary comments

v3:
   - Support CONFIG_TCG_PASS_AREG0
     (expected to get more performance enhancement than others)
   - Remove the configure option "--enable-ldst-optimization"
   - Make the optimization the default on i386 and x86_64 hosts
   - Fix some typos and run checkpatch.pl before committing
   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
   - Test linux-user-test-0.3

v2:
   - Follow the QEMU patch submission rules

v1:
   - Initial commit request

I think the code generated from qemu_ld/st IRs is relatively heavy: up to
12 instructions for the TLB hit case on an i386 host.
This patch series improves the code quality of TCG qemu_ld/st IRs by reducing
jumps and improving locality.
The main idea is simple and has already been described in the comments in
tcg-target.c: separate the slow path (TLB miss case) and generate it at the
end of the TB.

For example, the code generated from qemu_ld changes as follows.
Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(5) TLB miss case: call MMU helper
(6) Jump to (8)
(7) [embedded addr of (4)] <- never executed but read by MMU helpers
(8) Return to next code (4)
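
The "After" layout can be sketched with a toy code generator. This is only
an illustration under stated assumptions, not the code from the patches:
"instructions" are text lines rather than host machine code, and every name
here (Block, SlowPath, emit, gen_qemu_ld, gen_finalize) is hypothetical. The
embedded address of step (7) is left out. The sketch shows the fast path
being emitted inline while the TLB miss handler is only recorded, then
emitted and patched in once the rest of the block has been generated.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CODE 64
#define MAX_SLOW 16
#define LINE_LEN 48

/* One deferred TLB-miss handler (hypothetical bookkeeping record). */
typedef struct {
    int branch_idx;  /* line holding the "jump if miss" to be patched */
    int resume_idx;  /* line to return to after the MMU helper call */
} SlowPath;

/* A toy translation block: emitted lines plus pending slow paths. */
typedef struct {
    char code[MAX_CODE][LINE_LEN];
    int ncode;
    SlowPath slow[MAX_SLOW];
    int nslow;
} Block;

/* Append one pseudo-instruction and return its index. */
int emit(Block *b, const char *text)
{
    assert(b->ncode < MAX_CODE);
    snprintf(b->code[b->ncode], LINE_LEN, "%s", text);
    return b->ncode++;
}

/* Emit only the fast path; remember where the slow path must hook in. */
void gen_qemu_ld(Block *b)
{
    assert(b->nslow < MAX_SLOW);
    SlowPath *sp = &b->slow[b->nslow++];

    emit(b, "tlb_check");
    sp->branch_idx = emit(b, "jmp_if_miss -> ?");  /* target not known yet */
    emit(b, "host_load");
    sp->resume_idx = b->ncode;  /* next code follows the hit case directly */
}

/* At the end of the block, emit every slow path and patch its branch. */
void gen_finalize(Block *b)
{
    char line[LINE_LEN];

    for (int i = 0; i < b->nslow; i++) {
        const SlowPath *sp = &b->slow[i];

        /* The miss branch now targets the line about to be emitted. */
        snprintf(b->code[sp->branch_idx], LINE_LEN,
                 "jmp_if_miss -> %d", b->ncode);
        emit(b, "call mmu_helper");
        snprintf(line, sizeof(line), "jmp_back -> %d", sp->resume_idx);
        emit(b, line);
    }
}
```

Generating two loads with other code between them and then calling
gen_finalize() leaves both hit cases free of the trailing jump; the helper
calls and their jumps back sit after the last line of the block.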

The following performance results were measured on QEMU 1.0.
Although there was some measurement error, the improvements were not negligible.

* EEMBC CoreMark (before -> after)
   - Guest: i386, Linux (Tizen platform)
   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
   - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
   - Guest: i386, Linux (linux-0.2.img included in QEMU source)
   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
   - Results
     . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
     . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
     . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
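
The percentages above are relative gains, (after - before) / before. A small
helper for rechecking them from the raw scores might look like this; the
function name percent_gain is an assumption made for illustration.

```c
#include <assert.h>

/* Relative improvement of `after` over `before`, in percent. */
double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For instance, percent_gain(1135.6, 1179.9) gives about 3.9, which matches the
CoreMark figure reported above.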

Summarized features:
  - The changes are wrapped in the macro "CONFIG_QEMU_LDST_OPTIMIZATION",
    which is enabled by default on i386/x86_64 hosts
  - Forcibly removing the macro causes a compilation error on i386/x86_64 hosts
  - No implementation for hosts other than i386/x86_64 yet

In addition, I have tried to remove the generated code that calls the MMU
helpers for the TLB miss case from the end of the TB, but have not found a
good solution yet.
In my opinion, TLB hit performance could be degraded if the calling code is
removed, because the runtime parameters, such as data, mmu index and return
address, would have to be set up in registers or on the stack even though
they are not used in the TLB hit case.
This remains an open issue.

Yeongkyoon Lee (3):
   configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
     optimization
   tcg: Add extended GETPC mechanism for MMU helpers with ldst
     optimization
   tcg: Optimize qemu_ld/st by generating slow paths at the end of a
     block

  configure             |    6 +
  exec-all.h            |   36 +++++
  exec.c                |   11 ++
  softmmu_template.h    |   16 +-
  tcg/i386/tcg-target.c |  415 +++++++++++++++++++++++++++++++++----------------
  tcg/tcg.c             |   12 ++
  tcg/tcg.h             |   30 ++++
  7 files changed, 385 insertions(+), 141 deletions(-)





