From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu
Date: Thu, 29 Aug 2013 18:36:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

On 29/08/2013 18:08, Richard Henderson wrote:
> Where I do think there's cause for treading carefully is wrt Aurelien's
> statement "it's the slow path exception, the call-return stack doesn't
> matter".
> Alternately, given that it *is* the slow path, who cares if the return from
> the helper immediately hits a branch, rather than tail-calling back into the
> fast path, if the benefit is that the call-return stack is still valid above
> the code_gen_buffer after a simple tlb miss?
Aurelien's comment was that lea+push+jmp is smaller than lea+call+ret,
which I can buy.
I guess it depends more than anything on the hardware implementation
of return branch prediction, and _how much_ the call-return stack is broken.
PPC's mtlr+b+...+blr and x86's push+jmp+...+ret are quite similar in
this respect, and they raise the same question. After the blr/ret, is the
entire predictor state broken, or will the processor simply take a miss
and still keep the remainder of the stack valid? (For x86 it could in
principle see that the stack pointer is lower and thus keep the entries
above it. For PPC it's not that simple since LR is a callee-save
register, but there are probably plenty of tricks and heuristics that can
be employed.)
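
To make the question concrete, here is a toy model of a return-address
stack. It is only a sketch under stated assumptions: ras_t, the
keep_on_miss policy and the three addresses are made up for illustration,
and real predictors are not documented at this level. "Pop on miss" stands
for a predictor that consumes an entry even when the prediction is wrong;
"keep on miss" stands for the hypothetical smarter one that leaves the
entries above it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RAS_DEPTH 16

/* Toy return-address stack; every name here is hypothetical. */
typedef struct {
    unsigned long entries[RAS_DEPTH];
    size_t top;         /* number of valid entries */
    bool keep_on_miss;  /* assumed policy: do not pop on a wrong guess */
    unsigned misses;    /* mispredicted returns seen so far */
} ras_t;

static void ras_init(ras_t *r, bool keep_on_miss)
{
    r->top = 0;
    r->keep_on_miss = keep_on_miss;
    r->misses = 0;
}

/* A call instruction records its return address in the predictor. */
static void ras_call(ras_t *r, unsigned long ret_addr)
{
    assert(r->top <= RAS_DEPTH);
    if (r->top < RAS_DEPTH) {
        r->entries[r->top++] = ret_addr;
    }
}

/* A ret instruction is predicted from the top entry. */
static void ras_ret(ras_t *r, unsigned long actual)
{
    if (r->top == 0) {
        r->misses++;            /* nothing to predict from */
        return;
    }
    if (r->entries[r->top - 1] == actual) {
        r->top--;               /* correct prediction */
        return;
    }
    r->misses++;
    if (!r->keep_on_miss) {
        r->top--;               /* wrong entry is consumed anyway */
    }
}

/* Made-up addresses: caller of the generated code, slow path, fast path. */
enum { RET_OUTER = 0x1000, RET_SLOW = 0x2000, RET_FAST = 0x3000 };

/* lea+call+ret: the helper returns to the slow path, which then branches. */
unsigned simulate_call_ret(bool keep_on_miss)
{
    ras_t r;
    ras_init(&r, keep_on_miss);
    ras_call(&r, RET_OUTER);    /* entering the generated code */
    ras_call(&r, RET_SLOW);     /* call helper */
    ras_ret(&r, RET_SLOW);      /* helper returns */
    ras_ret(&r, RET_OUTER);     /* generated code returns to its caller */
    return r.misses;
}

/* lea+push+jmp: the helper's ret goes straight back to the fast path. */
unsigned simulate_push_jmp(bool keep_on_miss)
{
    ras_t r;
    ras_init(&r, keep_on_miss);
    ras_call(&r, RET_OUTER);    /* entering the generated code */
    /* push+jmp: the predictor sees no call, so nothing is recorded */
    ras_ret(&r, RET_FAST);      /* helper returns to the pushed address */
    ras_ret(&r, RET_OUTER);     /* generated code returns to its caller */
    return r.misses;
}
```

In this model simulate_call_ret() never misses, simulate_push_jmp(false)
misses twice (the helper's ret and then the return to the caller), and
simulate_push_jmp(true) misses only once. Which of the last two is closer
to real hardware is exactly the open question.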
>
> As an aside, why oh why do we default to -fstack-protector-all? Do we
> really need checks in every single function, as opposed to those that actually
> do something with arrays? Switch to plain -fstack-protector so we have
>
>> 00000000005a1fd0 <helper_ret_ldsw_mmu>:
>> 5a1fd0: 48 83 ec 08 sub $0x8,%rsp
>> 5a1fd4: e8 57 fe ff ff callq 5a1e30 <helper_ret_lduw_mmu>
>> 5a1fd9: 48 83 c4 08 add $0x8,%rsp
>> 5a1fdd: 48 0f bf c0 movswq %ax,%rax
>> 5a1fe1: c3 retq
>> 5a1fe2: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
>> 5a1fe9: 1f 84 00 00 00 00 00
>
> and then let's talk about icache savings...
I think it was simply paranoia + not knowing the difference. Patch
welcome I guess. (And I admit I only skimmed the patches so I didn't
know how small the wrappers were).
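
For anyone else who did not know the difference: GCC's plain
-fstack-protector only instruments functions that call alloca or have
character buffers larger than 8 bytes (the default ssp-buffer-size), while
-fstack-protector-all instruments every function. A minimal sketch follows;
demo_lduw, demo_ldsw and demo_copy_name are hypothetical stand-ins, not the
real helpers or their signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an unsigned 16-bit load helper. */
uint16_t demo_lduw(const uint8_t *mem, size_t off)
{
    uint16_t v;
    memcpy(&v, mem + off, sizeof(v));
    return v;
}

/*
 * Thin sign-extending wrapper, in the spirit of the disassembly quoted
 * above. It has no local arrays, so plain -fstack-protector leaves it
 * alone; -fstack-protector-all would add a canary check even here.
 * The uint16_t to int16_t conversion relies on the wrapping behaviour
 * that GCC documents for out-of-range values.
 */
int64_t demo_ldsw(const uint8_t *mem, size_t off)
{
    return (int16_t)demo_lduw(mem, off);
}

/*
 * A function with a 32-byte char buffer: this is the kind of function
 * that plain -fstack-protector is meant to cover.
 */
size_t demo_copy_name(const char *src)
{
    char buf[32];

    assert(src != NULL);
    strncpy(buf, src, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return strlen(buf);
}
```

Compiling this once with -fstack-protector-all and once with
-fstack-protector (adding -S in both cases) and looking for
__stack_chk_fail in the output should show which functions pay for the
check under each option.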
Paolo