Re: [Qemu-devel] [PATCH v2 0/6] ARM: AREG0 conversion


From: Laurent Desnogues
Subject: Re: [Qemu-devel] [PATCH v2 0/6] ARM: AREG0 conversion
Date: Thu, 29 Mar 2012 17:42:02 +0200

On Tue, Mar 27, 2012 at 9:59 PM, Artyom Tarasenko <address@hidden> wrote:
> On Tue, Mar 27, 2012 at 7:01 PM, Laurent Desnogues
> <address@hidden> wrote:
>> On Tue, Mar 27, 2012 at 6:48 PM, Blue Swirl <address@hidden> wrote:
>>> On Tue, Mar 27, 2012 at 13:40, Laurent Desnogues
>>> <address@hidden> wrote:
>>>> On Mon, Mar 26, 2012 at 7:02 PM, Blue Swirl <address@hidden> wrote:
>>>> [...]
>>>>> At least stack protector is protecting more code than before (for
>>>>> example TLB miss handler), but could overhead from that amount to 5%?
>>>>>
>>>>> Otherwise there should be just a few extra register moves here and
>>>>> there, that should be cheap on modern processors.
>>>>
>>>> The extra moves might be cheap but their cost is obviously not 0:
>>>> on top of using extra CPU core resources, code size is increased
>>>> which results in more instruction cache misses.
>>>>
>>>> I didn't like the idea when we discussed it back in May, now it
>>>> looks like we have concrete evidence the speed impact is
>>>> measurable (though I'd like some more numbers than the rough
>>>> 5% estimate I gave).
>>>
>>> A clearly defined test case running on a host that does not adjust
>>> clock frequencies would be nice. It would be interesting to find out
>>> where exactly the slowdown comes from.
>>>
>>> Perhaps the access helpers ({helper,_}_{ld,st}{b,w,l}_mmu) generated
>>> by softmmu_template.h are the culprit. If so, they could be split from
>>> other code and moved to TCG back ends. That way the interface could be
>>> improved while keeping all other cleanups.
>>
>> I also get a slowdown running in user mode, so I don't think
>> improving the mmu ld/st will completely remove the issue.
>> In that case the slowdown comes from the extra move
>> instructions for helper calls.  The ARM target uses way too
>> many helpers, but that's another discussion :-)
>>
>
> Have you tried compiling without -fstack-protector-all as suggested by Lluís?
> I observe a similar slowdown on a sparc target, and there compiling
> without stack protection definitely helps.

That will probably help, but it will also make the real problem
less obvious on small benchmarks that don't put pressure on the
instruction cache: this patch increases the size of the generated
code.  The generated code is larger and has to execute more
instructions, so no matter what you do, this will have an impact
on speed.
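
To make the extra moves concrete, here is a minimal sketch of the two
helper styles.  All names (CPUState, helper_add_global,
helper_add_explicit) are hypothetical and much simplified; they are not
the actual QEMU definitions.  A plain global pointer stands in for the
AREG0 global register variable so that the sketch stays portable.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified CPU state (not QEMU's real struct). */
typedef struct CPUState {
    uint32_t regs[16];
} CPUState;

/*
 * Old style: the helper reaches the CPU state implicitly.  In QEMU this
 * was a global register variable (AREG0); a plain global pointer is an
 * assumption made here for portability.
 */
CPUState *env;

uint32_t helper_add_global(uint32_t a, uint32_t b)
{
    env->regs[0] = a + b;
    return env->regs[0];
}

/*
 * New style: the CPU state is an explicit first argument.  Every call
 * site in generated code now has to place the state pointer in the
 * first argument slot, and the other arguments shift by one.
 */
uint32_t helper_add_explicit(CPUState *e, uint32_t a, uint32_t b)
{
    e->regs[0] = a + b;
    return e->regs[0];
}
```

Both helpers compute the same result; the only difference is the
calling convention, which is where the additional move per helper call
in the generated code would come from.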


Laurent


