qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
Date: Wed, 26 Jan 2011 10:23:04 +0100

On 26.01.2011, at 09:53, Edgar E. Iglesias wrote:

> On Tue, Jan 25, 2011 at 02:07:15PM -0800, Richard Henderson wrote:
>> On 01/25/2011 08:48 AM, Edgar E. Iglesias wrote:
>>> OK, I see. Maybe we should try to emit an insn sequence more similar
>>> to what tcg was emitting (for the non 8 & 16-bit deposits)?
>>> That ought too at least give similar results as before for those and
>>> give us a speedup for the byte and word moves.
>> 
>> Please try this replacement version for tcg/i386/*.
>> 
>> I was able to run your cris testsuite, and stuff looked ok there.
>> But for some reason the microblaze kernel would not boot.  It seems
>> that the kernel commandline isn't in place properly and it isn't 
>> finding the disk image.
> 
> Yes, you need to build qemu with libfdt for that image to boot.
> IIRC, libfdt comes with the dtc package on some dists. In the future
> I'll see if can upload an image that boots without the need of
> devicetree manipulation in qemu.
> 
> I tried your new patch and got similar results as before. Maaybe slightly
> faster but within the noise.
> 
> I looked a little bit more at it and realized that I'm probably not doing
> a fair comparition with microblaze. The write_carry sequence actually
> bit deposits a bit into two locations with a sequence of tcg ops that
> is not much longer than the one to deposit a single bit. So I'm basically
> comparing the cost of a single tcg deposit sequence with the cost of two
> tcg deposit backend ops. I should probably just accept that the new
> deposit op is not worth using for that particular case.
> 
> It would be nice if somebody else also tested this patch. Before we
> agree on applying it.
> 
> One note, the tcg_scratch_alloc hunk from the previous version was
> missing on this one.

I'm using the deposit instruction on S/390 with the implementation patch for 
x86 where it appears to not improve performance:

Booting into initrd shell:

3.53s  w/ x86 deposit patch 
3.52s  w/o x86 deposit patch


But maybe this is the same problem as with the shift opcode that was reported 
earlier in the thread:

address@hidden:/studio/s390/qemu-s390> grep deposit target-s390x/translate.c 
    tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 32);
    tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 32);
    tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 16);
    tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 8);
        tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 48, 16);
        tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 32, 16);

The 0, 32 and 0, 16 versions should get accelerated pretty well while the 32, 
16 and 48, 16 are not I assume?

I'm not saying that the deposit instruction doesn't make sense btw. Code 
cleanup wise it was awesome - 10 lines of tcg combined into a single call :). 
Maybe a simple andi and ori emission for unknown masks might make more sense 
though?


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]