Re: [Qemu-devel] [PATCH v4 4/9] target-ppc: improve lxvw4x implementation

qemu-devel



From:	David Gibson
Subject:	Re: [Qemu-devel] [PATCH v4 4/9] target-ppc: improve lxvw4x implementation
Date:	Thu, 29 Sep 2016 13:55:33 +1000
User-agent:	Mutt/1.7.0 (2016-08-17)

On Thu, Sep 29, 2016 at 09:11:10AM +0530, Nikunj A Dadhania wrote:
> David Gibson <address@hidden> writes:
> 
> > On Wed, Sep 28, 2016 at 11:01:22AM +0530, Nikunj A Dadhania wrote:
> >> Load 8 bytes at a time and manipulate.
> >> 
> >> Big-Endian Storage
> >> +-------------+-------------+-------------+-------------+
> >> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
> >> +-------------+-------------+-------------+-------------+
> >> 
> >> Little-Endian Storage
> >> +-------------+-------------+-------------+-------------+
> >> | 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
> >> +-------------+-------------+-------------+-------------+
> >> 
> >> Vector load results in:
> >> +-------------+-------------+-------------+-------------+
> >> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
> >> +-------------+-------------+-------------+-------------+
> >
> > Ok.  I'm guessing from this that implementing those GPR<->VSR
> > instructions showed that the earlier versions were endian-incorrect as
> > I suspected.
> >
> > Have you verified that this new implementation is actually faster (or
> > at least no slower) on LE than the original implementation with
> > individual 32-bit loads?
> 
> Results of a million iterations of lxvw4x, mfvsrd/mfvsrld and print:
> 
> Without patch:
> ==============
> [tcg_test]$ time ../qemu/ppc64le-linux-user/qemu-ppc64le -cpu POWER9 le_lxvw4x >/dev/null
> real  0m2.812s
> user  0m2.792s
> sys   0m0.020s
> [tcg_test]$
> 
> With patch:
> ===========
> [tcg_test]$ time ../qemu/ppc64le-linux-user/qemu-ppc64le -cpu POWER9 le_lxvw4x >/dev/null
> real  0m2.801s
> user  0m2.783s
> sys   0m0.018s
> [tcg_test]$
> 
> No perceptible difference. Is there a better way to benchmark?

Nothing dramatically better, that I can think of.  A few tweaks you can make:
    * Increase the loop count so the test simply runs for longer
    * Also run the test multiple times, so you can get an idea of how
      much the results vary from one run to another
    * Run the test on a system that's as idle of other activity as you
      can make it (at both host and guest level).

For our purposes the user time is probably the meaningful figure here,
and it should show less variance than the system and real times.
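
A minimal sketch of the "run it several times and look at user time"
approach, in C for Linux/glibc. It forks the command N times, discards
its stdout like the ">/dev/null" above, and collects each child's user
CPU time with wait4(). The helper names run_once_user_seconds() and
bench_user_time() are hypothetical, not anything in QEMU.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv once in a child with stdout sent to /dev/null.  Returns the
 * child's user CPU time in seconds, or -1.0 if it could not be run or
 * exited with a non-zero status. (Hypothetical helper.) */
static double run_once_user_seconds(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1.0;
    }
    if (pid == 0) {
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, STDOUT_FILENO);
            close(devnull);
        }
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    int status;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) < 0) {
        return -1.0;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1.0;
    }
    /* Report user time only: it should vary less than real or sys time. */
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Run argv `runs` times and report min/mean/max user time, so the
 * run-to-run spread is visible.  Returns 0 on success, -1 on failure.
 * (Hypothetical helper.) */
static int bench_user_time(char *const argv[], int runs,
                           double *min, double *mean, double *max)
{
    double sum = 0.0;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        double t = run_once_user_seconds(argv);
        if (t < 0.0) {
            return -1;
        }
        if (i == 0 || t < *min) {
            *min = t;
        }
        if (i == 0 || t > *max) {
            *max = t;
        }
        sum += t;
    }
    *mean = sum / runs;
    return 0;
}
```

For example, with char *cmd[] = { "../qemu/ppc64le-linux-user/qemu-ppc64le",
"-cpu", "POWER9", "le_lxvw4x", NULL }, calling
bench_user_time(cmd, 10, &min, &mean, &max) would run the test ten
times; the gap between min and max gives a rough noise floor against
which to judge the with/without-patch difference in the means.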

Note that it would be interesting to get these results for both a
POWER and an x86 host.
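
For comparing hosts, the storage diagrams quoted above can be checked
with a small plain-C model. This is a sketch of the semantics only, not
the TCG code from the patch, and vsr128, lxvw4x_by_doubleword(),
lxvw4x_le_by_word() and the other helpers are hypothetical names. It
assumes doubleword 0 of the VSR (words 0-1, what mfvsrd reads) is "hi"
and doubleword 1 (words 2-3, what mfvsrld reads) is "lo". In LE mode,
one 8-byte little-endian load per doubleword followed by swapping its
two 32-bit halves should give the same register contents as four
separate 4-byte little-endian loads. The model assembles values byte by
byte so it behaves the same on any host; in real generated code a
little-endian guest load implies a byte swap on a big-endian host but
not on x86, which is one reason timings may differ between hosts.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit VSR as two doublewords: hi = words 0-1, lo = words 2-3. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} vsr128;

/* Host-endian-independent loads: build the value from individual bytes. */
static uint64_t ld64_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint64_t ld64_be(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint32_t ld32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Swap the two 32-bit halves of a doubleword. */
static uint64_t swap_words(uint64_t x)
{
    return (x >> 32) | (x << 32);
}

/* lxvw4x modelled with one 8-byte load per doubleword. */
static vsr128 lxvw4x_by_doubleword(const uint8_t mem[16], int le_mode)
{
    vsr128 r;
    if (le_mode) {
        r.hi = swap_words(ld64_le(mem));
        r.lo = swap_words(ld64_le(mem + 8));
    } else {
        r.hi = ld64_be(mem);
        r.lo = ld64_be(mem + 8);
    }
    return r;
}

/* LE-mode lxvw4x modelled with one 4-byte load per word. */
static vsr128 lxvw4x_le_by_word(const uint8_t mem[16])
{
    vsr128 r;
    r.hi = (uint64_t)ld32_le(mem) << 32 | ld32_le(mem + 4);
    r.lo = (uint64_t)ld32_le(mem + 8) << 32 | ld32_le(mem + 12);
    return r;
}

/* Self-check against the big- and little-endian storage diagrams. */
static void check_quoted_layouts(void)
{
    static const uint8_t be_mem[16] = {
        0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
        0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF };
    static const uint8_t le_mem[16] = {
        0x33, 0x22, 0x11, 0x00, 0x77, 0x66, 0x55, 0x44,
        0xBB, 0xAA, 0x99, 0x88, 0xFF, 0xEE, 0xDD, 0xCC };
    vsr128 be = lxvw4x_by_doubleword(be_mem, 0);
    vsr128 le = lxvw4x_by_doubleword(le_mem, 1);
    vsr128 w = lxvw4x_le_by_word(le_mem);

    assert(be.hi == 0x0011223344556677ULL && be.lo == 0x8899AABBCCDDEEFFULL);
    assert(le.hi == be.hi && le.lo == be.lo);
    assert(w.hi == le.hi && w.lo == le.lo);
}
```

The trade-off being measured in this thread is the same one the model
shows: two wider loads plus a word swap instead of four narrow loads.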

In any case the results above are enough to convince me that the
change isn't likely to be a significant regression.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Prev by Date: Re: [Qemu-devel] [PATCH v6] target-ppc: Implement mtvsrws instruction
Next by Date: Re: [Qemu-devel] [PATCH v6] target-ppc: Implement mtvsrws instruction
Previous by thread: Re: [Qemu-devel] [PATCH v4 4/9] target-ppc: improve lxvw4x implementation
Next by thread: [Qemu-devel] [PATCH v4 3/9] target-ppc: Implement mtvsrws instruction
Index(es):
- Date
- Thread