From: Hans-Peter Nilsson
Subject: [Rapp-dev] How-to vectorize functions in compute/generic (like rc_type_bin_to_u8 using RC_VEC_SETMASKV)
Date: Mon, 19 Mar 2012 07:39:03 +0100

There were no comments on the suggested new vector abstraction
layer macros, so I took the liberty to change my mind.  What I
proposed as RC_VEC_UNPACKB8 is now called RC_VEC_SETMASKV, to
reflect that it's practically the inverse of RC_VEC_GETMASKV.
(There were some testsuite framework bits for a macro named
RC_VEC_SETMASK, but RC_VEC_SETMASKV is a better name; I can
imagine reasons to add a RC_VEC_SETMASKW like RC_VEC_GETMASKW in
the future; but no benefit to that at present.)  I also no
longer think there should be separate binary-to-16-bit-vector
and binary-to-32-bit-vector conversion macros, but instead just
sign-extension macros, extending from 8 to 16 and from 16 to 32.
But that's for another day.

This time, to pave the way for future similar but more
complicated vectorization efforts, I vectorized
rc_type_bin_to_u8, adding RC_VEC_SETMASKV, and generalized the
work into a kind of how-to document focusing on the updates
needed in the framework parts of RAPP.  Here's how to vectorize
a RAPP operation, given an interest in vectorizing a specific
function in compute/generic; the example used here is
rc_type_bin_to_u8.
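
For readers unfamiliar with the example, the following is a
minimal scalar sketch of what a binary-to-u8 conversion does for
one row.  It is not the RAPP source: the function name, the
LSB-first bit order and the 0x00/0xff output values are
assumptions made for illustration (the real bit order depends on
the platform endianness).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of RAPP.
 * Expand 'width' binary pixels from 'src' into one byte each in 'dst'.
 * Assumptions: pixel i is bit (i % 8) of src[i / 8], LSB first;
 * a set pixel becomes 0xff and a clear pixel becomes 0x00. */
static void example_bin_to_u8_row(uint8_t *dst, const uint8_t *src,
                                  size_t width)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < width; i++) {
        unsigned bit = (src[i / 8] >> (i % 8)) & 1u;
        dst[i] = bit ? 0xff : 0x00;
    }
}
```

A vectorized version replaces the per-pixel loop body with one
operation that expands several bits at a time, which is the job
RC_VEC_SETMASKV was added for.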

If it isn't already tuned due to unrolling, make the function
tuned.  At a minimum, this means adding the function to the
rc_bmark_suite table in compute/tune/benchmark/rc_benchmark.c
and conditionalizing the presence of the generic function on
RC_IMPL(rc_type_bin_to_u8, 1) && RC_UNROLL(rc_type_bin_to_u8) == 1.
Alternatively, you may want to make the generic function
actually unrollable.  For the latter approach to this step, see
the commit
<http://git.savannah.gnu.org/cgit/rapp.git/commit/?id=04928fb870748c0f02509e6b23a14949319bf4d4>.
If the operation has some kind of scaling factor between the
size of inputs and outputs and it's not matching an existing
combination, a new function to prepare parameters is needed,
maybe also adding new structure members.
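
The conditional described above can be sketched as follows.  The
two macro definitions are hypothetical stand-ins so that the
fragment compiles on its own; in RAPP they come from the tuning
framework and are defined differently.  Only the shape of the
#if line is taken from the text.

```c
/* Hypothetical stand-ins, NOT the real RAPP definitions. */
#define RC_IMPL(name, unrollable)  1  /* assume: generic impl selected */
#define RC_UNROLL(name)            1  /* assume: tuned unroll factor 1 */

#if RC_IMPL(rc_type_bin_to_u8, 1) && RC_UNROLL(rc_type_bin_to_u8) == 1
/* The generic function body would be compiled here. */
static int example_generic_present(void) { return 1; }
#else
/* Some other implementation or unroll factor was selected. */
static int example_generic_present(void) { return 0; }
#endif
```

With the stand-in values above the first branch is compiled;
changing either stand-in to 0 or 2 selects the second branch.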

For *any* change this invasive, it's important to verify that it
introduced no bugs that write slightly beyond the frame.  *For
each unroll number* (and of course each back-end implementation)
run both "make check" and "env LD_LIBRARY_PATH=`pwd`/.libs
valgrind ./test/.libs/lt-rapptest".  Also, adding a function to
the rc_bmark_suite table should be followed by re-generating all
rapptune-files (in compute/tune/arch/) at least before the next
RAPP release.  It's not strictly necessary for things to work at
all, but discrepancies between the contents of the
rc_bmark_suite table and the existing rapptune-file will trigger
regeneration of the tune-files when the RAPP source code is in
development mode.  For releases, there'll be a tuning fallback
to the unrolled generic function (as always, overridable by
--enable-tune-cache/--disable-tune-cache).

When looking into how to vectorize the generic function, you may
find that the existing vector abstraction layer macros
(a.k.a. back-end macros) are insufficient and additional ones are
needed.  For this, you need to research common SIMD extensions
(and get access to them for testing) and work out exactly how to
perform the vectorization.  Likely, there will be some
back-and-forth work switching between looking at back-end macros
and the actual function vectorization operations.  When you're
happy with the basic back-end macro (and the name :) add testing
framework support, API documentation and implementations for at
least two back-ends; two to make sure the operation is sane.
They should not both be mmx/sse*.  Preferably the back-ends
should be of different endianness.  For extra credit, add a SWAR
implementation; this may be simpler than expected, just a
generalization of a pattern seen in the back-end macro
implementations.  The changes required for RC_VEC_SETMASKV plus a
SWAR implementation are shown in commit
<http://git.savannah.gnu.org/cgit/rapp.git/commit/?id=a3999a80a27d030579265515a9fce3be7f7bc508>.
Yet more back-end implementations are in commit
<http://git.savannah.gnu.org/cgit/rapp.git/commit/?id=bdb96deeeb6a9a3089f0020f627d2a57afa09c5c>.
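
To give a feel for what a SWAR (SIMD within a register)
set-mask operation can look like, here is a minimal sketch using
plain 64-bit integer arithmetic.  It is not the implementation
from the commits above: the function name is hypothetical, and
the mapping of mask bit i to byte field i (counted from the
least significant end) is an assumption; a real back-end has to
follow the bit and byte order of the platform.

```c
#include <stdint.h>

/* Hypothetical helper, not part of RAPP.
 * Expand the 8 bits of 'mask' into the 8 byte fields of a 64-bit word.
 * Assumption: bit i of 'mask' controls byte field i, counted from the
 * least significant end; set bits give 0xff, clear bits give 0x00. */
static uint64_t example_swar_setmask8(uint8_t mask)
{
    /* Copy the mask byte into every byte field. */
    uint64_t x = (uint64_t)mask * 0x0101010101010101ULL;

    /* Keep only bit i in byte field i. */
    x &= 0x8040201008040201ULL;

    /* Adding 0x7f sets the top bit of every non-zero field without
     * carrying into the neighbouring field; move it down to bit 0. */
    x = ((x + 0x7f7f7f7f7f7f7f7fULL) >> 7) & 0x0101010101010101ULL;

    /* Spread each 0/1 field to 0x00/0xff. */
    return x * 0xffu;
}
```

For example, a mask of 0xa5 (bits 0, 2, 5 and 7 set) yields
0xff00ff0000ff00ff under the stated bit-to-byte assumption.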

The main changes are next: the vectorized function.  Don't
forget to make the presence of the vectorized function
conditional on the existence of the used back-end macros
(preferably hierarchically; for each level conditionally defined
on the existence of the actual macro used within the function /
macro) as well as unrolling.  See commit
<http://git.savannah.gnu.org/cgit/rapp.git/commit/?id=f591d29ab4b4fd3be638d828d1de2725dcdb5a52>.
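
The hierarchical conditioning could look roughly like the sketch
below.  All EX_* names are hypothetical and the 4-field "vector"
is a toy stand-in, not a RAPP back-end; the point is only that
each level is defined if and only if the macro it uses exists,
so a back-end lacking the lowest-level macro silently drops the
vectorized function instead of failing to compile.

```c
#include <stdint.h>

/* Level 0: a back-end macro, provided only by some back-ends.
 * Toy semantics: low 4 mask bits -> 4 byte fields of 0x00/0xff. */
#define EX_VEC_SETMASKV(dstv, maskv)                    \
    ((dstv) = (((maskv) & 1u) ? 0x000000ffu : 0u) |     \
              (((maskv) & 2u) ? 0x0000ff00u : 0u) |     \
              (((maskv) & 4u) ? 0x00ff0000u : 0u) |     \
              (((maskv) & 8u) ? 0xff000000u : 0u))

#define EX_UNROLL 1  /* stand-in for the tuned unroll factor */

/* Level 1: the helper macro exists only if the back-end macro does. */
#ifdef EX_VEC_SETMASKV
#define EX_BIN_TO_U8_STEP(dstv, bits) EX_VEC_SETMASKV(dstv, bits)
#endif

/* Level 2: the function exists only if the helper exists and the
 * unroll factor matches. */
#if defined EX_BIN_TO_U8_STEP && EX_UNROLL == 1
#define EX_HAVE_VEC_BIN_TO_U8 1
static uint32_t ex_vec_bin_to_u8_step(unsigned bits)
{
    uint32_t v;
    EX_BIN_TO_U8_STEP(v, bits);
    return v;
}
#endif
```

Removing the EX_VEC_SETMASKV definition leaves both the helper
macro and the function undefined, so callers can test
EX_HAVE_VEC_BIN_TO_U8 and fall back to the generic code.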

Happy hacking.

brgds, H-P


