From: | David Bateman |
Subject: | Re: more permute optimizations |
Date: | Wed, 25 Mar 2009 19:20:20 +0100 |
User-agent: | Mozilla-Thunderbird 2.0.0.17 (X11/20081018) |
Jaroslav Hajek wrote:
> Unless David (or anyone else) objects, I'll replace the transpose implementation by a call to the routine using pointers. I also tinkered with the block size a bit, but it seems to me that 8 was a good guess from David. The best block size seems to depend on the problem dimensions, and I bet that dependence is nontrivial.
If it's faster, I have no objections. The 8x8 block size arises because a typical cache has 16 independent lines, and not all of them are available, as some will be taken by your code. Each line of the block takes a cache line, so 8x8 is the largest reasonable size before cache misses start happening within the copying of the block itself.
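For readers following the thread, here is a rough sketch of what a blocked, pointer-based transpose with an 8x8 tile might look like. It is not the routine under discussion: the names `blocked_transpose` and `transposed` are hypothetical, column-major storage is assumed, and the default block size of 8 is simply the value mentioned above, which may not be the best choice for every problem size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not the actual library routine).
// Transposes the column-major nr x nc matrix `src` into the column-major
// nc x nr matrix `dst`, working on tiles of at most bs x bs elements so that
// each tile of source and destination is touched while still in cache.
template <typename T>
void blocked_transpose (const T *src, T *dst, std::size_t nr, std::size_t nc,
                        std::size_t bs = 8)
{
  assert (src != dst && bs > 0);  // in-place transposition is not handled here

  for (std::size_t jj = 0; jj < nc; jj += bs)
    {
      const std::size_t jend = std::min (jj + bs, nc);
      for (std::size_t ii = 0; ii < nr; ii += bs)
        {
          const std::size_t iend = std::min (ii + bs, nr);
          // Copy one tile: element (i,j) of src becomes element (j,i) of dst.
          for (std::size_t j = jj; j < jend; ++j)
            for (std::size_t i = ii; i < iend; ++i)
              dst[j + i * nc] = src[i + j * nr];
        }
    }
}

// Convenience wrapper for the sketch: returns the transpose as a new vector.
template <typename T>
std::vector<T> transposed (const std::vector<T>& a, std::size_t nr,
                           std::size_t nc, std::size_t bs = 8)
{
  assert (a.size () == nr * nc);
  std::vector<T> b (a.size ());
  blocked_transpose (a.data (), b.data (), nr, nc, bs);
  return b;
}
```

Passing a different `bs` makes it easy to time other block sizes against 8 for a given pair of dimensions.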
D. -- David Bateman address@hidden 35 rue Gambetta +33 1 46 04 02 18 (Home) 92100 Boulogne-Billancourt FRANCE +33 6 72 01 06 33 (Mob)