I made a test implementation of a performance optimisation for GNU
APL. The short story is that it seems to work for limited tests, but
there may be issues with my approach and I don't want to spend any
more time on it unless Jürgen agrees it's a good idea. After all, I
may have completely misunderstood the architecture of GNU APL and my
approach may be fundamentally broken.
When doing some benchmarking I noticed a lot of time was spent
allocating, initialising and freeing Value instances. I realised that
many such allocations are not actually needed. Take the following
expression for example:
,1+2×foo
In this case, the following things happen:
1. The multiplication function (implemented by eval_skalar_AB)
creates a new array and fills it with the result of the
operation (in this case, bif_multiply) on each cell.
2. The addition function is then called, resulting in an
identical flow as in the multiplication above.
3. The ravel function is then called (Bif_COMMA::ravel),
resulting in yet another array being created which is simply
copied from B. Note that the ravel is identical between Z and
B in this case.
From a performance perspective, I saw a particular low-hanging fruit
in the ravel call in the last point. Since the value that came from
the addition is never used anywhere else, one can simply change the
shape of B in-place and return it immediately.
My experiment involved creating a new VF (value flag) value; VF_temp
that indicates that the value may be reused. This flag is then set for
all values that are only used once (for example, the return value from
a primitive or user function). If a value needs to be preserved, it is
copied, and the flag is not set.
What all of this means is that values that are returned from a
function, can be reused if it makes sense. There are several cases
where this is the case. For example:
* The comma function (as discussed above)
* Reshape, especially when the shape of Z matches that of B
* Scalar functions. If the sizes of the arguments of plus, for
example, matches, then A or B can be reused if they are marked
as temporary.
* The each operator can (maybe?) push the result straight into
the source array.
I've only done very basic testing, but the ravel operator on a large
array went from taking a second or so to pretty much zero. Same should
be true for reshape, although that one crashes for me right now. :-)
Scalar functions still have to process the elements, so their timing
can't go to zero, but at least the should save the time spent setting
up the result array and creating new values.
Please let me know what you think.
Elias