Hi,
I can confirm that something is strange with the macro based ¨ but I can't yet
see what.
I wrote a simple measurement function to test *A+B* in different ways:
*A+B (direct scalar)**
**A+¨B (uses built-in each)
**A {⍺+⍵} B (uses macro each**)**
*
The measurement program is this:
* )clear**
**∇MEASURE B;A;D;R;T**
** A←⍳B ◊ T←7⍴0**
** T[1]←⎕FIO ¯1 ◊ ⊣ A + A ◊ T[2]←⎕FIO ¯1 ◊ ⊣ A + A**
** T[3]←⎕FIO ¯1 ◊ ⊣ A +¨ A ◊ T[4]←⎕FIO ¯1 ◊ ⊣ A +¨ A**
** T[5]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A ◊ T[6]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A**
** T[7]←⎕FIO ¯1 ◊ D←T-1⌽T ◊ R←¯1↓D÷D[3]**
** ⍉2 6⍴(,2/'A + B' 'A +¨ B' 'A {⍺+⍵}¨ B') , (⊂8 2)⍕¨R**
**∇**
**∇Z←A (LO EACH_1) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP**
**Z←rho_Z⍴Z**
**∇**
**∇Z←A (LO EACH_2) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: (N⌷Z)←⊂(N⌷A) LO N⌷B ◊ →(N_max>N←N+1)⍴LOOP**
**∇*
Every case is measured twice to see fluctuations, caching and the like.
The numbers are CPU cycles relative to the third line
The results are quite puzzling:
* MEASURE 100**
** A + B .18 **
** A + B .14 **
** A +¨ B 1.00 **
** A +¨ B .90 **
** A {⍺+⍵}¨ B 17.81 **
** A {⍺+⍵}¨ B 11.81 **
**
**
** MEASURE 1000**
** A + B .13 **
** A + B .12 **
** A +¨ B 1.00 **
** A +¨ B 1.01 **
** A {⍺+⍵}¨ B 18.03 **
** A {⍺+⍵}¨ B 11.67 **
**
** MEASURE 10000**
** A + B .18 **
** A + B .17 **
** A +¨ B 1.00 **
** A +¨ B .29 **
** A {⍺+⍵}¨ B 111.45 **
** A {⍺+⍵}¨ B 111.85 **
*
So the macro based EACH becomes slower when the values grow. But why?
The macro code for the EACH macro is quite simple (and ⎕FXing it gives the same
behavior as the built-in macro.
* ⎕CR 'μ-Z__vA_LO_EACH_vB'**
**Z←A (LO Z__vA_LO_EACH_vB) B;rho_Z;N;N_max**
**rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0**
**LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP**
**Z←rho_Z⍴Z*
I can't see anything in that code which would explain why the macro performance
drops when the values get larger. The difference between *A +¨ B *and*A {⍺+⍵}¨
B* is caused
by two independent factors:
one uses the built-in + while the other uses the user defined lambda {⍺+⍵}, and
one uses the built-in ¨ while the other uses the user defined macro
Z__vA_LO_EACH_vB*
*
The impact of the second factor can be determined by running the same
measurement with an
older version of GNU APL:
* MEASURE 100 **
** A + B .20 **
** A + B .15 **
** A +¨ B 1.00 **
** A +¨ B .94 **
** A {⍺+⍵}¨ B 5.53 **
** A {⍺+⍵}¨ B 5.36 **
**
** MEASURE 1000**
** A + B .14 **
** A + B .10 **
** A +¨ B 1.00 **
** A +¨ B .98 **
** A {⍺+⍵}¨ B 4.87 **
** A {⍺+⍵}¨ B 4.49 **
**
** MEASURE 10000**
** A + B .26 **
** A + B .25 **
** A +¨ B 1.00 **
** A +¨ B .42 **
** A {⍺+⍵}¨ B 2.42 **
** A {⍺+⍵}¨ B 2.41 *
Here the time decreases rather than increases with the vector lengths.
Thus macros are about 3 times slower at vectors of length 100 and 30 times
slower at vectors of length 10000.
And the LOOP line in the macro does the same as the corresponding C code in the
built-in EACH case.
I have no idea yet what is going on here. The memory consumption is probably a
hint,
but I can't see where the memory goes. The A, B and Z variables are allocated
only once
per macro call and the variables in the loop are small (which means that GNU
APL is
recycling them internally rather than malloc()ing them).
Any ideas welcome!
/// Jürgen
On 06/17/2016 12:46 AM, Xiao-Yong Jin wrote:
Hi,
In my script I was using ⍎¨ on a vector of character vector from each line of a
file. After some testing, I guess it is not really the ⍎, that is slow. The
time scaling is strange.
In the following, the first row and the third row (using ¨ and ⍣ respectively)
scales like n*3, while the problem size is only n*2. My script ends up taking
more than a minute to read a 1000×1000 numeric matrix from a text file. On the
other hand, the naive {⍵+1⊣⍎v}⍣n 1 indeed scales like n*2.
In addition, gnu apl is taking nearly a gigabytes of memory to run the
following code, while dyalog is on the order of megabytes.
te¨200×⍳3
⍎¨u 536 ⍎¨u 4129 ⍎¨u 13962
⍎ v 0 ⍎ v 0 ⍎ v 0
⍎u⍣ 518 ⍎u⍣ 4114 ⍎u⍣ 13889
⍎v⍣ 15 ⍎v⍣ 58 ⍎v⍣ 131
∇te[⎕]∇
∇
[0] r←te n;s;u;v;t
[1] s←1E¯18×?n n⍴1E18
[2] u←⍕¨⊂[2]s
[3] v←↑u
[4] t←⎕AI◊⊣⍎¨u◊r←(⊂'⍎¨u'),2⌷⎕AI-t
[5] t←⎕AI◊⊣⍎v◊r←r,[.5](⊂'⍎ v'),2⌷⎕AI-t
[6] t←⎕AI◊⊣{⍵+1⊣⍎⊃u[⍵]}⍣n 1◊r←r,[1](⊂'⍎u⍣'),2⌷⎕AI-t
[7] t←⎕AI◊⊣{⍵+1⊣⍎v}⍣n 1◊r←r,[1](⊂'⍎v⍣'),2⌷⎕AI-t
∇
Best,
Xiao-Yong