[Bug-apl] Performance problems when constructing large(ish) arrays

From: Elias Mårtenson
Subject: [Bug-apl] Performance problems when constructing large(ish) arrays
Date: Wed, 18 Jan 2017 00:34:32 +0800

I wanted to use GNU APL to work on a dataset of star data. The file consists of 34030 lines of the following form:

892376 3813 4.47 0.4699 1.532 0.007    7306.69 0.823 0.4503 0 ---
1026146 4261 4.57 0.6472 14.891 0.12    11742.56 1.405 0.7229 0 ---
1026474 4122 4.56 0.5914 1.569 0.006   30471.8 1.204 0.6061 0 ---
1162635 3760 4.77 0.4497 15.678 0.019   10207.47 0.978 0.5445 1 ---

I wrote a generic CSV loader to handle this (source code at the end of this email), and loaded the data like so:

z ← 'nnnnnnnnnns' read_csv 'apjs492452t1_mrt.txt'

Loading took many minutes, which in my opinion is far too slow for a file of this size.

Now, I have a few questions:

1. Is there a way to speed up this code?
2. Is there something that could be done on the GNU APL implementation side to make this faster?
3. Shouldn't we have a generic ⎕CSV function or something like that which would be able to load CSV files in milliseconds regardless of size? This should be trivial to do in C++.
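
To make the last question a little more concrete, here is a minimal C++ sketch of what a pattern-driven loader could look like. It is not GNU APL code and says nothing about how a ⎕CSV function would actually be wired into the interpreter: `read_table`, `Cell` and `Row` are hypothetical names, and the sketch assumes whitespace-separated fields where `n` means number and `s` means string, as in the pattern used above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// One cell is either a number ('n' in the pattern) or a string ('s').
using Cell = std::variant<double, std::string>;
using Row = std::vector<Cell>;

// Hypothetical helper, not part of GNU APL: parse whitespace-separated
// lines from `in`, using one pattern character per field.
std::vector<Row> read_table(std::istream& in, const std::string& pattern) {
    std::vector<Row> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field;
        Row row;
        row.reserve(pattern.size());
        while (fields >> field) {  // runs of blanks count as one separator
            if (row.size() == pattern.size())
                throw std::runtime_error("too many fields in line");
            const char type = pattern[row.size()];
            if (type == 'n')
                row.emplace_back(std::stod(field));
            else if (type == 's')
                row.emplace_back(field);
            else
                throw std::runtime_error("Illegal conversion type");
        }
        if (row.empty()) continue;  // skip blank lines
        if (row.size() != pattern.size())
            throw std::runtime_error("too few fields in line");
        rows.push_back(std::move(row));  // amortized constant-time append
    }
    return rows;
}
```

The point of the sketch is only that each field is converted once and each row is appended once, so the work grows linearly with the size of the file.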

Here's the code in question:

∇Z ← type convert_entry value
  ⍝ Convert one field: 'n' evaluates it as a number, 's' keeps the text
  →('n'≡type)/numeric
  →('s'≡type)/string
  ⎕ES 'Illegal conversion type'
numeric:
  Z←⍎value
  →end
string:
  Z←value
end:
∇

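∇Z ← pattern read_csv filename ;fd;line;separator
  separator ← ' '
  Z ← 0 (↑⍴pattern) ⍴ ⍬         ⍝ Empty matrix, one column per pattern character
  fd ← 'r' FIO∆fopen filename

next:
  line ← FIO∆fgets fd           ⍝ Read one line from the file (as byte values)
  →(⍬≡line)/end                 ⍝ An empty result means there is nothing left to read
  →(10≠line[⍴line])/skip_nl     ⍝ Keep the line as is unless it ends in a newline (10)
  line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
skip_nl:
  line ← ⎕UCS line              ⍝ Convert the byte values to characters
  ⍝ Split on the separator, convert each field, append the result as a new row
  Z ← Z⍪ pattern convert_entry¨ (line≠separator) ⊂ line
  →next
end:

  FIO∆fclose fd
∇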
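
One possible contributor to the load time is the row-by-row growth of Z. The following C++ sketch is a back-of-the-envelope model only. It assumes that every `Z ← Z⍪ ...` copies all existing rows into a fresh array, which the interpreter may or may not do. Both function names are hypothetical.

```cpp
#include <cstdint>

// Cells copied when a matrix with `cols` columns is grown one row at a
// time and every append copies all rows that already exist.
// ASSUMPTION: this models Z ← Z⍪row as a full copy of Z on each append.
std::uint64_t cells_copied_by_repeated_append(std::uint64_t rows,
                                              std::uint64_t cols) {
    std::uint64_t copied = 0;
    for (std::uint64_t existing = 0; existing < rows; ++existing)
        copied += existing * cols;  // copy what is already in the matrix
    return copied;
}

// Cells written when every cell is stored exactly once, for example
// because the rows are collected first and the matrix is built at the end.
std::uint64_t cells_written_once(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols;
}
```

Under that assumption, the 34030 lines of 11 fields would cost 6369037785 cell copies with repeated appends, compared with 374330 cell writes if each cell is stored only once.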
