I wanted to use GNU APL to work on a dataset of star data. The file consists of 34030 lines of the following form:
892376 3813 4.47 0.4699 1.532 0.007 7306.69 0.823 0.4503 0 ---
1026146 4261 4.57 0.6472 14.891 0.12 11742.56 1.405 0.7229 0 ---
1026474 4122 4.56 0.5914 1.569 0.006 30471.8 1.204 0.6061 0 ---
1162635 3760 4.77 0.4497 15.678 0.019 10207.47 0.978 0.5445 1 ---
I wrote a generic CSV loader to handle this (source code at the end of this email), and loaded the data like so:
z ← 'nnnnnnnnnns' read_csv 'apjs492452t1_mrt.txt'
This took many minutes to load, which in my opinion shouldn't happen.
Now, I have a few questions:
- Is there a way to speed up this code?
- Is there something that could be done on the GNU APL implementation side to make this faster?
- Shouldn't we have a generic ⎕CSV function (or something similar) that can load CSV files like this one in milliseconds? This should be trivial to do in C++.
Here's the code in question:
∇Z ← type convert_entry value
⍝ Convert one field: 'n' = numeric (via ⍎), 's' = keep as text
→('n'≡type)/numeric
→('s'≡type)/string
⎕ES 'Illegal conversion type'
numeric:
Z←⍎value
→end
string:
Z←value
end:
∇
∇Z ← pattern read_csv filename ;fd;line;separator
separator ← ' '
Z ← 0 (↑⍴pattern) ⍴ ⍬ ⍝ Empty result: 0 rows, one column per pattern entry
fd ← 'r' FIO∆fopen filename
next:
line ← FIO∆fgets fd ⍝ Read one line from the file
→(⍬≡line)/end
→(10≠line[⍴line])/skip_nl ⍝ If the line ends in a newline
line ← line[⍳¯1+⍴line] ⍝ Remove the newline
skip_nl:
line ← ⎕UCS line ⍝ Convert byte values to characters
Z ← Z⍪ pattern convert_entry¨ (line≠separator) ⊂ line ⍝ Split, convert each field, append as a new row
→next
end:
FIO∆fclose fd
∇