bug-apl

[Bug-apl] Performance problems when constructing large(ish) arrays


From: Elias Mårtenson
Subject: [Bug-apl] Performance problems when constructing large(ish) arrays
Date: Wed, 18 Jan 2017 00:34:32 +0800

I wanted to use GNU APL to work on a dataset of stellar data. The file consists of 34030 lines of the following form:

  892376 3813 4.47 0.4699  1.532  0.007    7306.69 0.823 0.4503 0 ---
 1026146 4261 4.57 0.6472 14.891  0.12    11742.56 1.405 0.7229 0 ---
 1026474 4122 4.56 0.5914  1.569  0.006   30471.8  1.204 0.6061 0 ---
 1162635 3760 4.77 0.4497 15.678  0.019   10207.47 0.978 0.5445 1 ---


I wrote a generic CSV loader to handle this (the source code is at the end of this email; for this file the separator is a space rather than a comma) and loaded the data like so:

    z ← 'nnnnnnnnnns' read_csv 'apjs492452t1_mrt.txt'

This took many minutes to load, which in my opinion shouldn't happen for a file of only 34030 lines.

Now, I have a few questions:

  1. Is there a way to speed up this code?
  2. Is there something that could be done on the GNU APL implementation side to make this faster?
  3. Shouldn't we have a generic ⎕CSV function, or something similar, that could load CSV files like this one in milliseconds? This should be straightforward to do in C++.
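
For question 3, the core of such a native loader is a single pass over the file with a per-field conversion. The sketch below is only an illustration in C++, not GNU APL code: `read_table`, `Cell` and `Row` are hypothetical names, fields are assumed to be separated by runs of whitespace as in the star file, and the pattern letters follow the convention used with `read_csv` ('n' for numeric, 's' for string).

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// One cell: a number (pattern letter 'n') or a string (pattern letter 's').
using Cell = std::variant<double, std::string>;
using Row = std::vector<Cell>;

// Hypothetical native loader mirroring read_csv/convert_entry: field i of
// each line is converted according to pattern[i]. Fields are assumed to be
// whitespace-separated; extra fields beyond the pattern are ignored.
std::vector<Row> read_table(const std::string& pattern, const std::string& filename) {
    std::ifstream in(filename);
    if (!in) throw std::runtime_error("cannot open " + filename);

    std::vector<Row> rows;  // push_back is amortized constant time
    std::string line;
    while (std::getline(in, line)) {  // getline drops the trailing '\n'
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;  // skip blank lines

        std::istringstream fields(line);
        std::string field;
        Row row;
        row.reserve(pattern.size());
        for (char type : pattern) {
            if (!(fields >> field)) throw std::runtime_error("too few fields: " + line);
            if (type == 'n') {
                row.emplace_back(std::stod(field));  // numeric field
            } else if (type == 's') {
                row.emplace_back(field);             // keep as string
            } else {
                throw std::invalid_argument("Illegal conversion type");
            }
        }
        rows.push_back(std::move(row));
    }
    return rows;
}
```

Because rows are appended to a `std::vector`, earlier rows are only moved on the occasional reallocation instead of being copied for every new line, so the work stays roughly linear in the size of the file.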

Here's the code in question:

∇Z ← type convert_entry value
  ⍝ Convert one field: 'n' → number (via ⍎), 's' → keep as a string
  →('n'≡type)/numeric
  →('s'≡type)/string
  ⎕ES 'Illegal conversion type'
numeric:
  Z←⍎value
  →end
string:
  Z←value
end:
∇


∇Z ← pattern read_csv filename ;fd;line;separator
  separator ← ' '
  Z ← 0 (↑⍴pattern) ⍴ ⍬         ⍝ Empty matrix, one column per pattern letter
  fd ← 'r' FIO∆fopen filename

next:
  line ← FIO∆fgets fd           ⍝ Read one line from the file
  →(⍬≡line)/end                 ⍝ Stop at end of file
  →(10≠line[⍴line])/skip_nl     ⍝ Skip the next line unless the line ends in a newline
  line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
skip_nl:
  line ← ⎕UCS line              ⍝ Convert byte values to characters
  Z ← Z⍪ pattern convert_entry¨ (line≠separator) ⊂ line   ⍝ Split, convert, append one row
  →next
end:
  FIO∆fclose fd
∇

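One plausible cause of the slowdown, offered as an assumption rather than a measurement, is the `Z ← Z⍪ ...` line: if every catenation allocates a new matrix and copies the rows already in Z, loading n rows of width w copies w·n(n−1)/2 existing elements, which for 34030 rows of 11 fields is about 6.4 billion element copies. The C++ model below (the function names are hypothetical) simulates that rebuild-on-every-append strategy and checks it against the closed form.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Models Z←Z⍪row under the ASSUMPTION that every catenation allocates a
// fresh (k+1)×width buffer and copies the k existing rows into it.
// Returns how many previously stored elements were copied in total.
std::size_t elements_copied_by_rebuilding(std::size_t rows, std::size_t width) {
    std::vector<double> z;                 // flattened k×width matrix
    const std::vector<double> row(width);  // stand-in for one parsed line
    std::size_t copied = 0;
    for (std::size_t k = 0; k < rows; ++k) {
        std::vector<double> next;
        next.reserve(z.size() + width);
        next.insert(next.end(), z.begin(), z.end());      // copy the k old rows
        copied += z.size();
        next.insert(next.end(), row.begin(), row.end());  // append the new row
        z = std::move(next);
    }
    return copied;
}

// Closed form of the same count: width * (0 + 1 + ... + (rows - 1)).
std::size_t elements_copied_formula(std::size_t rows, std::size_t width) {
    return rows == 0 ? 0 : width * rows * (rows - 1) / 2;
}
```

If this is indeed the bottleneck, the usual workaround is to collect the rows first (for example in a nested vector) and build the matrix once at the end, so that each row is copied a constant number of times instead of once per remaining line.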


reply via email to

[Prev in Thread] Current Thread [Next in Thread]