bug-apl

Re: [Bug-apl] Performance problems when constructing large(ish) arrays


From: Elias Mårtenson
Subject: Re: [Bug-apl] Performance problems when constructing large(ish) arrays
Date: Wed, 18 Jan 2017 18:17:16 +0800

You've all made good points. I changed the code slightly to pass in the initial array size, which avoids recreating the array on each iteration. This brought the loading time down to a much more bearable 14 seconds. I rewrote the Lisp code to be equivalent to the APL code, and it ran in 1.46 seconds. This suggests that GNU APL is consistently about 10 times slower than non-optimised Lisp code. To me, this is not unexpected, given that GNU APL isn't designed to be high-performance.
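To illustrate the difference between the two approaches, here is a minimal sketch. It assumes the earlier version (not shown in this message) grew the result by catenating one row per iteration; the values are made up and ⎕IO is assumed to be 1.

```apl
⍝ Growing: every catenation builds a new array one row larger
Z ← 0 3⍴0
Z ← Z⍪1 2 3
Z ← Z⍪4 5 6

⍝ Preallocated: allocate the full shape once, then assign rows in place
Z ← 2 3⍴0
Z[1;] ← 1 2 3
Z[2;] ← 4 5 6
```

Both variants end with the same 2-by-3 matrix; the second one only requires the row count to be known up front, which is what the n argument of read_csv_n below provides.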

However, while 14 seconds for 30k rows is manageable, I have needed to work with arrays of over a million rows. Extrapolating linearly suggests that it would take almost 8 minutes to load such a file. Thus, unless GNU APL can magically improve overall performance by at least 10 times, I still think we need a native CSV loading function.
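As a rough check of that extrapolation, assuming the load time scales linearly with the number of rows:

```apl
⍝ Seconds per row × one million rows, converted to minutes.
⍝ APL evaluates right to left, so the parentheses are needed.
(14 × 1000000 ÷ 30000) ÷ 60

⍝ The same estimate using the exact row count of the test file (34030)
(14 × 1000000 ÷ 34030) ÷ 60
```

The first expression comes to roughly 7.8 minutes and the second to roughly 6.9 minutes.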

Regards,
Elias

For reference, here is the APL code:

∇Z ← type convert_entry value
  →('n'≡type)/numeric
  →('s'≡type)/string
  ⎕ES 'Illegal conversion type'
numeric:
  Z←⍎value
  →end
string:
  Z←value
end:
∇


∇Z ← pattern read_csv_n[n] filename ;fd;line;separator;i
  separator ← ' '
  Z ← n (↑⍴pattern) ⍴ 0
  fd ← 'r' FIO∆fopen filename
  i ← ⎕IO

next:
  line ← FIO∆fgets fd           ⍝ Read one line from the file
  →(⍬≡line)/end
  →(10≠line[⍴line])/skip_nl     ⍝ If the line ends in a newline
  line ← line[⍳¯1+⍴line]        ⍝ Remove the newline
skip_nl:
  line ← ⎕UCS line
  Z[i;] ← pattern convert_entry¨ (line≠separator) ⊂ line
  i ← i+1
  →next
end:

  FIO∆fclose fd
∇

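A call might look like the sketch below. The pattern and row count are inferred from the Lisp code that follows (ten numeric columns plus one string column, 34030 rows), and the )COPY line assumes the FIO∆ wrappers come from GNU APL's FILE_IO workspace in library 5; adjust it if your installation differs.

```apl
⍝ Assumption: the FIO∆ functions are provided by the FILE_IO workspace
)COPY 5 FILE_IO

⍝ One pattern character per column: 'n' for numeric, 's' for string
Z ← 'nnnnnnnnnns' read_csv_n[34030] 'apjs492452t1_mrt.txt'

⍝ Check the shape of the result
⍴Z
```

Since Z is preallocated as n rows by one column per pattern character, the shape should come out as 34030 11 if the file loads successfully.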

And here is the Lisp code (the test case was run on SBCL). It requires the Quicklisp packages SPLIT-SEQUENCE and PARSE-NUMBER:

(defparameter *result*
  (time
   (with-open-file (s "apjs492452t1_mrt.txt")
     ;; 34030 rows by 11 columns
     (let ((res (make-array '(34030 11))))
       (dotimes (i (array-dimension res 0))
         (let* ((line (read-line s))
                (parts (split-sequence:split-sequence #\Space line :remove-empty-subseqs t)))
           ;; Parse the first 10 fields as numbers
           (loop
             for ii from 0 below 10
             for p in parts
             do (setf (aref res i ii) (parse-number:parse-number p)))
           ;; Keep the 11th field as a string
           (setf (aref res i 10) (nth 10 parts))))
       res))))


On 18 January 2017 at 09:57, Blake McBride <address@hidden> wrote:
On Tue, Jan 17, 2017 at 7:39 PM, Xiao-Yong Jin <address@hidden> wrote:
I have always felt that GNU APL is kind of slow compared to Dyalog, but I have never really compared the two on a large dataset.
I'm mostly using J now for large datasets.
If Elias has optimized code for GNU APL and a reproducible way to measure timing, I'd like to compare it with Dyalog and J.

I think that's actually a good idea.  It would be a good comparison.  It would really make it clear if there is a glaring problem.  But first the APL code should be optimized a bit (but nothing crazy like reading it all into memory right now).

--blake

