octave-bug-tracker

[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date: Sat, 25 Nov 2017 15:04:32 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Follow-up Comment #42, bug #51871 (project octave):

Attached are a few more incremental versions:


speed-up-load-ascii-v6.patch  (uses istream is.get(), clear string)
speed-up-load-ascii-v6v2.patch  (uses istream is.get(), overwrite string)
speed-up-load-ascii-v7.patch  (uses istream is.read(,1))


The v6v2 version yields the same time as the v6 version, which confirms what
you said about the string clear() function being implemented fairly
efficiently.  (I had actually done this comparison before your last post.)

Speed comparisons:

current octave:  3.7360, 3.7440
octave + speed-up-load-ascii-v5.patch:  1.3640
octave + speed-up-load-ascii-v6.patch:  2.1980
octave + speed-up-load-ascii-v7.patch:  3.1040

The above also agrees with your comment about the standard getline() being
optimal.  Note how version v5 is clearly much faster than the others.  No
surprise, as calling get() once per character is going to have overhead.
But version v5 doesn't handle all EOL characters.

In version v7 I used read(,1) instead of get().  That is slower, and the
reason seems obvious: although one read(,1000) call is much faster than
calling get() 1000 times, read(,1) has the extra overhead of a second input
argument on the stack, so it has to be slower than get().

So far, then, version v6 is the benchmark: the fastest version that still
handles all EOL conventions.

I thought about reading the data in larger chunks and then searching for EOL
characters.  That seems too clumsy though, so I hesitate.  I then thought of
going back to FILE * and lower-level C-style I/O, but that interferes far too
much with other text/matrix/etc. code which is based on istream objects.

The following reference suggests an option that seems much more
straightforward and efficient:

https://stackoverflow.com/questions/13995971/using-get-line-with-multiple-types-of-end-of-line-characters

The idea would be to create a "filter buffer stream" through which the
istream's buffer passes.  That filter buffer stream would convert every 0x0A,
0x0D, and 0x0D-0x0A sequence to the native '\n' character, and *then* we can
use getline() just as it is.  That is

istream is --> filter istream fs --> fs.getline()

That seems the most efficient and elegant solution, doesn't it?  At least in
principle.  I'm going to try coding that.  If it doesn't work, then version v6
it is, I guess.

(file #42485, file #42486, file #42487)
    _______________________________________________________

Additional Item Attachment:

File name: speed-up-load-ascii-v6.patch   Size:11 KB
File name: speed-up-load-ascii-v6v2.patch Size:11 KB
File name: speed-up-load-ascii-v7.patch   Size:11 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



