[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date: Fri, 1 Sep 2017 14:42:35 -0400 (EDT)

Follow-up Comment #27, bug #51871 (project octave):

Just some thoughts on this after experimenting with some code mods.  Taking a
*very* rough estimate of 4 seconds for the example script as the baseline:

1) Removing the get_lines_and_columns() call and hard-coding nr=1e6, nc=1
instead reduces the time to 3 seconds.

2) Having

          std::string buf = get_mat_data_input_line (is);

in the loop adds 0.5 seconds.  (I.e., I just comment out the above line and
set buf = "1.234" before the loop.)

3) I used

          d = ::atof(buf.c_str());

to scan the float, and its contribution seems imperceptible.  But I don't
think this scan

 d = octave_read_value<double> (tmp_stream);

takes too long either.  Hence, the actual scanning of the data to floats is
not a bottleneck.

4) If I take all the looping out of read_mat_ascii_data() by setting NR to 1,
there is still 0.9 seconds of CPU time consumed.

5) The creation of the 1e6 x 1 Matrix doesn't seem to take much time.

6) That leaves about 1 to 1.5 seconds associated with this

          std::istringstream tmp_stream (buf);

Constructing a new std::istringstream for every line should be avoided.

7) Why the 0.9 seconds then?  Well, if I reduce the contents of the dat.txt
file from 1e6 lines to just 1 line, the time goes to 0.00718498 seconds.  So,
it is actually this line in load-save.cc:

            {
              std::ifstream file (fname.c_str (), mode);

              if (! file)
                error ("load: unable to open input file '%s'",
                       orig_fname.c_str ());

that is contributing a big amount of time.  Does that make sense?  Does the
process of opening a large file, or creating a stream from a large file, take
considerable time?  In my experience, just opening a file in C doesn't take
much CPU.

Here is a link to general comments about streams:

https://stackoverflow.com/questions/26095160/why-are-stdfstreams-so-slow

8) Also, having sped up the load(), I notice that save() prior to the tic/toc
takes a long time.  Does save() have the same issue, i.e., using streams is
slow?  I would think save() is much faster because there is no testing for
comments, delimiters, etc.
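
For reference, here is a minimal sketch of the two per-line scanning
approaches compared in 3) and 6).  The function names are hypothetical and are
not Octave functions; the sketch assumes one number per line and stores 0 for
a line that does not start with a number.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds a std::istringstream for every line,
// mirroring the construct discussed in 6).
std::vector<double>
parse_with_istringstream (const std::vector<std::string>& lines)
{
  std::vector<double> out;
  out.reserve (lines.size ());
  for (const auto& buf : lines)
    {
      std::istringstream tmp_stream (buf);
      double d = 0.0;
      tmp_stream >> d;   // d is set to 0 if extraction fails (C++11)
      out.push_back (d);
    }
  return out;
}

// Hypothetical helper: scans the buffer directly with std::strtod,
// so no stream object is created per line.
std::vector<double>
parse_with_strtod (const std::vector<std::string>& lines)
{
  std::vector<double> out;
  out.reserve (lines.size ());
  for (const auto& buf : lines)
    {
      char *end = nullptr;
      double d = std::strtod (buf.c_str (), &end);
      // end == buf.c_str () means no characters were converted.
      out.push_back (end == buf.c_str () ? 0.0 : d);
    }
  return out;
}
```

Calling each function on the same vector of 1e6 strings and timing it with
std::chrono::steady_clock would be one way to check how much of the cost comes
from the per-line stream construction.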

In conclusion, it seems to me that the use of streams and strings adds
considerable overhead in all facets.  They are inherently slower, and in some
ways their convenience becomes a hindrance in fundamental applications,
including the '\r' line ending not being handled.  I would think writing code
with more basic, C-like constructs wouldn't be very difficult.  That would give
the flexibility to check for '\r' newlines as well.
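
As a rough sketch of what such a C-like construct might look like, the reader
below accepts '\n', '\r', and "\r\n" as line endings.  The name
read_line_any_eol is hypothetical (it is not an existing Octave function), and
the sketch assumes the file was opened in binary mode so that no newline
translation happens underneath.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: read one line from fp into 'line', treating
// "\n", "\r", and "\r\n" as line terminators.  The terminator is not
// stored.  Returns false only when nothing at all could be read (EOF).
bool
read_line_any_eol (std::FILE *fp, std::string& line)
{
  line.clear ();
  bool got_any = false;
  int c;
  while ((c = std::getc (fp)) != EOF)
    {
      got_any = true;
      if (c == '\n')
        return true;
      if (c == '\r')
        {
          // Swallow the '\n' of a "\r\n" pair; otherwise push the
          // character back so it starts the next line.
          int next = std::getc (fp);
          if (next != '\n' && next != EOF)
            std::ungetc (next, fp);
          return true;
        }
      line.push_back (static_cast<char> (c));
    }
  // A final line without a terminator still counts as a line.
  return got_any;
}
```

Each returned line could then be handed straight to a scanner such as
std::strtod without going through a stream.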

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>
