
From: count
Subject: [Octave-bug-tracker] [bug #51871] load ascii-based numeric file is significantly slow
Date: Sun, 27 Aug 2017 14:15:44 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

URL:
  <http://savannah.gnu.org/bugs/?51871>

                 Summary: load ascii-based numeric file is significantly slow
                 Project: GNU Octave
            Submitted by: count
            Submitted on: Sun 27 Aug 2017 06:15:42 PM UTC
                Category: Octave Function
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: Performance
                  Status: None
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: dev
        Operating System: GNU/Linux

    _______________________________________________________

Details:

Demo of slowness:
(All tests were run several times to ensure the file is cached.)


a = 1e40 * rand(1e6,1);   % huge numbers can make strtod() a bit slower.
save('-ascii', '-double', 'dat.txt', 'a');

tic
b = load('dat.txt');
toc


 Elapsed time is 3.3219 seconds.

But a raw binary read of the same file, with no parsing, is much faster (about 100x).


tic
fid = fopen('dat.txt');
c = fread(fid, Inf, '*char');
fclose(fid);
toc


 Elapsed time is 0.0309799 seconds.

For a fairer comparison, here the numbers are read and parsed with a C++ fstream.


// g++ -O2 fstream_read_double.cpp && time ./a.out

#include <fstream>
#include <vector>

int main()
{
  double d;
  std::vector<double> v;
  std::ifstream fin("dat.txt");
  while (fin >> d) {
    v.push_back(d);
  }
  return (int)v.size();  // Use v so the compiler cannot optimize it away.
}


 real   0m0.477s

Still an order of magnitude faster than Octave's load().

----

After testing the source code, I found that most of the time is spent in


// libinterp/ls-mat-ascii.cc

static std::string
get_mat_data_input_line (std::istream& is)


The data operations of sstream and fstream have high overhead, especially
those of sstream.

The C++ code above pays this overhead once per *line* of input (one number
per line in this file), but the get_mat_data_input_line() code pays it once
per *character*!
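
To illustrate the difference, here is a minimal sketch. It is NOT the actual
body of get_mat_data_input_line(); both function names are hypothetical. The
first helper makes one stream call per character, the second makes one call
per line, and both return the same text.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not Octave code): builds one line by extracting a
// single character per stream call, so the per-call overhead of the input
// stream and of the ostringstream is paid once per character.
std::string read_line_per_char (std::istream& is)
{
  std::ostringstream buf;
  char c;
  while (is.get (c) && c != '\n')
    buf << c;
  return buf.str ();
}

// Hypothetical helper (not Octave code): lets std::getline copy the whole
// line in one call, so the stream overhead is paid once per line.
std::string read_line_bulk (std::istream& is)
{
  std::string line;
  std::getline (is, line);
  return line;
}
```

Both helpers drop the trailing newline. The real function may do more work
per line than this sketch does, so the sketch only shows where the per-call
cost comes from.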

I'm testing a speed-up of the load-ascii code and hope it will mitigate the
slowness.
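
One possible direction, shown only as a sketch and not as the change being
tested: read each line with a single std::getline call and parse the numbers
directly with std::strtod. The function name load_numbers is hypothetical,
and the sketch ignores comments, separators other than whitespace, and error
reporting, all of which a real loader may need.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical function (not Octave code): reads whitespace-separated
// numbers using one std::getline call per line and one std::strtod call
// per number.
std::vector<double> load_numbers (std::istream& is)
{
  std::vector<double> values;
  std::string line;
  while (std::getline (is, line))
    {
      const char *p = line.c_str ();
      char *end = nullptr;
      // std::strtod sets end == p when nothing could be converted, which
      // ends the scan of this line; any unparsed remainder is skipped.
      for (double d = std::strtod (p, &end); end != p;
           d = std::strtod (p, &end))
        {
          values.push_back (d);
          p = end;
        }
    }
  return values;
}
```

It can be called on the test file with std::ifstream fin ("dat.txt");
followed by load_numbers (fin);.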





    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>
