[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow


From:	Dan Sebald
Subject:	[Octave-bug-tracker] [bug #51871] loading '-ascii' format files is slow
Date:	Tue, 29 Aug 2017 22:22:22 -0400 (EDT)
User-agent:	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0

Follow-up Comment #12, bug #51871 (project octave):

OK, I'm looking just at the patch.  This hunk


+      std::getline (is, retval);
+      
+      // Remove trailing '\r'.
+      while (retval.size () && retval.back () == '\r')
+        retval.pop_back ();
+      
+      // Remove any comment.
+      size_t pos_comment = retval.find_first_of ("#%");
+      if (pos_comment != std::string::npos)
+        retval.erase (pos_comment);
+      
+      // Detect non-whitespace.
+      no_data_found = (retval.find_first_not_of (" \t") == std::string::npos);


seems rather wasteful in that it scans a whole, possibly long, line of ASCII
characters multiple times to find what is most likely nothing.  That roughly
triples the scanning effort.  Does the comment character
need to be the first character in a line for the line to be officially a
comment line?  Why search through the whole line if numeric characters appear
early in the line?

Generally for that main loop that calls this routine, couldn't this strategy
be changed so that the priority is to scan the line for floats and if the scan
fails then figure out what went wrong?  It would require a bit more processing
mixed in with the loop, but it would be three or four times as efficient and
more along the lines of the simpler functions that Count used.

For example, if the first float sscanf fails, then check whether the first
character is '%' or '#'; if so, call the line a comment and skip it.  With that
approach, does the trailing \r matter anymore?
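
To make that concrete, here is a rough sketch of the parse-first idea.  It is
not the patch's code: it assumes whitespace-separated values, uses std::strtod
in place of sscanf, and the names scan_line and LineKind are made up for
illustration rather than taken from Octave's sources.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical classification of one input line (illustrative names only).
enum class LineKind { data, comment_or_blank, malformed };

// Parse-first strategy: convert numbers from the front of the line and
// inspect the offending character only when a conversion fails.
// Assumption: values are whitespace-separated and '#' or '%' starts a comment.
LineKind scan_line (const std::string& line, std::vector<double>& values)
{
  values.clear ();
  const char *p = line.c_str ();

  while (true)
    {
      char *end = nullptr;
      const double d = std::strtod (p, &end);
      if (end == p)
        break;                  // no number at this position: diagnose below
      values.push_back (d);
      p = end;
    }

  // On failure strtod leaves 'end' at 'p', so skip whitespace by hand.
  // isspace also matches '\r', so a trailing carriage return needs no
  // separate pass.
  while (std::isspace (static_cast<unsigned char> (*p)))
    p++;

  // End of line or a comment character: whatever was parsed so far stands.
  if (*p == '\0' || *p == '#' || *p == '%')
    return values.empty () ? LineKind::comment_or_blank : LineKind::data;

  return LineKind::malformed;
}
```

In this sketch the numeric part of the line is traversed once, the comment
check happens only at the position where conversion stopped, and anything
after a comment character is never examined.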

I'd say, try figuring out what is wrong with dlmread.cc then commit the most
recent changeset.  After that, try revamping this routine so that the testing
is more integrated in the looping and rather than two functions

get_lines_and_columns ()
get_mat_data_input_line ()

combine their functionality into one and use no function calls.  Instead, if
one doesn't want to use efficient array expansion, use something like
(pseudo-code):


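for (int i_scan = 0; i_scan < 2; i_scan++) {
    LOOP_THROUGH_ALL_LINES {
      // Pass 0 only counts rows and columns; pass 1 stores each value d.
      if (i_scan == 1)
          tmp.elem (i, j) = d;
    }
    if (i_scan == 1)
        break;
    // After the counting pass the dimensions are known.
    ASSIGN_MATRIX_MEMORY;
}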


Does that sound like a more efficient approach?

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51871>
