[Octave-bug-tracker] [bug #50619] textscan weird behaviour when reading


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #50619] textscan weird behaviour when reading a csv
Date: Sat, 25 Mar 2017 14:46:26 -0400 (EDT)

Follow-up Comment #11, bug #50619 (project octave):

Yes, that is a duplicate.

OK, I'm a little further along in the thought process, and I now see why this
strange behavior occurs, and why a header ending in "he [deg]" works but one
ending in "heading [deg]" does not.

The delimiter buffer size is 80.  Count 80 characters into the buffer
below:


time [s];lat [deg];lon [deg];x [m];y [m];speed [m/s];heading [deg]
5.2500000000000;44.000000000000000;10.000000000000000;0.000000000000000;0.000000000000000;0.000000000000000;44.998483574087999


and that points 4 characters BEFORE the first semicolon of the second line.
Snip six characters from the first line and then that first semicolon of the
second line is AFTER 80 characters.  So the error lies in how delimited_stream
handles the end of its buffer when it puts characters back.  That is also why,
when I use ";\n" as the delimiter characters, the fields come out right but the
"5.25" is dropped: the delimited_stream buffer has already grabbed a new chunk
of data from the std::istream, so whatever delimited_stream is attempting to
put back is lost. (? That's the theory anyway.)
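
To make the counting above easy to repeat, here is a minimal sketch that
locates the first semicolon of the data line relative to the start of the
file.  The helper names are hypothetical and not part of Octave, and the
end-of-line sequence is a parameter because the report does not say whether
the file uses LF or CRLF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Octave): 1-based position of the first
// ';' of the data line, counted from the start of the file, when the header
// and the data line are separated by the end-of-line sequence `eol`.
std::size_t first_data_delim_pos (const std::string& header,
                                  const std::string& data,
                                  const std::string& eol)
{
  // The data line must contain at least one delimiter.
  assert (data.find (';') != std::string::npos);

  const std::string text = header + eol + data;
  // Skip the header and its EOL, then convert the 0-based index to 1-based.
  return text.find (';', header.size () + eol.size ()) + 1;
}

// True if that delimiter still fits inside the first `buf_size` characters.
bool delim_inside_first_chunk (const std::string& header,
                               const std::string& data,
                               const std::string& eol,
                               std::size_t buf_size)
{
  return first_data_delim_pos (header, data, eol) <= buf_size;
}
```

Calling first_data_delim_pos with the header and data line quoted above, once
with "\n" and once with "\r\n", and comparing the result against 80 shows on
which side of the buffer boundary the semicolon falls for each header variant.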

By that contorted thinking, lengthening the delimited_stream buffer from 80 to
100 should fix this particular problem when I use ";\n" delimiters...


    // Next, choose a buffer size to avoid reading too much, or too often.
    octave_idx_type buf_size = 4096;
    if (buffer_size)
      buf_size = buffer_size;
    else if (ntimes > 0)
      {
        // Avoid overflow of 80*ntimes...
//        buf_size = std::min (buf_size, std::max (ntimes, 80 * ntimes));
        buf_size = std::min (buf_size, std::max (ntimes, 100 * ntimes));
        buf_size = std::max (buf_size, ntimes);
      }
    // Finally, create the stream.
    delimited_stream is (isp,
                         (delim_table.empty () ? whitespace + "\r\n" : delims),
                         max_lookahead, buf_size);


And that does, in fact, work:


octave:8> logLine
logLine = 
{
  [1,1] =  5.2500
  [1,2] =  44
  [1,3] =  10
  [1,4] = 0
  [1,5] = 0
  [1,6] = 0
  [1,7] =  44.998
}


but of course this isn't a general fix, because the first line could be any
length.
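
As a rough illustration of why the change only moves the boundary, the
buffer-size selection quoted above can be isolated into a small function.
This is a sketch, not Octave's code: choose_buf_size, idx_t, and the per_line
parameter are hypothetical names, and it assumes the call asks for a single
repetition (ntimes = 1), which would be consistent with the 80-character
buffer mentioned earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for octave_idx_type.
using idx_t = std::int64_t;

// Mirrors the buffer-size selection quoted above, with the per-line factor
// (80 in the original code, 100 in the experiment) made a parameter.
idx_t choose_buf_size (idx_t buffer_size, idx_t ntimes, idx_t per_line)
{
  assert (per_line > 0);

  idx_t buf_size = 4096;
  if (buffer_size)
    buf_size = buffer_size;            // an explicit size always wins
  else if (ntimes > 0)
    {
      buf_size = std::min (buf_size, std::max (ntimes, per_line * ntimes));
      buf_size = std::max (buf_size, ntimes);
    }
  return buf_size;
}
```

With ntimes = 1 this yields a buffer of exactly per_line characters, so
choose_buf_size (0, 1, 80) gives 80 and choose_buf_size (0, 1, 100) gives 100.
Any header long enough to push the first delimiter of the data line past that
many characters would presumably hit the same problem again.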

OK, so there are two things wrong in the delimited_stream code:

1) The EOL character is not automatically included as a delimiter.  I guess it
should be in all cases, correct?  That is, there isn't some form of syntax for
textscan() for which the user can specify EOL is not a delimiter?

2) The buffer doesn't behave correctly at its end, most likely because
valuable characters are dropped when the delimited_stream buffer does a
refresh_buf():


    void field_done (void)
    {
      if (idx >= last)
        refresh_buf ();
    }
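
The suspected failure mode in 2) can be reproduced with a toy reader.  To be
clear, this is NOT Octave's delimited_stream; toy_chunk_reader and its members
are hypothetical, and the sketch only shows how characters can be lost when a
put-back crosses a refill, under the assumption that the old chunk is
discarded on refill.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Toy chunked reader (hypothetical; not Octave's delimited_stream).
class toy_chunk_reader
{
public:
  toy_chunk_reader (const std::string& text, std::size_t chunk)
    : m_src (text), m_chunk (chunk)
  {
    assert (chunk > 0);
    refill ();
  }

  // Return the next character, or -1 at end of input.
  int get ()
  {
    if (m_idx >= m_buf.size ())
      refill ();                 // the previous chunk is discarded here
    if (m_buf.empty ())
      return -1;
    return static_cast<unsigned char> (m_buf[m_idx++]);
  }

  // Try to step back n characters.  The reader cannot step back past the
  // start of the current chunk, so it returns how many were really restored.
  std::size_t unget (std::size_t n)
  {
    const std::size_t k = (n < m_idx ? n : m_idx);
    m_idx -= k;
    return k;
  }

private:
  // Replace the buffer contents with the next chunk from the source.
  void refill ()
  {
    m_buf.assign (m_chunk, '\0');
    m_src.read (&m_buf[0], static_cast<std::streamsize> (m_chunk));
    m_buf.resize (static_cast<std::size_t> (m_src.gcount ()));
    m_idx = 0;
  }

  std::istringstream m_src;
  std::size_t m_chunk;
  std::string m_buf;
  std::size_t m_idx = 0;
};
```

Reading five characters of "abcdefgh" with a chunk size of 4 forces a refill
on the fifth read.  A following unget (3) can then restore only one character,
so 'c' and 'd' are gone; with a chunk size of 8 the same unget succeeds in
full.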



    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50619>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/