octave-bug-tracker

[Octave-bug-tracker] [bug #52892] textread incorrectly reads a text file


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #52892] textread incorrectly reads a text file when empty lines are present
Date: Mon, 15 Jan 2018 03:21:46 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Follow-up Comment #1, bug #52892 (project octave):

There has been quite a bit of activity concerning textscan, textread, etc.  As
you mentioned, there is an internal C++ routine that is used by script files. 
I forget exactly which is which, but some bug reports are here:

https://savannah.gnu.org/bugs/index.php?52550
https://savannah.gnu.org/bugs/index.php?52479

However, this textread/textscan code is one voluminous piece of code that has
to account for so many different scenarios that there is a good chance more
adjustments are needed.

I have the latest development code and can test your demos here.  You mention
a snippet, so I'm assuming you manually removed a big chunk of the output,
because I see many more lines.  I too will snip the results to the first ten.
I see:


octave:1> [a,b]=textread('cruise_params_with_empty_lines.cfg','%s%s','Delimiter','=','CommentStyle','#');
octave:2> a
a =
{
  [1,1] = 
  [2,1] = .
  [3,1] = AB1705
  [4,1] = 
  [5,1] = 
  [6,1] = 2017
  [7,1] = 0
  [8,1] = 1
  [9,1] = 0
  [10,1] = psc
***SNIP***
  [123,1] = nan
}

octave:3> b
b =
{
  [1,1] = working_directory
  [2,1] = cruise_id
  [3,1] = cruise_id_prefix
  [4,1] = cruise_id_suffix
  [5,1] = correct_year
  [6,1] = use_mat_for_nav
  [7,1] = make_nav
  [8,1] = use_sadcp
  [9,1] = print_formats
  [10,1] = remove_zctd_downcast
***SNIP***
  [22,1] = position_fixed
***SNIP***
  [122,1] = beam2earth_bad_down_beam
  [123,1] = 
}



octave:4> [a,b]=textread('cruise_params_no_empty_lines.cfg','%s%s','Delimiter','=','CommentStyle','#');
error: str(0): subscripts must be either integers 1 to (2^63)-1 or logicals
error: called from
    strread at line 446 column 5
    textread at line 249 column 31


You mentioned that the only difference between these files is the blank lines.
However, when I do a diff comparison ignoring whitespace, I see the
following differences:


linux@ ~/octave/bug/52892 $ diff cruise_params_no_empty_lines.cfg cruise_params_with_empty_lines.cfg -wu
--- cruise_params_no_empty_lines.cfg    2018-01-15 01:11:56.643370310 -0600
+++ cruise_params_with_empty_lines.cfg  2018-01-15 01:11:36.559370114 -0600
@@ -4,12 +4,13 @@
 # Any line that starts with a "#" will be ignored. Don't add comments #
 # after a variable because this can mess up the parsing of this file  #
 # in some versions of Matlab and Octave.                              #
-# Using the "percent" percent symbol in comments on a line before a line   #
+# Using the "%" percent symbol in comments on a line before a line    #
 # with a variable to read can cause that variable to be ignored.      #
-# Avoid  using "percent"                                                   #
+# Avoid  using "%"                                                    #
 #                                                                     #
 #                                                                     #
 #######################################################################
+
 #######################################################################
 # to process the cast in the current directory, set this variable to  #
 # "." without quotes. All the paths in the script  should be          #
@@ -724,3 +725,4 @@
 beam2earth_bad_down_beam=nan
 #                                                                     #
 #######################################################################
+


Is there an ancillary bug here that you ran across?
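
To see exactly where the empty lines sit in each file, something like the
following can help.  It is just a sketch: blank_line_numbers is a name made up
for illustration (it is not an Octave function), and it treats a line as blank
when nothing but whitespace is left on it.

```octave
1;  % lets this be saved and run as a script file with the function inline

% Hypothetical helper, not part of Octave: return the 1-based line
% numbers of all lines that are empty or contain only whitespace.
function idx = blank_line_numbers (fname)
  idx = [];
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("blank_line_numbers: cannot open %s", fname);
  endif
  n = 0;
  while (ischar (txt = fgetl (fid)))
    n = n + 1;
    if (isempty (strtrim (txt)))
      idx(end+1) = n;
    endif
  endwhile
  fclose (fid);
endfunction

% Usage on a small made-up file; for the real case pass the .cfg name.
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "a=1\n\nb=2\n\n");
fclose (fid);
idx = blank_line_numbers (fname);
unlink (fname);
```

Running it on both .cfg files and comparing the two results should show which
empty lines exist in only one of them.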

The reason the columns are shifted is that one added empty line is being read
as though it were an entry.  Take a look at a{1,1}; it's blank.  So the first
non-blank, non-comment item ends up in b{1,1}.  Perhaps that is wrong
behavior; I don't know.  I've tried adding the option
...,"whitespace","\n") and that seems to have no effect.  The documentation
indicates:


octave:24> help strread
'strread' is a function from the file
/home/sebald/octave/octave/octave/scripts/io/strread.m

 -- [A, ...] = strread (STR)
 -- [A, ...] = strread (STR, FORMAT)
 -- [A, ...] = strread (STR, FORMAT, FORMAT_REPEAT)
 -- [A, ...] = strread (STR, FORMAT, PROP1, VALUE1, ...)
 -- [A, ...] = strread (STR, FORMAT, FORMAT_REPEAT, PROP1, VALUE1,
          ...)
     Read data from a string.
***SNIP***
     "delimiter"
          Any character in VALUE will be used to split STR into words
          (default value = any whitespace).  Note that whitespace is
          implicitly added to the set of delimiter characters unless a
          "%s" format conversion specifier is supplied; see "whitespace"
          parameter below.  The set of delimiter characters cannot be
          empty; if needed Octave substitutes a space as delimiter.
***SNIP***
     "whitespace"
          Any character in VALUE will be interpreted as whitespace and
          trimmed; the string defining whitespace must be enclosed in
          double quotes for proper processing of special characters like
          "\t".  In each data field, multiple consecutive whitespace
          characters are collapsed into one space and leading and
          trailing whitespace is removed.  The default value for
          whitespace is " \b\r\n\t" (note the space).  Whitespace is
          always added to the set of delimiter characters unless at
          least one "%s" format conversion specifier is supplied; in
          that case only whitespace explicitly specified in "delimiter"
          is retained as delimiter and removed from the set of
          whitespace characters.  If whitespace characters are to be
          kept as-is (in e.g., strings), specify an empty value (i.e.,
          "") for "whitespace"; obviously, whitespace cannot be a
          delimiter then.


I think the reason that the blank line is being treated as an item (i.e.,
a{1,1}) is that your example uses at least one %s.  The %s means a string,
and it is conceivable for that string to be empty.  Hence any blank lines are
considered empty strings, I guess.
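
If the blank lines cannot simply be removed from the file, one way to avoid
the question is to skip textread for this kind of file and read it line by
line.  The following is only a sketch under some assumptions: read_cfg_pairs
is a made-up helper name, not an Octave function, and it assumes every setting
sits on one line in name=value form, with comment lines starting with "#".

```octave
1;  % lets this be saved and run as a script file with the function inline

% Hypothetical helper, not part of Octave: collect name=value pairs,
% skipping blank lines and lines whose first non-blank character is "#".
function [names, values] = read_cfg_pairs (fname)
  names = {};
  values = {};
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("read_cfg_pairs: cannot open %s", fname);
  endif
  while (ischar (txt = fgetl (fid)))
    txt = strtrim (txt);
    if (isempty (txt) || txt(1) == "#")
      continue;
    endif
    k = find (txt == "=", 1);        % position of the first "="
    if (isempty (k))
      continue;                      % assumption: ignore lines without "="
    endif
    names{end+1,1} = strtrim (txt(1:k-1));
    values{end+1,1} = strtrim (txt(k+1:end));
  endwhile
  fclose (fid);
endfunction

% Usage on a small made-up file with a comment and two empty lines.
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "# comment\n\nworking_directory=.\n\ncruise_id=AB1705\n");
fclose (fid);
[names, values] = read_cfg_pairs (fname);
unlink (fname);
```

Because empty and comment lines are dropped before anything is stored, names
and values stay aligned row by row whether or not the file has blank lines.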

I'm not sure there is a bug here.  What do you think?

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?52892>
