octave-maintainers

Re: newline in strread format


From: Philip Nienhuis
Subject: Re: newline in strread format
Date: Tue, 24 Jun 2014 20:49:03 +0200
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26.1

John W. Eaton wrote:
> I think the following should work, but it's throwing an error:
>
>    octave:1> [a, b, c] = strread ("1 2 3\n4 5 6\n7 8 9\n", "%f %f %f\n")
>    strread: FORMAT does not match data
>    error: called from 'strread' in file /home/jwe/src/octave-stable/scripts/io/strread.m near line 745, column 13
>    error: called from:
>    error:   /home/jwe/src/octave-stable/scripts/io/strread.m at line 755, column 9
>
> I took a look at strread.m but I'm not sure what the right approach is
> for a fix.  It appears to me that newlines are removed from the string
> and that it is split on the field delimiter.  But the newline
> character remains in the list of format specifiers.
>
> I noticed the problem in the stable sources, but it seems to also be
> present in the current sources on the default branch.
>
> Any clues would be much appreciated.

As I wrote most of the strread.m code that processes this, I suppose I'm the one to look into it.

But just to be sure before I give it a go: is this undocumented, or at least obscure, ML behavior we have at hand here?

As far as I understand the Matlab docs, anything in the format string that is not a format conversion specifier is to be treated as a literal. That is why strread.m retains the newline in the format specifier list.
Experimenting a bit with ML R2014a shows that Matlab does this too.


Now, strread.m removes the "regular" delimiters (those specified by the user, or the default ones) long before it processes literals, even though literals can be interpreted as just another delimiter (but only at the positions/columns in the data specified by the format string).
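A minimal Octave illustration of that ordering problem, using John's example. It does not reproduce strread.m's internals; it only assumes, as John observed, that newlines are treated as delimiters and stripped when the data are split, while the "\n" in the format survives as a literal:

```octave
## Illustration only, not the actual strread.m code.
str = "1 2 3\n4 5 6\n7 8 9\n";
fmt = "%f %f %f\n";

## Split the data on whitespace (space, tab, newline); adjacent
## delimiters are collapsed by default, so no empty fields appear.
data_words = strsplit (strtrim (str), {" ", "\t", "\n"});

## Split the format into conversion specifiers and the newline literal.
fmt_items = regexp (fmt, '%[a-z]|\n', "match");

## Each format cycle expects a "\n" token that the split data can no
## longer supply, which is why the format cannot be matched.
has_nl = any (strcmp (data_words, "\n"));
printf ("%d data fields, %d format items, newline token present: %d\n", ...
        numel (data_words), numel (fmt_items), has_nl);
```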


There are a few possible strategies (assuming the issue you brought up only pertains to valid delimiters specified as literals):

1. strread.m could scan the format string and remove any delimiters it finds there from the list of delimiters.

2. Or it could simply remove such a literal from the list of literals and add it to the list of delimiters (probably the easiest).

3. strread.m could first use strrep to replace literals in the data string with a valid delimiter, a costly operation on big data strings (or files). Moreover, this wouldn't honor the rule that literals should only be processed as such in the data "columns" specified in the format string (in other positions they should be processed as string values).
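Strategy 2 might look roughly like the following Octave sketch. It works on an already-parsed format (a cell array of conversion specifiers and literals) rather than on strread.m's real internal variables; the names `fmt_items` and `delims`, the assumed starting delimiter set, and the choice to move only whitespace-only literals are all assumptions made for illustration:

```octave
## Hypothetical parsed form of "%f %f %f\n"; the spaces between the
## specifiers are assumed to be handled as delimiters already.
fmt_items = {"%f", "%f", "%f", "\n"};
delims = " \t";     # assumed current delimiter set

## Literals are the items that are not conversion specifiers.
is_conv = strncmp (fmt_items, "%", 1);
## Only whitespace-only literals are treated as candidate delimiters.
is_ws_lit = ! is_conv & cellfun (@(s) all (isspace (s)), fmt_items);

## Move their characters into the delimiter set (unique also sorts) ...
delims = unique ([delims, fmt_items{is_ws_lit}]);
## ... and drop them from the format, leaving only the three "%f".
fmt_items(is_ws_lit) = [];
```

Note the trade-off: once "\n" is an ordinary delimiter it separates fields everywhere, not only at the end of each record, and the sketch does nothing about overlapping whitespace and delimiter sets.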

Options 1 and 2 are easy mods, but maybe too quick & dirty.
However, due care is required for, e.g., overlapping sets of whitespace and delimiters. I think experimenting with ML to find out the proper behavior is required, plus a set of tests for Octave. I don't have time for that currently.

Either way, I see quite a few pitfalls and corner cases looming.
Given that strread.m is already a bit of a dinosaur and we're actually waiting for a binary textscan, I wonder whether it is worthwhile to spend much time fixing this issue. Any opinions on that?

Philip



