octave-bug-tracker

From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #33876] textscan: resuming reading does not work
Date: Sun, 31 Jul 2011 21:04:32 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Follow-up Comment #6, bug #33876 (project octave):

Attached is a tiny patch for strread.m that is needed in some corner cases; I
hit one exactly while trying to resume reading in the course of working on other
issues. Perhaps it helps...
(This patch is part of a bigger patch set that I sent privately to Rik a few
days ago. That set covered CollectOutput, additional fixes for
MultiDelimsAsOne, int32 casts, and revised tests to account for the int32
output of the %d and %u conversions.)


Oh, in your first script there's a syntax error in the second-to-last isequal
call (a missing "}").

Without going further into the details of your patches: regarding your ML
findings, I'd like to mention that I often find Matlab's behaviour inconsistent
and unpredictable. That applies especially to textread and strread (which they
apparently are going to drop in the future), but textscan is not very clear
either, or at least the ML docs aren't.
Sometimes there are two or more seemingly equivalent format strings of which
only one works (in "difficult" files, admittedly), and sometimes which one works
depends on extra parameter values.

As to your patches: they refer to parts of textscan.m that I didn't touch; Rik
made those changes. From the header of the message you sent through the bug
tracker I see he's not on the mail notification list. I think he should be, so
I added him.

Apart from the multi-line format string issue I mentioned earlier, it occurred
to me that there's another gotcha in the way the format repeat count currently
works in textscan/textread:
the possible presence of comment lines in the data is completely ignored.
So you ask for the format string to be applied N times, expecting to read a
specific amount of data, and in extreme cases you get back empty output because
the first N lines were actually comment lines.
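To make the gotcha concrete, here is a minimal Python sketch (not Octave code). It contrasts applying the repeat count N to raw lines with counting only data lines. The function names and the "%" comment marker are assumptions for illustration only. Real textscan/textread comment handling is configurable and more involved.

```python
def read_n_raw(lines, n, comment="%"):
    """Naive: apply the repeat count to raw lines, then drop comment lines."""
    chunk = lines[:n]
    return [ln for ln in chunk if not ln.lstrip().startswith(comment)]


def read_n_data(lines, n, comment="%"):
    """Count only non-comment lines toward the repeat count n."""
    out = []
    for ln in lines:
        if len(out) == n:
            break
        if ln.lstrip().startswith(comment):
            continue  # comment lines do not use up a format repetition
        out.append(ln)
    return out


data = ["% header", "% units: m s", "1 2", "3 4", "5 6"]
print(read_n_raw(data, 2))   # []
print(read_n_data(data, 2))  # ['1 2', '3 4']
```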

Therefore I'm not sure much more effort should be invested in the current
approach, except perhaps as a temporary fix.

My ideas on this issue:

(A) The best overall solution would be a compiled strread. That could plough
through the file linearly, rather than in the forced column-wise way it works
now. But yes, this is one of those nasty "if only..."s.

(B) Starting from the current state of things, I think a better way would be
to:
(1) Re-implement format repeat count processing in strread;
(2) When a format repeat count is given, have textread and textscan just read a
liberally big chunk of the file (or string);
(3) Let textread and textscan tell strread that a format repeat count was
requested (through some undocumented parameter);
(4) Let strread sort out how much of the chunk was actually needed;
(5) Let strread report back to textscan or textread how much of the data text
string it was passed was actually read (an undocumented output arg);
(6) Have textscan or textread do an fseek using the info they got back from
strread.

- This would be a more stable fix that covers all known cases;
- Format repeat count would work again for strread (ML compatibility);
- Textscan could resume reading from either files or text strings (currently
it only "works" for files), which also improves ML compatibility.
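A rough Python sketch of steps (2) to (6) follows. It assumes whitespace-delimited numeric fields and ASCII data. `parse_chunk` stands in for strread and `textscan_like` for textscan/textread. Both names are hypothetical, and this illustrates the idea rather than the Octave implementation. The parser reports how many bytes it consumed, and the caller seeks to that point so the next call resumes there.

```python
import io
import re

TOKEN = re.compile(r"\S+")  # assumption: fields are whitespace-delimited numbers


def parse_chunk(text, fields_per_record, repeat, at_eof):
    """Hypothetical strread stand-in: apply a format of `fields_per_record`
    fields `repeat` times and report how many characters were consumed."""
    needed = fields_per_record * repeat
    values, ends = [], []
    for m in TOKEN.finditer(text):
        if len(values) == needed:
            break
        if m.end() == len(text) and not at_eof:
            break  # field may be cut off by the chunk boundary; don't trust it
        values.append(float(m.group()))
        ends.append(m.end())
    usable = len(values) - len(values) % fields_per_record
    records = [values[i:i + fields_per_record]
               for i in range(0, usable, fields_per_record)]
    consumed = ends[usable - 1] if usable else 0  # just past last full record
    return records, consumed


def textscan_like(stream, fields_per_record, repeat, chunk_size=4096):
    """Hypothetical textscan stand-in: read a generous chunk, let the parser
    decide how much it needed, then seek to just past that point."""
    start = stream.tell()
    while True:
        stream.seek(start)
        raw = stream.read(chunk_size)
        at_eof = stream.read(1) == b""  # nothing left after this chunk
        records, consumed = parse_chunk(raw.decode("ascii"),
                                        fields_per_record, repeat, at_eof)
        if len(records) == repeat or at_eof:
            break
        chunk_size *= 2  # chunk was too small: retry with a bigger one
    stream.seek(start + consumed)  # step (6): the fseek
    return records


buf = io.BytesIO(b"1 2\n3 4\n5 6\n")
print(textscan_like(buf, 2, 2))  # [[1.0, 2.0], [3.0, 4.0]]
print(textscan_like(buf, 2, 2))  # [[5.0, 6.0]]
```

The loop grows the chunk when too few records fit. This keeps a "liberally big chunk" from silently truncating a read. A field that touches the end of a chunk is not trusted unless EOF has been reached, because it may have been cut off.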

But step (4) especially needs careful analysis: the input string is thoroughly
mangled, stripped, split up, recombined, and stripped and split again before
strread knows how many "format string occurrences" were actually present in the
data. Keeping track of the exact limit through all these steps can be
complicated ("can": I haven't looked at it in detail).
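One way to tame step (4) is to have every transformation work on fields rather than on rebuilt strings. Each field then keeps its start and end offsets in the original text, so the consumed position is still known after splitting, after dropping empty fields (as with the MultipleDelimsAsOne option), and after stripping. The Python sketch below assumes that approach. `split_with_offsets` is a hypothetical helper, not part of strread.

```python
import re


def split_with_offsets(text, delim=",", multiple_delims_as_one=False):
    """Hypothetical helper: split `text` on `delim`, returning (field, start, end)
    triples whose offsets refer to the ORIGINAL text."""
    fields, pos = [], 0
    for m in re.finditer(re.escape(delim), text):
        fields.append((text[pos:m.start()], pos, m.start()))
        pos = m.end()
    fields.append((text[pos:], pos, len(text)))
    if multiple_delims_as_one:
        # empty fields between consecutive delimiters vanish; offsets survive
        fields = [f for f in fields if f[0] != ""]
    result = []
    for s, a, b in fields:
        core = s.strip()
        if not core:
            result.append(("", a, a))  # whitespace-only field
            continue
        lead = len(s) - len(s.lstrip())
        trail = len(s) - len(s.rstrip())
        result.append((core, a + lead, b - trail))  # stripping shifts offsets
    return result


text = "1,,2, 3 ,4"
fields = split_with_offsets(text, ",", multiple_delims_as_one=True)
print([f[0] for f in fields])  # ['1', '2', '3', '4']
# after using three fields, resume right after the third one
print(repr(text[fields[2][2]:]))  # ' ,4'
```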

For this, textscan and textread may need to know a bit more precisely how much
to read.
To do it cleanly, I think some parts of strread should be split off into
separate utility functions in a ./private directory. These could be called by
textscan and textread, so some code that now appears identically in all three
could be dropped.
This refers to the format string parsing section and the part where the format
is matched against the first "line" of the file (or text string).
Comment line processing may perhaps also come into the picture.
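As an illustration of the kind of shared helper that could live in ./private, here is a Python sketch of a format-string parser that textscan, textread and strread could all call. It handles only a simplified subset: %d %u %f %n %s %q %c, an optional * to skip a field, width and precision, and literal text. `parse_format` and `Conversion` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Conversion(NamedTuple):
    skip: bool                 # True for "%*..." (field is read but discarded)
    width: Optional[int]
    precision: Optional[int]
    kind: str                  # conversion letter, e.g. "d", "f", "s"
    literal: str               # literal text that preceded this conversion


# simplified subset of conversions; real format strings allow more
SPEC = re.compile(r"%(\*?)(\d+)?(?:\.(\d+))?([dufnsqc])")


def parse_format(fmt):
    """Split a format string into conversions plus any trailing literal."""
    convs, pos = [], 0
    for m in SPEC.finditer(fmt):
        convs.append(Conversion(
            skip=m.group(1) == "*",
            width=int(m.group(2)) if m.group(2) else None,
            precision=int(m.group(3)) if m.group(3) else None,
            kind=m.group(4),
            literal=fmt[pos:m.start()].strip(),
        ))
        pos = m.end()
    return convs, fmt[pos:].strip()


convs, tail = parse_format("x=%d %*s %5.2f;")
print([c.kind for c in convs])  # ['d', 's', 'f']
```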

I don't mind doing this, but I have no idea when I will have time for it. Now
that textscan/textread/strread work acceptably well, I'd rather help build the
3.4.x MinGW binary.


As for John's plans, thank you for pointing me to that.
I'll post about it on the maintainers list.


(file #23719)
    _______________________________________________________

Additional Item Attachment:

File name: strread.m.diff_30July2011      Size:0 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?33876>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



