octave-bug-tracker
[Octave-bug-tracker] [bug #33680] System stalled by using 'strread'


From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #33680] System stalled by using 'strread'
Date: Fri, 01 Jul 2011 18:25:23 +0000

Follow-up Comment #3, bug #33680 (project octave):

Thank you for quickly reporting back.

regexp is a beast, yes, but I think it was invoked there for a reason (namely,
to be able to split on %*4s or %*6f -like format specifiers as well).
With my strsplit() trick, a proper future implementation of those "skip forward"
format specifiers becomes more awkward.
But if Octave already chokes on 14 MB files, what will happen if I feed it our
1.2 GB text files?
For now, with strsplit() instead of regexp(), no existing functionality is
lost and textread/textscan/strread work faster while needing fewer resources.
Should be good enough.
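
To illustrate the trade-off, here is a minimal sketch in Python (not Octave, and
not the actual strread code) that splits the same string once with a plain
literal split and once through the regex engine. The function names split_plain
and split_regex are hypothetical. For a fixed literal delimiter both give the
same fields, which is why swapping one for the other loses nothing in that
case; the regex route only pays off when the split pattern is more than a
literal string.

```python
import re


def split_plain(text, delimiter):
    """Split on a literal delimiter string (similar in spirit to strsplit)."""
    return text.split(delimiter)


def split_regex(text, delimiter):
    """Split on the same literal delimiter, but through the regex engine."""
    # re.escape makes sure the delimiter is matched literally.
    return re.split(re.escape(delimiter), text)


sample = "1.5,2.5,3.5"
fields_plain = split_plain(sample, ",")
fields_regex = split_regex(sample, ",")
print(fields_plain == fields_regex)
```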


BTW: The problem with newline-as-delimiter is this:

Matlab docs say:
default whitespace for textscan.m: "\b\t" (no space!)
    "       "          textread.m: " \b\t" (note space)
    "       "          strread.m:  "\b\r\n\t" (no space!)
Then, ML docs also say:
default delimiter for textscan.m: whitespace
    "       "         textread.m: none
    "       "         strread.m:  one or more whitespace chars
Combining this, for default delimiters we have (still according to the ML
docs)
textscan.m: "\b\t" (note: no space! and no newline)
textread.m: no default value (i.e., no newline or space)
strread.m:  "\b\r\n\t" (no space! but including a newline)
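
The combination above can be written out as a small lookup, sketched here in
Python. The values are the ones quoted in this message (not checked against the
Matlab docs independently), and it assumes the quoted strings stand for the
escape sequences \b, \t, \r and \n. The names DEFAULT_WHITESPACE,
DELIMITER_IS_WHITESPACE and effective_default_delimiters are hypothetical.

```python
# Default whitespace per function, as quoted in the message.
DEFAULT_WHITESPACE = {
    "textscan": "\b\t",
    "textread": " \b\t",
    "strread": "\b\r\n\t",
}

# Whether the documented default delimiter is "whitespace" (True) or none (False).
DELIMITER_IS_WHITESPACE = {
    "textscan": True,
    "textread": False,
    "strread": True,
}


def effective_default_delimiters(func):
    """Return the characters that act as default delimiters for func."""
    if DELIMITER_IS_WHITESPACE[func]:
        return DEFAULT_WHITESPACE[func]
    return ""


for name in ("textscan", "textread", "strread"):
    chars = effective_default_delimiters(name)
    print(name, repr(chars), "newline:", "\n" in chars, "space:", " " in chars)
```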

For (text) files one can expect records to be separated by newlines, so having
a newline as an implied default delimiter is somewhat logical. Indeed, AFAICS
textscan.m and textread.m do seem to always use it as a delimiter (Matlab does
this too).
Test case 1 in textscan.m shows that it works this way (and in Matlab this test
case gives the same results).
So, as newline always seems to be an implied delimiter when reading from file,
why isn't it listed as a default delimiter?

For strings this is less obvious (I can imagine one wants to read substrings
containing newlines, each substring delimited by tabs, from some long text
string).

<UPDATE>
In the meantime, I got an answer:
It turns out that end-of-line (EOL), which is system-dependent, is treated as
a separate delimiter (indeed!), to be controlled by the "endofline" parameter.

By coincidence I had already implemented parts of this parameter's handling,
but for the wrong reason. Now that I know what its purpose really is, this
last "missing feature" can finally be fixed as well.
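
A rough sketch of the idea of EOL as a separate delimiter, in Python rather
than Octave: the text is first cut into records on the end-of-line sequence,
and only then is each record cut into fields on the ordinary delimiter. The
function read_records and its parameter names are hypothetical and do not
reflect the actual textread/textscan/strread implementation.

```python
def read_records(text, delimiter="\t", endofline="\n"):
    """Split text into records on endofline, then into fields on delimiter."""
    # Drop empty records, e.g. the one after a trailing end-of-line.
    records = [rec for rec in text.split(endofline) if rec != ""]
    return [rec.split(delimiter) for rec in records]


unix_text = "a\tb\nc\td\n"
dos_text = "a\tb\r\nc\td\r\n"

unix_rows = read_records(unix_text)                    # default "\n" EOL
dos_rows = read_records(dos_text, endofline="\r\n")    # explicit "\r\n" EOL
print(unix_rows == dos_rows)
```

Because the EOL sequence is passed separately, the same field delimiter works
for files from systems with different line endings.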


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?33680>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



