Re: Improving strread / textread / textscan
From: Ben Abbott
Subject: Re: Improving strread / textread / textscan
Date: Sun, 23 Oct 2011 16:53:55 -0400

On Oct 23, 2011, at 3:59 PM, PhilipNienhuis wrote:
> Motivated by this thread
> https://mailman.cae.wisc.edu/pipermail/help-octave/2011-October/048038.html
> I had another look at strread.m. I have 2 questions about it:
>
> Q1:
> After searching around on the web, I now have inferred the following
> behavior of strread (& textscan) in ML:
> a. "Words" or fields (to be interpreted later) are separated by white-space.
> b. The white-space char set can be adapted by the user with the "whitespace"
> keyword. It can even be set to empty.
> c. White-space is understood to possibly be a vector of white-space chars
> that during reading is folded into one char that separates two fields.
> d. Delimiters are characters that are added to the white-space set (they
> don't replace it), but unlike white-space, vectors of delimiters, or of
> several delimiters and white-space, are not folded into one char that
> separates fields.
> e. Yet, vectors of white-space and one delimiter are folded into one
> white-space that separates fields.
> f. However, if so desired, multiple consecutive delimiters can be folded
> into one delimiter if "MultipleDelimsAsOne" parameter is set to 1.
> g. EOL char sequences (\n, \r\n, or \r) are also delimiters, but are not
> affected by the MultipleDelimsAsOne parameter.
> (...what a mess...)
>
> Is there agreement with my interpretation of ML's behaviour?
>
> Q2:
> There's ample room for improvement in various parts I wrote. But I need to
> know:
> which one is faster, strrep or regexprep ?
> Both of these are needed in several places, but AFAICS regexprep is more
> versatile.
> Roughly speaking, as strread.m stands now, for each of the points above a
> separate regexprep or strrep run (or series of runs) is needed on the entire
> "file". So it is important to know what functions are the fastest.
>
> Thanks,
>
> Philip

We discussed making some significant changes to these functions back in 2010:
http://octave.1599824.n4.nabble.com/advice-help-needed-for-reading-formatted-text-textscan-strread-amp-textread-tt3009750.html#none

There was another discussion earlier this year:
http://octave.1599824.n4.nabble.com/Release-goals-for-3-6-tt3711420.html#none

I'm not sure how much has been done since then, but reviewing the threads, I
see John had asked for some tests to be written. Some of that has been done, but
my impression is that many features of the ML version remain untested.

Would you be interested in cooperating on writing more tests that cover the
questions you ask above (as well as others)?
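
As a possible starting point for such tests, here is a rough sketch of rules
a-f exactly as interpreted in the quoted message. It is written in Python only
because that is quick to run anywhere. It models that interpretation, not
verified ML behaviour, and the name split_fields and its parameters are made up
for illustration. EOL handling (rule g) and leading/trailing white-space are
deliberately left out.

```python
import re


def _char_class(chars):
    """Regex character class matching any char in `chars`, or None if empty."""
    return "[" + re.escape(chars) + "]" if chars else None


def split_fields(text, whitespace=" \t", delimiters="",
                 multiple_delims_as_one=False):
    """Split `text` into fields following rules a-f of the quoted message.

    Hypothetical helper: a model of the inferred rules, not Octave's strread.
    """
    ws = _char_class(whitespace)
    dl = _char_class(delimiters)
    alternatives = []
    if dl:
        pad = ws + "*" if ws else ""
        if multiple_delims_as_one:
            # Rule f: a run holding at least one delimiter, plus any further
            # delimiters and white-space, counts as a single separator.
            both = _char_class(whitespace + delimiters)
            alternatives.append(pad + dl + both + "*")
        else:
            # Rules d and e: white-space around ONE delimiter is folded into
            # it, but a second delimiter starts a new (possibly empty) field.
            alternatives.append(pad + dl + pad)
    if ws:
        # Rules a and c: a run of white-space alone is one separator.
        alternatives.append(ws + "+")
    if not alternatives:
        # Rule b taken to the extreme: no white-space and no delimiters.
        return [text]
    return re.split("|".join(alternatives), text)
```

Calling split_fields("a,,b", delimiters=",") with and without
multiple_delims_as_one=True shows the difference between rules d and f. Each
case we can confirm against ML could then be turned into a test for strread.m.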
Ben