Re: Improving strread / textread / textscan

octave-maintainers

From:	Ben Abbott
Subject:	Re: Improving strread / textread / textscan
Date:	Sun, 23 Oct 2011 18:42:21 -0400

On Oct 23, 2011, at 5:20 PM, Philip Nienhuis wrote:

> Ben Abbott wrote:
>> 
>> On Oct 23, 2011, at 3:59 PM, PhilipNienhuis wrote:
>> 
>>> Motivated by this thread
>>> https://mailman.cae.wisc.edu/pipermail/help-octave/2011-October/048038.html
>>> I had another look at strread.m. I have 2 questions about it:
>>> 
>>> Q1:
>>> After searching around on the web, I now have inferred the following
>>> behavior of strread (& textscan) in ML:
>>> a. "Words" or fields (to be interpreted later) are separated by white-space.
>>> b. The white-space char set can be adapted by the user with the "whitespace"
>>> keyword. It can even be set to empty.
>>> c. White-space is understood to possibly be a vector of white-space chars
>>> that during reading is folded into one char that separates two fields.
>>> d. Delimiters are characters that are augmented to white-space (they don't
>>> replace the white-space char set), but other than white-space, vectors of
>>> delimiters, or of several delimiters and white-space, are not folded into
>>> one char that separates fields.
>>> e. Yet, vectors of white-space and one delimiter are folded into one
>>> white-space that separates fields.
>>> f. However, if so desired, multiple consecutive delimiters can be folded
>>> into one delimiter if "MultipleDelimsAsOne" parameter is set to 1.
>>> g. EOL char sequences (\n, \r\n, or \r) are also delimiters, but are not
>>> affected by the MultipleDelimsAsOne parameter.
>>> (...what a mess...)
>>> 
>>> Is there agreement with my interpretation of ML's behaviour?
>>> 
>>> Q2:
>>> There's ample room for improvement in various parts I wrote. But I need to
>>> know:
>>> which one is faster,  strrep  or  regexprep ?
>>> Both of these are needed in several places, but AFAICS regexprep is more
>>> versatile.
>>> Roughly speaking, as strread.m stands now, for each of the points above a
>>> separate regexprep or strrep run (or series of runs) is needed on the entire
>>> "file". So it is important to know what functions are the fastest.
>>> 
>>> Thanks,
>>> 
>>> Philip
>> 
>> We had discussed making some significant changes to these back in 2010.
>> 
>>      
>> http://octave.1599824.n4.nabble.com/advice-help-needed-for-reading-formatted-text-textscan-strread-amp-textread-tt3009750.html#none
>> 
>> There was another discussion earlier this year.
>> 
>>      
>> http://octave.1599824.n4.nabble.com/Release-goals-for-3-6-tt3711420.html#none
> 
> I know both of these threads, and I participated quite a bit in the second 
> one.
> 
>> I'm not sure how much has been done at this point, but reviewing the 
>> threads, I see John had asked some tests be written. Some of that has been 
>> done, but my impression is that there are a lot of remaining features of the 
>> ML version that remain untested.
> 
> Well, I already more than doubled the number of tests for strread, textread 
> and textscan in the course of fixing them.
> Of course, given ML's undocumented behavior, the number of tests might really 
> need to be quadrupled ... :-)
> 
> But seriously, I think currently there are adequate tests for most if not all 
> functionality currently built into Octave's text reading functions.
> It is the odd corner cases that lack tests, but these usually only come up in 
> the help-octave list & bug tracker.
> 
> There is some ML functionality not yet ported to Octave (double quotes, %f32, 
> %i64, etc.) but that will probably only come if jwe ever finishes his 
> textscan.oct (he started earlier this year with that).
> Tests for those are not too urgent right now.
> 
>> Would you be interested in cooperating on writing more tests that cover the 
>> questions you ask above (as well as others)?
> 
> What really needs to be done now is writing tests for ML, to pinpoint its 
> behavior, rather than adding tests to Octave.
> 
> Once again: do you think my assessment of ML's strread/textscan behavior in 
> my original posting would be acceptable?
> 
> Philip

Ok. Let's start with writing tests for ML. I'll start by extracting Octave's 
tests and confirming they work on ML.

Ben
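
As an illustration of rules (a) to (f) in Philip's quoted list, here is a minimal Python sketch of the inferred field-splitting behavior. It is not Octave or ML code and has not been checked against either. The function name split_fields and its parameters are hypothetical, chosen only to mirror the "whitespace", "delimiter" and "MultipleDelimsAsOne" options. Rule (g), EOL handling, is not modeled.

```python
import re

def split_fields(text, whitespace=" \t", delimiter="", multiple_delims_as_one=False):
    """Split one line into fields following the inferred rules (a)-(f).

    Hypothetical helper for illustration only; EOL handling (rule g) is omitted.
    """
    sep_chars = whitespace + delimiter
    if not sep_chars:
        # No separators at all: the whole text is a single field.
        return [text]
    # Leading/trailing white-space does not create empty fields (assumption).
    text = text.strip(whitespace)
    # Capture every maximal run of separator characters (rules a, b, d).
    pattern = "([" + re.escape(sep_chars) + "]+)"
    parts = re.split(pattern, text)  # field, run, field, run, ..., field
    fields = [parts[0]]
    for run, field in zip(parts[1::2], parts[2::2]):
        n_delims = sum(ch in delimiter for ch in run)
        if multiple_delims_as_one:
            # Rule f: consecutive delimiters count as one.
            n_delims = min(n_delims, 1)
        # Rules c and e: a run with zero or one delimiter is a single separator.
        # Rule d: each extra delimiter in the run yields one empty field.
        fields.extend([""] * max(n_delims - 1, 0))
        fields.append(field)
    return fields
```

For example, under these assumptions split_fields("a,,b", delimiter=",") keeps an empty middle field, while passing multiple_delims_as_one=True drops it. Tests written against ML could be compared with a model like this to find where the inferred rules break down.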
