octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new strsplit function


From: Ben Abbott
Subject: Re: new strsplit function
Date: Sat, 20 Apr 2013 18:49:42 -0400

On Apr 20, 2013, at 5:41 PM, Philip Nienhuis wrote:

> Ben Abbott wrote:
>> On Apr 20, 2013, at 3:51 PM, Philip Nienhuis wrote:
>> 
>>> Ben Abbott wrote:
>>>>> On Apr 2, 2013, at 11:42 PM, John W. Eaton wrote:
> :
> <snip>
> :
>>>>>> Also, the new strsplit does not seem to be working correctly for 
>>>>>> single-quoted strings that contain escape sequences.  I think that
>>>>>> 
>>>>>> strsplit ("foo\tbar", '\t')
>>>>>> 
>>>>>> should split on the TAB, but it is currently returning the original 
>>>>>> string for me.
>>>>> 
>>>>> Opps.  I hadn't considered single quoted strings.  Your example does work 
>>>>> in Matlab.  How is this sort of thing handled in other places?  It is 
>>>>> sufficient to just ...
> 
> <snip>
> 
>>> And then there is the bug John mentioned with escape characters.
>>> On a stable MinGW 3.6.4 build it doesn't work either:
>>> 
>>> octave:1>  strsplit ("foo\tbar", '\t')
>>> ans =
>>> {
>>>  [1,1] = foo   bar
>>> }
>> 
>> Is this with the changeset I attached applied?
> 
> Sorry I got a but confused, didn't pick up that your "sq" cs was for that 
> very issue.
> 
> I applied it manually to yesterday's MXE-build (from yesterday's updated 
> source tree) and now it works correctly:
> 
> octave-cli:1> test strsplit
> PASSES 38 out of 38 tests
> octave-cli:2> strsplit ("foo\tbar", '\t')
> ans =
> {
>  [1,1] = foo
>  [1,2] = bar
> }
> octave-cli:3>
> 
> 
> Anyway, in view of JWE's comments, what is going to happen now with 
> strsplit.m?
> Do we keep the changes and the "legacy" option, or do we simply follow ML as 
> John suggested?
> 
> As long as speed isn't affected (-significantly) for simple cases I don't 
> care very much.

Rik indicated the slow down was about 30x.  If we keep the "legacy" code, but 
don't want to commit to supporting the "legacy" option, then I'd just eliminate 
"legacy" from the doc-string and throw an error if the user requests it.

> You have already patched the affected core functions.
> In the io package (spreadsheet functions) there are only "simple" calls to 
> strsplit (i.e. with just a one-char delimiter) so am I right to assume no 
> changes are required there?

If we get rid of the "legacy", then only list_forge_packages.m and strread.m 
need to be patched.

> BTW in the texinfo help of strsplit there's neither an (explicit) explanation 
> of what the 'simple' DELIMITERTYPE does, nor an example.

I can fix that.  The "simple" version doesn't support regexp expressions.   
Although the current implementation of "simple" does rely upon regexp.  There 
may be some some cases (bugs) where regexp stuff sneaks past by my clumsy 
attempt to escape special characters.  Perhaps it is a good idea to rewrite 
that part so as not to rely on regexp?

Ben




reply via email to

[Prev in Thread] Current Thread [Next in Thread]