octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new strsplit function


From: Ben Abbott
Subject: Re: new strsplit function
Date: Tue, 23 Apr 2013 08:18:10 -0400

On Apr 20, 2013, at 6:49 PM, Ben Abbott wrote:

> On Apr 20, 2013, at 5:41 PM, Philip Nienhuis wrote:
> 
>> Ben Abbott wrote:
>>> On Apr 20, 2013, at 3:51 PM, Philip Nienhuis wrote:
>>> 
>>>> Ben Abbott wrote:
>>>>>> On Apr 2, 2013, at 11:42 PM, John W. Eaton wrote:
>> :
>> <snip>
>> :
>>>>>>> Also, the new strsplit does not seem to be working correctly for 
>>>>>>> single-quoted strings that contain escape sequences.  I think that
>>>>>>> 
>>>>>>> strsplit ("foo\tbar", '\t')
>>>>>>> 
>>>>>>> should split on the TAB, but it is currently returning the original 
>>>>>>> string for me.
>>>>>> 
>>>>>> Opps.  I hadn't considered single quoted strings.  Your example does 
>>>>>> work in Matlab.  How is this sort of thing handled in other places?  It 
>>>>>> is sufficient to just ...
>> 
>> <snip>
>> 
>>>> And then there is the bug John mentioned with escape characters.
>>>> On a stable MinGW 3.6.4 build it doesn't work either:
>>>> 
>>>> octave:1>  strsplit ("foo\tbar", '\t')
>>>> ans =
>>>> {
>>>> [1,1] = foo   bar
>>>> }
>>> 
>>> Is this with the changeset I attached applied?
>> 
>> Sorry I got a but confused, didn't pick up that your "sq" cs was for that 
>> very issue.
>> 
>> I applied it manually to yesterday's MXE-build (from yesterday's updated 
>> source tree) and now it works correctly:
>> 
>> octave-cli:1> test strsplit
>> PASSES 38 out of 38 tests
>> octave-cli:2> strsplit ("foo\tbar", '\t')
>> ans =
>> {
>> [1,1] = foo
>> [1,2] = bar
>> }
>> octave-cli:3>
>> 
>> 
>> Anyway, in view of JWE's comments, what is going to happen now with 
>> strsplit.m?
>> Do we keep the changes and the "legacy" option, or do we simply follow ML as 
>> John suggested?
>> 
>> As long as speed isn't affected (-significantly) for simple cases I don't 
>> care very much.
> 
> Rik indicated the slow down was about 30x.  If we keep the "legacy" code, but 
> don't want to commit to supporting the "legacy" option, then I'd just 
> eliminate "legacy" from the doc-string and throw an error if the user 
> requests it.
> 
>> You have already patched the affected core functions.
>> In the io package (spreadsheet functions) there are only "simple" calls to 
>> strsplit (i.e. with just a one-char delimiter) so am I right to assume no 
>> changes are required there?
> 
> If we get rid of the "legacy", then only list_forge_packages.m and strread.m 
> need to be patched.
> 
>> BTW in the texinfo help of strsplit there's neither an (explicit) 
>> explanation of what the 'simple' DELIMITERTYPE does, nor an example.
> 
> I can fix that.  The "simple" version doesn't support regexp expressions.   
> Although the current implementation of "simple" does rely upon regexp.  There 
> may be some some cases (bugs) where regexp stuff sneaks past by my clumsy 
> attempt to escape special characters.  Perhaps it is a good idea to rewrite 
> that part so as not to rely on regexp?
> 
> Ben

I got side tracked looking at how to correctly collect the collapsed delimiters 
for the 2nd output.  

        http://hg.savannah.gnu.org/hgweb/octave/rev/03a28487fa9d

I'll prepare the changeset for sq-strings next.

Ben




reply via email to

[Prev in Thread] Current Thread [Next in Thread]