octave-maintainers

Re: New strsplit function


From: Ben Abbott
Subject: Re: New strsplit function
Date: Thu, 16 May 2013 14:19:38 +0800

On May 16, 2013, at 1:39 PM, John W. Eaton wrote:

> I received a report that the new strsplit function doesn't match
> Matlab behavior for the following input.  I looked at fixing it, but
> I'm afraid I'll screw something else up because of the fairly complex
> interactions among all the different options (legacy,
> collapsedelimiters, etc.).  Here's the simple test case:
> 
>  With Matlab 2013a:
> 
>  matlab> sgeQueryStr = '::'
> 
>  sgeQueryStr =
> 
>  ::
> 
>  matlab> splitStr = strsplit(deblank(sgeQueryStr), ':')
> 
>  splitStr =
> 
>      ''    ''
> 
>  matlab> length(splitStr)
> 
>  ans =
> 
>       2
> 
> 
> So, what's the proper fix?
> 
> Also, I think that Matlab is saying that a delimiter at the beginning
> of a string generates an empty result, but one at the end does not.
> Before the recent changes to strsplit, Octave would return three empty
> strings for this case.  So should we consider that a bug in Octave?
> If not, how do we preserve old behavior and also get Matlab
> compatibility right in this case?
> 
> If we can't do both, maybe we should just abandon the "legacy"
> behavior in our current strsplit function?  If we do that, I suppose
> we could distribute the old version as ostrsplit for a release or two.
> 
> jwe

Hmmm ... I took a look at Matlab 2013a.  It's not clear to me that we'd want to
copy this behavior.

matlab> strsplit('', 'a')

ans = 

    {''}

matlab> strsplit('a', 'a')

ans = 

    ''    ''

matlab> strsplit('aa', 'a')

ans = 

    ''    ''

matlab> strsplit('aaa', 'a')

ans = 

    ''    ''

matlab> strsplit('aaaa', 'a')

ans = 

    ''    ''

matlab> strsplit ('abc', {'a','b','c'})

ans = 

    ''    ''

In case it isn't clear, each output displayed as '' '' is a cellstring
containing two empty strings.

The Matlab docs (http://www.mathworks.com/help/matlab/ref/strsplit.html) say
that consecutive delimiters are collapsed by default, which means the
documented behavior is to return {''} in each case above.  If I had to guess,
I'd say Matlab's first attempt at strsplit () has a bug.  Either that, or the
documentation is wrong.
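
For comparison, here is a minimal sketch of one possible reading of
"collapsed": treat every run of consecutive delimiters as a single delimiter,
then split.  The names to_pattern and collapse_split are hypothetical helpers
made up for this illustration; they are not part of Octave or Matlab, and
nothing here is taken from either strsplit implementation.  The sketch only
makes it easier to count the pieces this particular reading would produce.

```octave
% Hypothetical helpers for illustration only; not the strsplit implementation.
% Build a pattern that matches one or more consecutive delimiters.
% cellstr () lets the caller pass either a single string or a cellstring.
to_pattern = @(delims) ['(?:', ...
  strjoin(cellfun (@(d) regexptranslate ('escape', d), cellstr (delims), ...
                   'UniformOutput', false), '|'), ...
  ')+'];

% Split on each collapsed run of delimiters.
collapse_split = @(str, delims) regexp (str, to_pattern (delims), 'split');

n_empty = numel (collapse_split ('', 'a'));                 % no delimiter found
n_one   = numel (collapse_split ('a', 'a'));                % one run
n_four  = numel (collapse_split ('aaaa', 'a'));             % one collapsed run
n_multi = numel (collapse_split ('abc', {'a', 'b', 'c'}));  % one mixed run
n_colon = numel (collapse_split ('::', ':'));               % jwe's test case
```

Under this reading a single collapsed run still separates a leading piece from
a trailing piece, so the piece count depends on whether empty leading and
trailing pieces are kept.  That choice is exactly what any fix to strsplit
would have to pin down.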

In either event, I'm OK with preserving the original strsplit as a separate
file.  Do you prefer ostrsplit.m or (for consistency with cstrcat.m) should we
go with cstrsplit.m?

What is the best way to re-introduce and rename the original version?  Is
there a Mercurial trick that will do that?

Ben



