octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new strsplit function


From: Ben Abbott
Subject: Re: new strsplit function
Date: Sun, 31 Mar 2013 20:38:30 -0400

On Mar 31, 2013, at 8:31 PM, Rik wrote:

> On 03/30/2013 08:50 AM, address@hidden wrote:
>> Message: 2
>> Date: Sat, 30 Mar 2013 11:50:36 -0400
>> From: Ben Abbott <address@hidden>
>> To: Jordi Guti?rrez Hermoso <address@hidden>
>> Cc: Octave Maintainers List <address@hidden>
>> Subject: Re: strsplit needs an update
>> Message-ID: <address@hidden>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> On Mar 27, 2013, at 12:50 PM, Jordi Guti?rrez Hermoso wrote:
>> 
>>>> So this is why it's always an annoyance to try to implement something
>>>> before Matlab... We've had strsplit since 2009 in Octave, and in 2013,
>>>> TMW finally implements it themselves, with a differing calling form
>>>> than ours:
>>>> 
>>>>   http://www.mathworks.com/help/matlab/ref/strsplit.html
>>>> 
>>>> Anyone feel like chasing after this?
>>>> 
>>>> - Jordi G. H.
>> I've prepared a version that is compatible with Matlab, but compatibility 
>> means that the current behavior must be broken.  There are many files which 
>> rely upon strsplit().
>> 
>> scripts/deprecated/javafields.m
>> scripts/deprecated/javamethods.m
>> scripts/general/fieldnames.m
>> scripts/general/int2str.m
>> scripts/general/methods.m
>> scripts/general/num2str.m
>> scripts/help/gen_doc_cache.m
>> scripts/help/help.m
>> scripts/help/lookfor.m
>> scripts/io/strread.m
>> scripts/java/javaclasspath.m
>> scripts/miscellaneous/compare_versions.m
>> scripts/miscellaneous/computer.m
>> scripts/miscellaneous/fact.m
>> scripts/miscellaneous/tar.m
>> scripts/miscellaneous/unpack.m
>> scripts/miscellaneous/what.m
>> scripts/miscellaneous/zip.m
>> scripts/pkg/private/configure_make.m
>> scripts/pkg/private/fix_depends.m
>> scripts/pkg/private/generate_lookfor_cache.m
>> scripts/pkg/private/list_forge_packages.m
>> scripts/pkg/private/unload_packages.m
>> scripts/pkg/private/write_index.m
>> scripts/plot/private/__file_filter__.m
>> scripts/plot/private/__fltk_file_filter__.m
>> scripts/plot/private/__go_draw_axes__.m
>> scripts/plot/private/__next_line_style__.m
>> scripts/strings/strsplit.m
>> scripts/strings/untabify.m
>> scripts/testfun/rundemos.m
>> scripts/testfun/runtests.m
>> 
>> I've modified the files to use the new version, and produced a changeset.  I 
>> don't see any regressions, but this may break some code on Octave-Forge.
>> 
>> The threee major impacts are;
>> 
>> (1) Delimiters are now string vectors, not scalars.
>> 
>> Old behavior
>> 
>> strsplit ("1 2, 3", ", ")
>> ans = 
>> {
>>  [1,1] = 1
>>  [1,2] = 2
>>  [1,3] = 
>>  [1,4] = 3
>> }
>> 
>> New (compatible) behavior
>> 
>> strsplit ("1 2, 3", ", ")
>> ans = 
>> {
>>  [1,1] = 1 2
>>  [1,2] = 3
>> }
>> 
>> (2) The Matlab default is to treat consecutive delimiters are treated as a 
>> single delimiter.  The old Octave default was the opposite.
>> 
>> (3) The old behavior supporting 2D character arrays is gone (only vectors 
>> now).
>> 
>> Comments?
> 3/31/13
> 
> Ben,
> 
> Maybe you've already done it, but add an item to the NEWS file describing
> the big changes in calling convention and how to regain exact compatibility
> if necessary.  We will be breaking scripts here--in Octave core, Octave
> Forge, and people's own libraries.  Usually when we do something this
> drastic we deprecate the function for 2 release cycles.  Should we consider
> doing that here?  Maybe moving strsplit.m to deprecated/ostrsplit.m ('o'
> for Octave) would be good.  In that way old scripts could be updated with a
> simple search and replace of 'strsplit' for 'ostrsplit' until there was
> time to fix things more properly.
> 
> Do you have benchmarks on the performance of the old and new functions?  I
> took a brief look and most of the usages in core seem to be on short
> strings for which it won't matter.  However, I'm concerned about the usages in
> 
> scripts/miscellaneous/zip.m
> scripts/miscellaneous/unpack.m
> scripts/miscellaneous/tar.m
> scripts/io/strread.m
> scripts/general/num2str.m
> scripts/general/int2str.m
> 
> strread, which is also used by textscan and textread, could actually see
> large input strings where performance would be an issue.
> 
> --Rik

Rik,

I've pushed a changeset that includes a note in the NEWS file.

        http://hg.savannah.gnu.org/hgweb/octave/rev/1de4ec2a856d

I have not run any benchmarks.  If it is any help, the new version is based on 
regexp().

If we are to add another script, perhaps cstrsplit() is a good names (we did 
that for strcat some time ago).  Where "c" is for (c)onventional.

Ben





reply via email to

[Prev in Thread] Current Thread [Next in Thread]