octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new strsplit function


From: Rik
Subject: Re: new strsplit function
Date: Sun, 31 Mar 2013 17:31:13 -0700

On 03/30/2013 08:50 AM, address@hidden wrote:
> Message: 2
> Date: Sat, 30 Mar 2013 11:50:36 -0400
> From: Ben Abbott <address@hidden>
> To: Jordi Guti?rrez Hermoso <address@hidden>
> Cc: Octave Maintainers List <address@hidden>
> Subject: Re: strsplit needs an update
> Message-ID: <address@hidden>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Mar 27, 2013, at 12:50 PM, Jordi Guti?rrez Hermoso wrote:
>
>> > So this is why it's always an annoyance to try to implement something
>> > before Matlab... We've had strsplit since 2009 in Octave, and in 2013,
>> > TMW finally implements it themselves, with a differing calling form
>> > than ours:
>> > 
>> >    http://www.mathworks.com/help/matlab/ref/strsplit.html
>> > 
>> > Anyone feel like chasing after this?
>> > 
>> > - Jordi G. H.
> I've prepared a version that is compatible with Matlab, but compatibility 
> means that the current behavior must be broken.  There are many files which 
> rely upon strsplit().
>
> scripts/deprecated/javafields.m
> scripts/deprecated/javamethods.m
> scripts/general/fieldnames.m
> scripts/general/int2str.m
> scripts/general/methods.m
> scripts/general/num2str.m
> scripts/help/gen_doc_cache.m
> scripts/help/help.m
> scripts/help/lookfor.m
> scripts/io/strread.m
> scripts/java/javaclasspath.m
> scripts/miscellaneous/compare_versions.m
> scripts/miscellaneous/computer.m
> scripts/miscellaneous/fact.m
> scripts/miscellaneous/tar.m
> scripts/miscellaneous/unpack.m
> scripts/miscellaneous/what.m
> scripts/miscellaneous/zip.m
> scripts/pkg/private/configure_make.m
> scripts/pkg/private/fix_depends.m
> scripts/pkg/private/generate_lookfor_cache.m
> scripts/pkg/private/list_forge_packages.m
> scripts/pkg/private/unload_packages.m
> scripts/pkg/private/write_index.m
> scripts/plot/private/__file_filter__.m
> scripts/plot/private/__fltk_file_filter__.m
> scripts/plot/private/__go_draw_axes__.m
> scripts/plot/private/__next_line_style__.m
> scripts/strings/strsplit.m
> scripts/strings/untabify.m
> scripts/testfun/rundemos.m
> scripts/testfun/runtests.m
>
> I've modified the files to use the new version, and produced a changeset.  I 
> don't see any regressions, but this may break some code on Octave-Forge.
>
> The threee major impacts are;
>
> (1) Delimiters are now string vectors, not scalars.
>
> Old behavior
>
> strsplit ("1 2, 3", ", ")
> ans = 
> {
>   [1,1] = 1
>   [1,2] = 2
>   [1,3] = 
>   [1,4] = 3
> }
>
> New (compatible) behavior
>
> strsplit ("1 2, 3", ", ")
> ans = 
> {
>   [1,1] = 1 2
>   [1,2] = 3
> }
>
> (2) The Matlab default is to treat consecutive delimiters are treated as a 
> single delimiter.  The old Octave default was the opposite.
>
> (3) The old behavior supporting 2D character arrays is gone (only vectors 
> now).
>
> Comments?
3/31/13

Ben,

Maybe you've already done it, but add an item to the NEWS file describing
the big changes in calling convention and how to regain exact compatibility
if necessary.  We will be breaking scripts here--in Octave core, Octave
Forge, and people's own libraries.  Usually when we do something this
drastic we deprecate the function for 2 release cycles.  Should we consider
doing that here?  Maybe moving strsplit.m to deprecated/ostrsplit.m ('o'
for Octave) would be good.  In that way old scripts could be updated with a
simple search and replace of 'strsplit' for 'ostrsplit' until there was
time to fix things more properly.

Do you have benchmarks on the performance of the old and new functions?  I
took a brief look and most of the usages in core seem to be on short
strings for which it won't matter.  However, I'm concerned about the usages in

scripts/miscellaneous/zip.m
scripts/miscellaneous/unpack.m
scripts/miscellaneous/tar.m
scripts/io/strread.m
scripts/general/num2str.m
scripts/general/int2str.m

strread, which is also used by textscan and textread, could actually see
large input strings where performance would be an issue.

--Rik


reply via email to

[Prev in Thread] Current Thread [Next in Thread]