octave-maintainers

Re: C++ version of regexprep.cc


From: Paul Kienzle
Subject: Re: C++ version of regexprep.cc
Date: Tue, 2 May 2006 10:18:38 -0400


On May 2, 2006, at 7:25 AM, David Bateman wrote:

Paul Kienzle wrote:
David,

octave now goes quickly through the regular expression portion of the code.

I haven't yet confirmed that the results are consistent with matlab.

The next portion involves for loops such as the following:

  tag = cell(number_of_tags,4);
  for i=1:number_of_tags
   tag{i,1} = xml(tag_start(i):tag_end(i))
  end

which for 10000 tags is slow.

Are there octave routines for splitting/joining strings into cells
which are fast?

- Paul

Paul,

Hey, I'm on holidays at the moment, and so have a little time. What about the attached implementation of mat2cell? With this you should be able to replace the above code with

tag = cell(number_of_tags,4);
tag{:,1} = mat2cell (xml, 1, tag_end - tag_start);

mat2cell partitions the matrix into cells. The xml2cell code extracts substrings.

The following does what I expect:

    xml = '<eh><bee>   <see> deed </see>  </bee></eh>';
    tag_start = find(xml=='<');
    tag_end = find(xml=='>');
    pieces = [ tag_start; tag_end+1 ];
    partition = diff([1;pieces(:);length(xml)+1]);
    tag_name = mat2cell (xml, 1, partition) (2:2:end);

    tags = cell(length(tag_start),4);
    tags(:,1) = tag_name';
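
As a sanity check, here is a minimal sketch comparing the partition approach against the original loop on the same sample string. The names `parts` and `expected` are introduced only for this illustration. It assumes mat2cell requires the lengths to sum to the size of the dimension, which is why the text between the tags gets its own entries in `partition`.

```octave
xml = '<eh><bee>   <see> deed </see>  </bee></eh>';
tag_start = find (xml == '<');
tag_end = find (xml == '>');

% Interleave each tag start with the position just past its tag end,
% then turn the boundaries into piece lengths (row vector).
pieces = [tag_start; tag_end+1];
partition = diff ([1, pieces(:).', length(xml)+1]);

% The lengths must cover the whole string.
assert (sum (partition), length (xml));

% Even-numbered pieces are the tags; odd-numbered pieces are the
% (possibly empty) text between them.
parts = mat2cell (xml, 1, partition);
tag_name = parts(2:2:end);

% Reference result from the original loop.
expected = cell (1, numel (tag_start));
for i = 1:numel (tag_start)
  expected{i} = xml(tag_start(i):tag_end(i));
end
assert (tag_name, expected);
```

Passing `tag_end - tag_start` alone would leave out the gaps between tags, so the lengths would not add up to `length (xml)`.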

John, do you want mat2cell committed as well?

mat2cell is a core function and it works as expected, so the answer should be yes.


Here are a couple of test cases:

/*

%!test
%! x = reshape(1:20,5,4);
%! c = mat2cell(x,[3,2],[3,1]);
%! assert(c,{[1,6,11;2,7,12;3,8,13],[16;17;18];[4,9,14;5,10,15],[19;20]})

%!test
%! x = 'abcdefghij';
%! c = mat2cell(x,1,[0,4,2,0,4,0]);
%! assert(c,{'','abcd','ef','','ghij',''})

*/

The second test is failing because size(c{1}) is [1,0] but size('') is [0,0].
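
The mismatch can be reproduced without mat2cell. The sketch below assumes that `''` is 0x0 while `char (zeros (1, 0))` is a 1x0 string. Writing the expected value with explicit 1x0 strings is one possible way to phrase the test, not something settled in this thread; the name `e` is only for illustration.

```octave
e = char (zeros (1, 0));   % explicit 1x0 empty string

% The two kinds of empty string differ only in their size.
assert (size (''), [0, 0]);
assert (size (e), [1, 0]);
assert (~isequal (size (e), size ('')));

% Same partition as the second test, with 1x0 strings expected
% wherever the requested length is zero.
x = 'abcdefghij';
c = mat2cell (x, 1, [0,4,2,0,4,0]);
assert (c, {e, 'abcd', 'ef', e, 'ghij', e});
```

An alternative would be a looser comparison that only checks `isempty` for the zero-length pieces.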

- Paul


