octave-maintainers

Re: C++ version of regexprep.cc


From: David Bateman
Subject: Re: C++ version of regexprep.cc
Date: Tue, 02 May 2006 21:39:50 +0200
User-agent: Thunderbird 1.5 (Windows/20051201)

Paul Kienzle wrote:

On May 2, 2006, at 10:57 AM, David Bateman wrote:

I just noted, you didn't state whether this improved the speed of your xml code sufficiently or not... Or whether there is another speed problem elsewhere.

mat2cell is slower than I expect.  It needs an OCTAVE_QUIT in its loop.

The particular file that I'm trying to load is 500 kb, with about 6000 separate values, so 12000 separate open/close tags, and 24000 elements in the partition. I was expecting this to take a couple of seconds but instead it takes 2 1/2 minutes.

Running some tests, the behaviour of mat2cell is quadratic over [1000,10000]. Looking at the code I can't tell why.

You can try running 'speed' on it (I'm not tunneling X so I can't right now):

    speed("v=mat2cell(s,1,p);","s=repmat('a',1,n);p=ones(1,n);",10000);


mat2cell is still fast compared to the for loop in xml2mat which builds the cell structure from the xml.

At this point I'm going to declare defeat and say that xml2mat won't run on octave with large files in reasonable time.

I think we can therefore do better. In matlab I get

>> n = 1000;
>> s = repmat('a',1,n);p=ones(1,n);
>> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.030060 seconds.
>> n = 5000;
>> s = repmat('a',1,n);p=ones(1,n);
>> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.148529 seconds.
>> n = 10000;
>> s = repmat('a',1,n);p=ones(1,n);
>> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.295874 seconds.

So pretty clearly linear time. In octave (though not run on the same machine as I'm using the vpn to get onto a machine to run matlab) I get

octave:1> n = 1000;
octave:2> s = repmat('a',1,n);p=ones(1,n);
octave:3> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.096954 seconds.
octave:4> n = 5000;
octave:5> s = repmat('a',1,n);p=ones(1,n);
octave:6> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.715096 seconds.
octave:7> n = 10000;
octave:8> s = repmat('a',1,n);p=ones(1,n);
octave:9> tic;v=mat2cell(s,1,p);toc
Elapsed time is 2.910286 seconds.

so, as you say, pretty clearly quadratic time, and not that competitive (though I am comparing a bi-proc 2.4GHz Xeon P4 running matlab against a 1.6GHz P4M running octave). This is all the more depressing as the matlab code is an m-file...

I think I see what the issue is, though. We are reallocating a ColumnVector in the interior of the loop from the input arguments. I don't really see a good way around this, but it can be confirmed that this is the issue: if we change the call to mat2cell a bit, the allocation of the ColumnVector is done only once. That is

octave:23> n = 1000;
octave:24> s = repmat('a',1,n);p=ones(1,n);
octave:25> tic;v=mat2cell(s.',p);toc
Elapsed time is 0.011963 seconds.
octave:26> n = 5000;
octave:27> s = repmat('a',1,n);p=ones(1,n);
octave:28> tic;v=mat2cell(s.',p);toc
Elapsed time is 0.035385 seconds.
octave:29> n = 10000;
octave:30> s = repmat('a',1,n);p=ones(1,n);
octave:31> tic;v=mat2cell(s.',p);toc
Elapsed time is 0.064461 seconds.

Which is much nicer. Can you use this to speed up your code? I'll see what I can do to get the speed up in the other case.

D.

