Re: C++ version of regexprep.cc
From: Paul Kienzle
Subject: Re: C++ version of regexprep.cc
Date: Tue, 2 May 2006 23:21:57 -0400
On May 2, 2006, at 7:45 PM, David Bateman wrote:
Paul Kienzle wrote:
On May 2, 2006, at 10:57 AM, David Bateman wrote:
I just noted, you didn't state whether this improved the speed of
your xml code sufficiently or not... Or whether there is another
speed problem elsewhere.
mat2cell is slower than I expect. It needs an OCTAVE_QUIT in its
loop.
The particular file that I'm trying to load is 500 kb, with about
6000 separate values, so 12000 separate open/close tags, and 24000
elements in the partition. I was expecting this to take a couple of
seconds but instead it takes 2 1/2 minutes.
Running some tests, the behaviour of mat2cell is quadratic over
[1000,10000]. Looking at the code I can't tell why.
You can try running 'speed' on it (I'm not tunneling X so I can't
right now):
speed("v=mat2cell(s,1,p);","s=repmat('a',1,n);p=ones(1,n);",10000);
mat2cell is still fast compared to the for loop in xml2mat which
builds the cell structure from the xml.
At this point I'm going to declare defeat and say that xml2mat won't
run on octave with large files in reasonable time.
- Paul
Paul,
I think I've eliminated the quadratic behavior of mat2cell with the
attached version. I now get
octave:1> n = 1000;
octave:2> s = repmat('a',1,n);p=ones(1,n);
octave:3> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.021194 seconds.
octave:4> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.013100 seconds.
octave:5> n = 5000;
octave:6> s = repmat('a',1,n);p=ones(1,n);
octave:7> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.044708 seconds.
octave:8> n = 10000;
octave:9> s = repmat('a',1,n);p=ones(1,n);
octave:10> tic;v=mat2cell(s,1,p);toc
Elapsed time is 0.088394 seconds.
which is much more respectable. Does this help your xml2mat code, or
are there other points that are slow?
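For anyone wanting to check the scaling on their own build, here is a rough sketch that times mat2cell at a few sizes and fits a slope on a log-log scale. The sizes in ns and the variable names are arbitrary choices for illustration, not part of the attached patch, and timings this short are noisy, so treat the exponent as a hint only: near 1 suggests linear behaviour, near 2 quadratic.

```octave
% Rough scaling check for mat2cell (sizes are arbitrary, for illustration).
ns = [1000 2000 4000 8000];
t = zeros (size (ns));
for k = 1:numel (ns)
  n = ns(k);
  s = repmat ('a', 1, n);
  p = ones (1, n);
  tic;
  v = mat2cell (s, 1, p);   % split s into n one-character cells
  t(k) = toc;
end
% Slope of log(time) against log(n) estimates the growth exponent.
c = polyfit (log (ns), log (t), 1);
printf ('estimated exponent: %.2f\n', c(1));
```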
The splitting step is now a much more reasonable 2.5 seconds.
The remainder of the file is messy for loops which could
probably be made faster if the code was restructured.
The code is full of puzzles such as the following:
iis='i1';
for ii=2:length(w.size);
iis=[iis,',i',num2str(ii)];
end
nn=prod(w.size); %number of elements
eval(['[',iis,']=ind2sub(w.size,[1:nn]);']); % generation of indexes
iis='i1(ind)';
for ii=2:length(w.size);
iis=[iis,',i',num2str(ii),'(ind)'];
end % indexes of indexes
for ind=1:nn
eval(['valor(',iis,')=tag_contents{i,4}(ind);'])
end
if exist('valor')==1;
tag_contents{i,4}=valor;
clear valor;
end
Which as far as I can tell is just:
tag_contents{i,4} = reshape(tag_contents{i,4},w.size);
The latter runs much faster 8-)
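A small sketch to check that equivalence on made-up data. Here sz and flat are hypothetical stand-ins for w.size and tag_contents{i,4}, and the loop uses a cell array of subscripts instead of eval, so it mirrors the original logic rather than copying it. It assumes w.size has at least two entries, as reshape expects.

```octave
% Hypothetical stand-ins for w.size and tag_contents{i,4}.
sz = [2 3 4];
flat = 1:prod (sz);

% Original approach without eval: one subscript per dimension,
% then assign element by element.
subs = cell (1, numel (sz));
[subs{:}] = ind2sub (sz, 1:prod (sz));
valor = zeros (sz);
for ind = 1:prod (sz)
  idx = cellfun (@(s) s(ind), subs, 'UniformOutput', false);
  valor(idx{:}) = flat(ind);
end

% Proposed replacement: a single reshape.
reshaped = reshape (flat, sz);

% Displays 1 when the two results match.
disp (isequal (valor, reshaped))
```

Both fill the array in column-major order, which is why the loop collapses to one reshape call.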
With the above transformation, I can now get almost all the way
through in reasonable time.
Now it is hitting a 'max recursion limit exceeded' message in
code even uglier than the above. In addition, I believe this
code relies on the fact that fields in a structure are ordered.
So again I declare defeat and say this code cannot be made to
run in Octave.
Thanks for trying.
- Paul