Re: regexp question
From: Philip Nienhuis
Subject: Re: regexp question
Date: Tue, 06 Dec 2011 18:52:31 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6
Sergei, William,
Two answers in one post:
Sergei Steshenko wrote:
I guess you need 'aa' surrounded by not 'a'. Octave uses PCRE; I am not
familiar with nuances of Octave PCRE usage; in Perl I would write the regular
expression this way:
[^a]aa[^a]
and if/when it matches, it returns pointer to the character preceding the 'aa'
substring, i.e. in case of 'baab' it should return pointer to the first 'b'.
Thanks, Sergei. I already tried this and found it'll work, but
unfortunately not in a more complicated situation:
octave:35> tststr3 = 'aa aaaaa baa' ## Patterns at start & end
tststr3 = aa aaaaa baa
octave:36> regexp (tststr3, "[^a]aa[^a]")
ans = [](1x0) ## Hey......
but
octave:41> tststr4 = ' aa aaaaa baa ' ## Note spaces at start and end
tststr4 =  aa aaaaa baa 
octave:42> regexp (tststr4, "[^a]aa[^a]")
ans =
1 11
... so it doesn't catch the pattern at the start and end of the line, because each [^a] has to match an actual character there.
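One possible way around this, as a minimal sketch only: since Octave uses PCRE, zero-width lookaround assertions could stand in for the two [^a] classes, so that no character has to be consumed before or after the 'aa'. This assumes the Octave build in use supports lookbehind in regexp(); the variable name idx is just for illustration.

```octave
tststr3 = 'aa aaaaa baa';
## (?<!a) : not preceded by 'a' (also satisfied at the start of the string)
## (?!a)  : not followed by 'a' (also satisfied at the end of the string)
idx = regexp (tststr3, '(?<!a)aa(?!a)')
```

For tststr3 this should report the 'aa' at positions 1 and 11 and skip the 'aaaaa' in the middle. Unlike [^a]aa[^a], the returned index points at the first 'a' itself, not at the character before it.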
William Krekeler wrote:
Sorry I responded too soon. To only get the second shorter set of aa the
following works.
indexes = regexp (tststr, "a{2,}");
indexes2 = regexp (tststr, "a{3,}");
desiredIndex = setxor( indexes, indexes2 )
Thanks.
octave:37> indexes = regexp (tststr3, "a{2,}")
indexes =
1 4 11
octave:38> indexes2 = regexp (tststr3, "a{3,}")
indexes2 = 4
octave:39> desiredIndex = setxor( indexes, indexes2 )
desiredIndex =
1 11
... so your trick seems to catch the right ones.
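A variation on the same idea, sketched here under the assumption that only runs of exactly two 'a' are wanted: a single regexp() call can return both the start and the end index of every run of 'a', and the runs can then be filtered by length. The names s and e are illustrative.

```octave
tststr3 = 'aa aaaaa baa';
## s and e hold the start and end index of every run of one or more 'a'
[s, e] = regexp (tststr3, 'a+');
## keep only the runs that are exactly two characters long
desiredIndex = s(e - s + 1 == 2)
```

This needs one regexp() call instead of two plus setxor(); whether it is actually faster would have to be timed.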
I'm a bit concerned that this solution will be too time-consuming.
What I actually want to do is check format strings from a spreadsheet,
which may involve several tens of thousands of strings (none of them
much more complicated than the example above).
The following script:
tic
tststr3 = 'aa aaaaa baa';
for ii=1:50000
indexes = regexp (tststr3, "a{2,}");
indexes2 = regexp (tststr3, "a{3,}");
desiredIndex = setxor( indexes, indexes2 );
endfor
toc
...says:
"Elapsed time is 33.1 seconds."
A script using strfind:
tic
tststr3 = 'aa aaaaa baa';
for ii=1:50000
idx1 = strfind (tststr3, 'aa');
idx2 = strfind (tststr3, 'aaa');
idx3 = strfind (tststr3, 'aaaa');
idx4 = strfind (tststr3, 'aaaaa');
endfor
toc
...gives:
"Elapsed time is 2.27 seconds."
so a solution based on strfind() seems preferable.
(findstr() takes about 68 seconds; I suspect it is based on regexp().)
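For what it's worth, here is a rough sketch of how the strfind() route could yield the wanted indices directly. The idea (an assumption about what is needed, not something tested on the real spreadsheet data) is to take every start of 'aa', overlapping ones included, and keep only those with no 'a' immediately before or after the pair. The names padded, keep and desiredIndex are illustrative.

```octave
tststr3 = 'aa aaaaa baa';
idx = strfind (tststr3, 'aa');      ## all starts of 'aa', overlaps included
## pad with one non-'a' character on each side so that idx-1 and idx+2
## (in unpadded coordinates) are always valid positions
padded = [' ' tststr3 ' '];
## padded(idx) is the character before the pair, padded(idx+3) the one after
keep = (padded(idx) != 'a') & (padded(idx + 3) != 'a');
desiredIndex = idx(keep)
```

Since only strfind() and plain indexing are involved, this stays clear of regexp() altogether, though it would still have to be timed against the loops above.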
Thanks again,
Philip