help-octave

Re: regexp question


From: Philip Nienhuis
Subject: Re: regexp question
Date: Tue, 06 Dec 2011 18:52:31 +0100

Sergei, William,

2 answers in one post:

Sergei Steshenko wrote:
I guess you need 'aa' surrounded by not 'a'. Octave uses PCRE; I am not 
familiar with nuances of Octave PCRE usage; in Perl I would write the regular 
expression this way:

[^a]aa[^a]

and if/when it matches, it returns pointer to the character preceding the 'aa' 
substring, i.e. in case of 'baab' it should return pointer to the first 'b'.

Thanks, Sergei. I had already tried this; it works in simple cases, but unfortunately not in a more complicated situation:

octave:35> tststr3 = 'aa aaaaa baa'     ## Patterns at start & end
tststr3 = aa aaaaa baa
octave:36> regexp (tststr3, "[^a]aa[^a]")
ans = [](1x0)                           ## Hey......

but
octave:41> tststr4 = ' aa aaaaa baa '   ## Note spaces at start and end
tststr4 =  aa aaaaa baa
octave:42> regexp (tststr4, "[^a]aa[^a]")
ans =
    1   11

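... so it doesn't catch the pattern at the start and end of the line: [^a] has to match an actual character, and there is none before the first 'aa' or after the last one.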
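
A possible way around this, sketched here under the assumption that the PCRE library behind Octave's regexp() accepts lookbehind and lookahead assertions: replace the two [^a] classes with zero-width negative assertions. These only check that the neighbouring character is not 'a' without consuming it, so they also succeed at the start and end of the string. The variable name pat is just for illustration.

```octave
## Sketch: match 'aa' only when it is not part of a longer run of 'a'.
## (?<!a) = "not preceded by a", (?!a) = "not followed by a".
## Both are zero-width, so they also hold at the string boundaries.
tststr3 = 'aa aaaaa baa';
pat = '(?<!a)aa(?!a)';
idx = regexp (tststr3, pat)   ## expected: 1 11
```

Note that this returns the position of the 'aa' itself, not of the character preceding it as [^a]aa[^a] does.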


William Krekeler wrote:
Sorry I responded too soon. To only get the second shorter set of aa the 
following works.

indexes = regexp (tststr, "a{2,}");
indexes2 = regexp (tststr, "a{3,}");
desiredIndex = setxor( indexes, indexes2 )

Thanks.

octave:37> indexes = regexp (tststr3, "a{2,}")
indexes =
    1    4   11

octave:38> indexes2 = regexp (tststr3, "a{3,}")
indexes2 =  4
octave:39> desiredIndex = setxor( indexes, indexes2 )
desiredIndex =
    1   11

... so your trick seems to catch the right ones.
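
The same selection could probably be made with a single regexp() call, by asking for the start and end index of every run of 'a' and keeping only the runs of length 2. This is only a sketch; the names s, e and runlen are made up for illustration.

```octave
## Sketch: one pass over the string instead of two regexp() calls + setxor().
tststr3 = 'aa aaaaa baa';
[s, e] = regexp (tststr3, 'a+');   ## start and end index of each run of 'a'
runlen = e - s + 1;                ## length of each run
desiredIndex = s(runlen == 2)      ## expected: 1 11
```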


I'm a bit concerned that this solution will be too time-consuming.
What I actually want to do is check format strings from a spreadsheet, which may involve several tens of thousands of strings (none much more complicated than the example above).
The following script:

tic
tststr3 = 'aa aaaaa baa';
for ii=1:50000
  indexes = regexp (tststr3, "a{2,}");
  indexes2 = regexp (tststr3, "a{3,}");
  desiredIndex = setxor( indexes, indexes2 );
endfor
toc

...says:
"Elapsed time is 33.1 seconds."

A script using strfind:

tic
tststr3 = 'aa aaaaa baa';
for ii=1:50000
   idx2 = strfind (tststr3, 'aa');
   idx3 = strfind (tststr3, 'aaa');
   idx4 = strfind (tststr3, 'aaaa');
   idx5 = strfind (tststr3, 'aaaaa');
endfor
toc

...gives:
"Elapsed time is 2.27 seconds."

so a solution based on strfind() seems preferable.
(findstr() takes about 68 secs; I suspect it is based on regexp().)
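
The timing script above only calls strfind(); it does not yet pick out the isolated 'aa'. One way that might do it, given as a sketch rather than a tested solution: pad the string with a non-'a' character on both sides, find every 'aa' (strfind() reports overlapping matches by default), and keep only the hits whose neighbours on both sides are not 'a'. The names padded, idx, keep and desiredIndex are illustrative only.

```octave
## Sketch: isolated 'aa' using strfind() only, no regexp().
tststr3 = 'aa aaaaa baa';
padded = [' ' tststr3 ' '];        ## guard characters, so idx-1 and idx+2 always exist
idx = strfind (padded, 'aa');      ## all occurrences, including overlapping ones
keep = (padded(idx - 1) ~= 'a') & (padded(idx + 2) ~= 'a');
desiredIndex = idx(keep) - 1       ## undo the padding offset; expected: 1 11
```

I have not timed this variant, so whether it keeps the speed advantage of plain strfind() over 50000 strings is an open question.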


Thanks again,

Philip

