Re: regexp question

help-octave

From:	Philip Nienhuis
Subject:	Re: regexp question
Date:	Tue, 06 Dec 2011 18:52:31 +0100
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Sergei, William,

Two answers in one post:

Sergei Steshenko wrote:

> I guess you need 'aa' surrounded by not 'a'. Octave uses PCRE; I am not
> familiar with the nuances of Octave PCRE usage; in Perl I would write the
> regular expression this way:
>
> [^a]aa[^a]
>
> and if/when it matches, it returns a pointer to the character preceding the
> 'aa' substring, i.e. in case of 'baab' it should return a pointer to the
> first 'b'.

Thanks, Sergei. I already tried this and found that it works, but unfortunately not in a more complicated situation:


octave:35> tststr3 = 'aa aaaaa baa'     ## Patterns at start & end
tststr3 = aa aaaaa baa
octave:36> regexp (tststr3, "[^a]aa[^a]")
ans = [](1x0)                           ## Hey......

but
octave:41> tststr4 = ' aa aaaaa baa '   ## Note spaces at start and end
tststr4 =  aa aaaaa baa
octave:42> regexp (tststr4, "[^a]aa[^a]")
ans =
    1   11

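... so it doesn't catch the pattern at the start and end of the string: each [^a] has to consume a character, and there is none before the first 'aa' or after the last one.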
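A PCRE feature that sidesteps this, not suggested in the thread, is lookaround assertions. (?<!a) requires that the preceding character is not 'a', and (?!a) that the following one is not 'a', but neither consumes a character. They therefore also succeed at the very start and end of the string, and the returned index points at the first 'a' rather than at the character before it. A minimal sketch, assuming Octave's PCRE-based regexp() accepts these assertions (its documentation lists lookahead and lookbehind):

```octave
## Lookaround sketch (not from the thread): (?<!a) and (?!a) test the
## neighbouring characters without consuming them, so an 'aa' at the
## very start or end of the string can still match.
tststr3 = 'aa aaaaa baa';
idx = regexp (tststr3, '(?<!a)aa(?!a)')
```

With the space-padded tststr4 the same pattern should give 2 and 12, i.e. the first 'a' of each isolated pair.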


William Krekeler wrote:

> Sorry, I responded too soon. To get only the second, shorter set of aa, the
> following works:
>
> indexes = regexp (tststr, "a{2,}");
> indexes2 = regexp (tststr, "a{3,}");
> desiredIndex = setxor( indexes, indexes2 )


Thanks.

octave:37> indexes = regexp (tststr3, "a{2,}")
indexes =
    1    4   11

octave:38> indexes2 = regexp (tststr3, "a{3,}")
indexes2 =  4
octave:39> desiredIndex = setxor( indexes, indexes2 )
desiredIndex =
    1   11

... so your trick seems to catch the right ones.
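A variant that needs only one regexp call (a sketch, not timed here): 'a+' greedily matches each maximal run of 'a's, and the first two outputs of regexp are the start and end indices of each match, so the run lengths come for free and runs of exactly two can be selected without setxor. This also shows why the setxor trick works: "a{2,}" and "a{3,}" are both greedy, so every match begins at the start of a whole run.

```octave
## One regexp call: find every maximal run of 'a's, then keep the
## runs whose length is exactly 2.
tststr3 = 'aa aaaaa baa';
[s, e] = regexp (tststr3, 'a+');   ## s = start, e = end of each run
runlen = e - s + 1;                ## length of each run
desiredIndex = s(runlen == 2)
```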


I'm a bit concerned that this solution will be too time-consuming.

What I actually want to do is check format strings from a spreadsheet, which may involve several tens of thousands of strings (which are not much more complicated than the example above).

The following script:

tic
tststr3 = 'aa aaaaa baa';
for ii = 1:50000
  indexes = regexp (tststr3, "a{2,}");
  indexes2 = regexp (tststr3, "a{3,}");
  desiredIndex = setxor (indexes, indexes2);
endfor
toc

...says:
"Elapsed time is 33.1 seconds."

A script using strfind:

tic
tststr3 = 'aa aaaaa baa';
for ii = 1:50000
  ## Only times the strfind calls; their results are not combined here
  idx1 = strfind (tststr3, 'aa');
  idx2 = strfind (tststr3, 'aaa');
  idx3 = strfind (tststr3, 'aaaa');
  idx4 = strfind (tststr3, 'aaaaa');
endfor
toc

...gives:
"Elapsed time is 2.27 seconds."

so a solution based on strfind() seems preferable.
(findstr() takes about 68 seconds; I suspect it is based on regexp().)
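The strfind timing above only measures the search calls and does not yet turn them into the wanted indices. One way to do that from a single strfind call is sketched below (not timed here): find every 'aa', which strfind also reports at overlapping positions inside longer runs, then keep only those whose neighbouring characters are not 'a'. The function name find_exact_aa is hypothetical, chosen for illustration, and the leading "1;" only tells Octave that the file is a script which defines a function.

```octave
1;  ## marks this file as a script so it can define a function below

## Hypothetical helper: start indices of every 'aa' that is not part
## of a longer run of 'a's.
function idx = find_exact_aa (str)
  idx = strfind (str, 'aa');       ## every 'aa', overlapping ones included
  padded = [' ', str, ' '];        ## any non-'a' character works as padding
  before = padded(idx);            ## = str(idx-1), or the pad when idx == 1
  after  = padded(idx + 3);        ## = str(idx+2), or the pad at the end
  idx = idx(before != 'a' & after != 'a');
endfunction

tststr3 = 'aa aaaaa baa';
desiredIndex = find_exact_aa (tststr3)
```

The padding avoids special-casing matches at the start and end of the string, which is exactly where the [^a]aa[^a] pattern failed.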


Thanks again,

Philip

Current Thread

regexp question, Philip Nienhuis, 2011/12/05
- RE: regexp question, William Krekeler, 2011/12/05
- RE: regexp question, William Krekeler, 2011/12/05
  - Re: regexp question, Philip Nienhuis <=
    - Re: regexp question, Sergei Steshenko, 2011/12/06
    - Re: regexp question, Philip Nienhuis, 2011/12/06
    - Re: regexp question, Sergei Steshenko, 2011/12/07
- Re: regexp question, Sergei Steshenko, 2011/12/05
