octave-maintainers

Re: Regexp cleanup


From: Laurent Hoeltgen
Subject: Re: Regexp cleanup
Date: Thu, 04 Jul 2013 09:37:04 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130509 Thunderbird/17.0.6

On 07/03/2013 09:57 PM, PhilipNienhuis wrote:
Rik-4 wrote on 7/3/13:

All,

Does anyone know if the following expression is legal in Matlab?

[S, E, TE, M, T, NM, SP] = regexp ("John Davis\nRogers, James",
'(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')

The issue is with the repeated use of a named capture buffer across an
alternation operator.  PCRE, which we use underneath for regular
expressions, does not support non-unique capture names in a pattern.
Octave currently works around this by renaming the capture buffers.
However, the back-end logic that parses the output of PCRE and returns
results to Octave is very complex and creaky.  I rewrote the back-end
routine in util/regexp.cc and I can now, at least, follow what the code is
doing.  The rewrite also fixes the following existing bugs (I said it was
creaky).

38778: wrong return value for regexp
38616: memory leak
38149: wrong tokens returned
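Rik does not spell out how Octave renames the capture buffers, so the following is only a hypothetical Python sketch of the general idea, not the code in util/regexp.cc. Each repeated name gets a unique internal alias (the `__N` suffix is an arbitrary choice), the rewritten pattern is matched, and the aliases are folded back so every match reports the names the user wrote, taken from whichever alternative actually participated. Python's `re` also rejects duplicate group names, which makes it a convenient stand-in; the helpers `rename_duplicate_groups` and `named_tokens` are invented for illustration.

```python
import re
from collections import defaultdict

# Matches an Octave/Matlab-style named-group opener such as "(?<first>".
# Lookbehinds "(?<=" and "(?<!" are not matched because a name must
# start with a letter.
_NAMED_GROUP = re.compile(r"\(\?<([A-Za-z]\w*)>")


def rename_duplicate_groups(pattern):
    """Give every repeated group name a unique alias (hypothetical scheme).

    Returns the rewritten pattern in Python's (?P<name>...) syntax and a dict
    mapping each alias back to the name the user wrote.  Caveat: the "__N"
    suffix could collide with a name the user already chose.
    """
    seen = defaultdict(int)
    alias = {}

    def repl(m):
        name = m.group(1)
        n = seen[name]
        seen[name] += 1
        unique = name if n == 0 else f"{name}__{n}"
        alias[unique] = name
        return f"(?P<{unique}>"

    return _NAMED_GROUP.sub(repl, pattern), alias


def named_tokens(pattern, text):
    """Return one dict per match, keyed by the original (repeated) names."""
    rewritten, alias = rename_duplicate_groups(pattern)
    results = []
    for m in re.finditer(rewritten, text):
        # Every user-visible name starts out empty ...
        tokens = {name: "" for name in alias.values()}
        # ... and is filled from whichever alias actually participated.
        for unique, value in m.groupdict().items():
            if value is not None:
                tokens[alias[unique]] = value
        results.append(tokens)
    return results


if __name__ == "__main__":
    pattern = r"(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)"
    # Raw string: literal backslash-n, as in Matlab's single-quoted input.
    print(named_tokens(pattern, r"John Davis\nRogers, James"))
    # [{'first': 'John', 'last': 'Davis'}, {'first': 'James', 'last': 'nRogers'}]
```

The design choice here is to keep the renaming entirely on the pattern side, so the matching engine never sees a duplicate name and the fold-back step is a simple dictionary lookup per match.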

So, depending on what Matlab does, would it be okay to drop support for
this esoterica?  I'm pretty tired of trying to work it out at this point.

--Rik

Matlab R2013b prerelease returns the following (after changing the double
quotes to single quotes and removing the empty lines):

[S, E, TE, M, T, NM, SP] = regexp ('John Davis\nRogers, James',
'(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')
S =
      1    12
E =
     10    25
TE =
     [2x2 double]    [2x2 double]
M =
     'John Davis'    'nRogers, James'
T =
     {1x2 cell}    {1x2 cell}
NM =
1x2 struct array with fields:
     first
     last
SP =
     ''    '\'    ''
...so it seems Matlab thinks this is valid.

Philip
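The S, E and SP values above look odd until you notice that a Matlab single-quoted string does not interpret `\n`, so the subject text contains a literal backslash followed by `n`; the second match therefore begins at the `n` and the separator between matches is a lone backslash. A small Python sketch can reproduce those indices by converting Python's 0-based, end-exclusive spans to Matlab's 1-based, inclusive start and end positions. The helper `matlab_style_outputs` is hypothetical, it uses unnamed groups because Python's `re` rejects duplicate names, and it does not attempt TE, T or NM.

```python
import re


def matlab_style_outputs(pattern, text):
    """Return S, E, M and SP in the form the Matlab output above uses.

    S and E are 1-based and inclusive; SP holds the text around the matches.
    """
    spans = [m.span() for m in re.finditer(pattern, text)]
    S = [start + 1 for start, _ in spans]  # 0-based start -> 1-based start
    E = [end for _, end in spans]          # 0-based exclusive end == 1-based inclusive end
    M = [text[a:b] for a, b in spans]
    SP, prev = [], 0
    for a, b in spans:
        SP.append(text[prev:a])            # text before this match
        prev = b
    SP.append(text[prev:])                 # text after the last match
    return S, E, M, SP


if __name__ == "__main__":
    # Unnamed groups: Python's re rejects the duplicated names.
    pattern = r"(\w+)\s+(\w+)|(\w+),\s+(\w+)"
    # Raw string keeps the backslash and "n" as two separate characters.
    S, E, M, SP = matlab_style_outputs(pattern, r"John Davis\nRogers, James")
    print(S, E)  # [1, 12] [10, 25]
    print(M)     # ['John Davis', 'nRogers, James']
    print(SP)    # ['', '\\', '']
```

With a real newline instead (Python's non-raw "John Davis\nRogers, James", which corresponds to the double-quoted Octave string in Rik's original), the same helper reports 'Rogers, James' as the second match and E = [10, 24].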



--
View this message in context: 
http://octave.1599824.n4.nabble.com/Regexp-cleanup-tp4655163p4655172.html
Sent from the Octave - Maintainers mailing list archive at Nabble.com.

Hi,

Matlab R2012a returns the same result as above.

Regards,
Laurent

