octave-maintainers

Regexp cleanup


From: Rik
Subject: Regexp cleanup
Date: Wed, 03 Jul 2013 09:32:10 -0700

7/3/13

All,

Does anyone know if the following expression is legal in Matlab?

[S, E, TE, M, T, NM, SP] = regexp ("John Davis\nRogers, James", ...
'(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')

The issue is the repeated use of a named capture buffer on both sides of
an alternation operator.  PCRE, which we use underneath for regular
expressions, does not by default support non-unique capture names in a
pattern.  Octave currently works around this by renaming the capture
buffers.  However, the logic at the far end that parses the output of PCRE
and returns results to Octave is very complex and creaky.  I rewrote the
back-end routine in util/regexp.cc and I can now, at least, follow what the
code is doing.  The rewrite also fixes the following existing bugs (I said
it was creaky).

38778: wrong return value for regexp
38616: memory leak
38149: wrong tokens returned
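
For readers unfamiliar with the renaming workaround mentioned above, here is
a minimal sketch of the idea.  It is written in Python rather than Octave or
C++ only so that it is self-contained; it is not the code in util/regexp.cc.
Python's re module also rejects a repeated group name, so the same trick
applies.  The helper names (rename_duplicate_groups, match_names) and the
"name_N" renaming scheme are assumptions made up for this illustration.

```python
import re

def rename_duplicate_groups(pattern):
    """Give every named group a unique internal name.

    Returns the rewritten pattern and a dict mapping each internal
    name back to the name the user wrote.  This is a naive textual
    rewrite: it ignores escaped parentheses and character classes,
    and assumes the user has no group already called e.g. "first_1".
    """
    mapping = {}   # internal unique name -> original name
    counter = 0

    def repl(m):
        nonlocal counter
        counter += 1
        unique = f"{m.group(1)}_{counter}"
        mapping[unique] = m.group(1)
        # Python spells named groups (?P<name>...) rather than (?<name>...)
        return f"(?P<{unique}>"

    new_pattern = re.sub(r"\(\?<([A-Za-z_]\w*)>", repl, pattern)
    return new_pattern, mapping

def match_names(pattern, text):
    """Return one {original_name: text} dict per match."""
    new_pattern, mapping = rename_duplicate_groups(pattern)
    results = []
    for m in re.finditer(new_pattern, text):
        names = {}
        for unique, value in m.groupdict().items():
            if value is not None:      # group took part in this match
                names[mapping[unique]] = value
        results.append(names)
    return results

pat = r'(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)'
print(match_names(pat, "John Davis\nRogers, James"))
# [{'first': 'John', 'last': 'Davis'}, {'last': 'Rogers', 'first': 'James'}]
```

The renaming itself is easy; the hard part is the step that maps the
internal names back to the user's names for each match, since only one side
of the alternation participates in any given match.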

So, depending on what Matlab does, would it be okay to drop support for
this esoterica?  I'm pretty tired of trying to work it out at this point.

--Rik

