octave-maintainers
Backslashes in regular expression replacement patterns


From: Rik
Subject: Backslashes in regular expression replacement patterns
Date: Tue, 16 Oct 2012 09:29:16 -0700

10/16/12

All,

I have been trying to address this bug
(https://savannah.gnu.org/bugs/?37092) where backslashes in regular
expression replacement patterns are not handled well.  It turns out that
the existing code is terribly naive.  It looks for a '$' character to
indicate a replacement token and does not do any sort of escape
processing of its own.  Thus,

regexprep ('a', '(\w)', '\$1') => 'a'
regexprep ('a', '(\w)', '\\$1') => '\a'

I have a changeset that fixes all this up, but I had to rework the
replacement string function in regexp.cc in liboctave.  Part of the effort
involved delaying escape pattern processing for '\' and '$' characters
until the replacement string function.  With the new code in place, I get
the following

regexprep ('a', '(\w)', '\$1') => '$1'
regexprep ('a', '(\w)', '\\$1') => '\a'

which seems accurate.  The question now arises of how to handle
double-quoted strings, which undergo one round of escape sequence
processing by the interpreter and then a second round by regexprep for
the special characters '\' and '$'.  So for the following example, what
is reasonable for Octave to return?

regexprep ('a', '(\w)', "\\$1")

Should it be '\a' or '$1'? 

In the first case, one would get the same result from using a replacement
pattern of '\\$1' or "\\$1" which is odd because usually single quotes and
double quotes are not interchangeable.  On the other hand, Perl would
return '\a' for the regexprep above.
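
To make the two rounds of processing concrete, here is a minimal Python
sketch of the behavior described above.  It is not the code from
regexp.cc; the function names expand_replacement and double_quote_escapes
are hypothetical, and the interpreter stand-in only collapses '\\' to '\',
which is a deliberate simplification.

```python
def expand_replacement(rep, tokens):
    """Expand a replacement pattern the way the new behavior is described:
    a backslash before '\\' or '$' yields that character literally, and
    '$N' (single digit, an assumption of this sketch) yields token N."""
    out = []
    i = 0
    while i < len(rep):
        ch = rep[i]
        nxt = rep[i + 1] if i + 1 < len(rep) else ""
        if ch == "\\" and nxt in ("\\", "$"):
            # Escaped special character: emit it literally.
            out.append(nxt)
            i += 2
        elif ch == "$" and nxt != "" and nxt in "0123456789":
            # Token reference: substitute the captured text (empty if absent).
            n = int(nxt)
            out.append(tokens[n - 1] if 1 <= n <= len(tokens) else "")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def double_quote_escapes(src):
    """Hypothetical stand-in for the interpreter's double-quote processing.
    Only the '\\\\' -> '\\' rule is modeled; everything else passes through."""
    out = []
    i = 0
    while i < len(src):
        if src[i] == "\\" and i + 1 < len(src) and src[i + 1] == "\\":
            out.append("\\")
            i += 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


tokens = ["a"]  # what (\w) captures when matching 'a'

# Single-quoted patterns reach regexprep unchanged.
single_1 = expand_replacement(r"\$1", tokens)
single_2 = expand_replacement(r"\\$1", tokens)

# A double-quoted pattern is processed by the interpreter first,
# then by regexprep, so the backslashes are consumed twice.
double_2 = expand_replacement(double_quote_escapes(r"\\$1"), tokens)
```

Under these assumptions the two single-quoted patterns give '$1' and '\a',
while the double-quoted "\\$1" gives '$1' once both rounds are applied.
Returning '\a' for the double-quoted case would mean skipping one of the
two rounds.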

--Rik
