On Sunday, July 27, 2014, David Caldwell <address@hidden> wrote:
On 7/27/14 6:03 AM, David Kastrup wrote:
> "Stephen J. Turnbull" <address@hidden> writes:
>
>> Sure, you can do a lot for readability as PCRE or Python regexps have
>> done, but regexps are unreadable almost by design, and those regexp
>> syntaxes benefit from rawstrings, too. Almost anything (that doesn't
>> involve changing the meaning of existing legal programs) that improves
>> readability of regexps is worthwhile.
>>
>> Rawstrings are cheap and effective.
>
> When rawstrings are supported, it becomes more expedient to recognize
> things like \n and \t, probably also \f, in regexps (\b is already
> taken). Currently, they just evaluate to n and t.
> That makes input of tabs and newlines in raw strings a nuisance and a
> potential source of errors.
>
> It's not actually an issue with rawstrings as such, but rather of their
> use within regexps.
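For comparison, here is how Python's `re` module handles the case Kastrup describes. This is an illustration in Python, not Emacs Lisp: inside a raw string, `\t` reaches the regexp engine as two characters (a backslash and a `t`), and the engine itself interprets that pair as a tab. Under the Emacs behavior described above, the same two characters would instead match a plain `t`. The name `RAW_TAB` is just a label for this example.

```python
import re

# The raw string r"a\tb" holds four characters: 'a', '\\', 't', 'b'.
# No tab character is present in the pattern text itself.
RAW_TAB = r"a\tb"
print(len(RAW_TAB))  # 4

# Python's regex engine translates the \t escape itself, so the
# pattern matches a real tab and does not match a literal 't'.
print(re.fullmatch(RAW_TAB, "a\tb") is not None)  # True
print(re.fullmatch(RAW_TAB, "atb") is not None)   # False
```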
Why not, then, skip rawstrings completely and go directly to a regular
expression reader: #r// (or even just #//) instead of #r""?
Then you can add whatever semantics are needed for good regexp reading
(i.e., let '\n', '\t', and others be interpreted as escapes by the
reader, but allow '\(' to go through unchanged). This will be just as
easy to implement as raw strings.
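A minimal sketch of the escape handling proposed above, assuming the reader translates only `\n`, `\t`, and `\f` and passes every other backslash sequence (such as `\(`, `\|`, or `\\`) through untouched for the regexp engine. It is written in Python purely for illustration; `read_regexp_literal` is a hypothetical name, the exact set of translated escapes is an assumption, and handling of the closing `/` delimiter is omitted.

```python
# Hypothetical: the escapes the reader would turn into control characters.
CONTROL_ESCAPES = {"n": "\n", "t": "\t", "f": "\f"}


def read_regexp_literal(body: str) -> str:
    """Translate the body of a hypothetical #r/.../ literal into a regexp string.

    \\n, \\t and \\f become newline, tab and form feed; any other backslash
    pair is copied verbatim, so \\( still reaches the engine as a group
    opener and \\\\ still reaches it as an escaped backslash.
    """
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in CONTROL_ESCAPES:
                out.append(CONTROL_ESCAPES[nxt])  # reader-level escape
            else:
                out.append("\\" + nxt)  # leave it for the regexp engine
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash.
            out.append(c)
            i += 1
    return "".join(out)


print(read_regexp_literal(r"\(foo\)\tbar") == "\\(foo\\)\tbar")  # True
```

The design choice is that the reader only removes the need to type literal control characters; everything that carries regexp meaning stays exactly as written.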
Languages like JavaScript, Perl, Ruby, Bash, and Groovy have shown that
having special support for regexps at the language level is a very
effective way of dealing with them. Plus it opens the door to
extensions: #r//p for PCRE/Perl syntax[1] or #r//x for more readable
regexps[2], etc.
I think rawstrings are too generic an answer to the problem. Given
that so much of Emacs's functionality relies on regular expressions,
it makes sense to design something specifically for them. Doing that
means they can be tailored and tweaked for maximum functionality without
worrying about other uses that people might come up with (which
will undoubtedly happen with rawstrings).
-David
[1] And practically every other language on the planet. Really, it seems
like only Emacs is left in the dark ages of basic POSIX regexps, where
'(' means a literal paren rather than the start of a group.
[2] Another Perl feature: it allows whitespace and comments in regexps,
for much improved readability. See http://perldoc.perl.org/perlre.html#/x
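To show what footnote [2] describes in practice, here is Python's counterpart to Perl's /x modifier, `re.VERBOSE` (alias `re.X`): unescaped whitespace in the pattern is ignored and `#` starts a comment. This is an existing Python feature used as an analogy, not a proposed Emacs syntax; `ISO_DATE` is just an example name.

```python
import re

# With re.VERBOSE, the layout and comments below are not part of the
# pattern; it is equivalent to r"(\d{4})-(\d{2})-(\d{2})".
ISO_DATE = re.compile(
    r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    -         # separator
    (\d{2})   # day
    """,
    re.VERBOSE,
)

match = ISO_DATE.fullmatch("2014-07-27")
print(match.groups())  # ('2014', '07', '27')
```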