bug-gnu-emacs

bug#15212: 24.3.50; c++-mode doesn't support raw string literals


From: Ivan Andrus
Subject: bug#15212: 24.3.50; c++-mode doesn't support raw string literals
Date: Tue, 31 May 2016 08:22:07 -0600

On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:

Hi, Ivan.

I've now got a patch, which I'd be grateful if you could try out, both
to see if there are any bugs, and also to get your general impression.
I think there are one or two bugs left in the code, and it needs tidying
up quite a lot.  So this won't be the final version.

Awesome.  I’ll keep looking and let you know of any bugs I find.  

I did find one.  According to http://en.cppreference.com/w/cpp/language/string_literal the delimiter can contain any characters except parentheses, backslash and spaces.  Using square brackets confuses c++-mode though:

char brackets [] = R"0x22[(foobar)0x22[";

Now, I’ve never actually seen such a construct in the wild, but it would be good to fix it regardless.  The *Messages* buffer shows

File mode specification error: (invalid-regexp Unmatched [ or [^)
Error during redisplay: (jit-lock-function 1013) signaled (invalid-regexp "Unmatched [ or [^")

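which seems to point to a missing regexp-quote.  Indeed, it thinks

  char bar [] = R"YYY*(bar)YYY";

is a valid string literal, even though the closing delimiter YYY doesn't match the opening delimiter YYY* (read as a regexp, YYY* matches YYY).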
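
To illustrate the quoting issue outside of Emacs, here is a minimal C++ sketch using std::regex.  It is not CC Mode code: quote_for_regex and closes_with are hypothetical helpers, and quote_for_regex merely plays the role that regexp-quote plays in Emacs Lisp.  The idea is that the delimiter must be escaped before it is spliced into a pattern, otherwise [ and * are treated as regexp syntax.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: escape every character that is special in
// ECMAScript regex syntax, analogous to regexp-quote in Emacs Lisp.
std::string quote_for_regex(const std::string& text) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string quoted;
    for (char c : text) {
        if (special.find(c) != std::string::npos) quoted += '\\';
        quoted += c;
    }
    return quoted;
}

// Hypothetical helper: does `literal` end with the closer )delim" ?
// The delimiter is quoted first, so it is matched character by character.
bool closes_with(const std::string& literal, const std::string& delim) {
    const std::regex closer("\\)" + quote_for_regex(delim) + "\"$");
    return std::regex_search(literal, closer);
}
```

With the quoting in place, the 0x22[ delimiter is accepted as a closer, and R"YYY*(bar)YYY" is rejected because its closer lacks the trailing *.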

Moreover, I was somehow able to get it into a bad state where changing the delimiters wouldn’t update fontification.  I’ll see if I can come up with a recipe for how to reproduce it reliably.

The patch will work only on the savannah master branch - sorry, but it
depends on the fix to the "infrastructure" bug which I committed to
master only this morning (timezone +0200).

I'm also attaching a small test file which might interest you.

On Sat, May 28, 2016 at 02:40:45PM +0000, Alan Mackenzie wrote:
Thanks for the suggestion!  I've actually had an almost working solution
myself for just over a week.  Then I got confused with a bug in some CC
Mode "infrastructure" code.  Such is life!

The way I am fontifying these is thus:
(i) For a correctly terminated raw string, everything between the ( and )
inclusive gets string face, everything else just the default face:

           R"foo(bar)foo"
                ^^^^^
         font-lock-string-face.

I was wondering how this would work.  It’s a little weird that regular string delimiters are fontified with font-lock-string-face, but these aren’t.  But I think I like this way better, since these delimiters are much easier to confuse with the contents of the string than normal string delimiters are.

(ii) For a construct with a raw string opener, not correctly terminated,
I am putting warning face on the entire raw string opener, leaving the
rest of the string with string face, e.g.:

           R"baz(bar)foo"
           ^^^^^^
    font-lock-warning-face
                 ^^^^^^^^
      font-lock-string-face

Of course, that is subject to change if it doesn't work very well.
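
The two cases (i) and (ii) can be sketched in C++ as follows.  This is only an illustration under stated assumptions, not the patch itself: Span, Kind, RawStringFaces and classify_raw_string are hypothetical names, the input is assumed to start at the R of the opener, and an unterminated string is assumed to run to the end of the given text.  The closer is found with a literal search rather than a regexp, so delimiters such as 0x22[ need no quoting.

```cpp
#include <cstddef>
#include <string>

// Half-open character range [begin, end) within the scanned text.
struct Span { std::size_t begin; std::size_t end; };

enum class Kind { NotRaw, Terminated, Unterminated };

struct RawStringFaces {
    Kind kind;
    Span string_face;   // range that would get font-lock-string-face
    Span warning_face;  // range that would get font-lock-warning-face
};

// Hypothetical classifier; `text` is assumed to start at the R of R"delim( .
RawStringFaces classify_raw_string(const std::string& text) {
    const RawStringFaces not_raw = {Kind::NotRaw, {0, 0}, {0, 0}};
    if (text.size() < 3 || text[0] != 'R' || text[1] != '"') return not_raw;
    const std::size_t open_paren = text.find('(', 2);
    if (open_paren == std::string::npos) return not_raw;
    const std::string delim = text.substr(2, open_paren - 2);
    // The delimiter may not contain spaces, parentheses or backslashes.
    if (delim.find_first_of(" )\\") != std::string::npos) return not_raw;
    // Literal search for the closer )delim" .
    const std::string closer = ")" + delim + "\"";
    const std::size_t close_paren = text.find(closer, open_paren + 1);
    if (close_paren != std::string::npos) {
        // Case (i): string face on ( through ) inclusive, nothing else.
        return {Kind::Terminated, {open_paren, close_paren + 1}, {0, 0}};
    }
    // Case (ii): warning face on the opener, string face on the rest.
    return {Kind::Unterminated, {open_paren + 1, text.size()}, {0, open_paren + 1}};
}
```

For R"foo(bar)foo" this gives a string-face range covering (bar); for R"baz(bar)foo" it gives a warning-face range covering R"baz( and a string-face range covering bar)foo".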

CC Mode doesn't actually use syntax-ppss and syntax-propertize-function,
since they don't allow enough control.  In particular, on a buffer
change, they erase all syntax-table text properties between point and end
of buffer, which is wasteful: it is never necessary to erase these beyond
the next end of statement, and they are quite expensive to apply.
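
The difference between erasing to the end of the buffer and erasing only to the next end of statement can be sketched in C++.  This is purely illustrative and is not how CC Mode is implemented: PropertyCache and invalidate_to_statement_end are hypothetical, and the end of statement is approximated by the next ';', which ignores semicolons inside strings and comments.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical cache: buffer position -> name of a cached syntax property.
using PropertyCache = std::map<std::size_t, std::string>;

// Erase cached properties from `change_pos` up to and including the next
// ';' (or to the end of the buffer if there is none), leaving everything
// after that statement untouched.  Returns the number of entries erased.
std::size_t invalidate_to_statement_end(PropertyCache& cache,
                                        const std::string& buffer,
                                        std::size_t change_pos) {
    std::size_t stop = buffer.find(';', change_pos);
    stop = (stop == std::string::npos) ? buffer.size() : stop + 1;
    const auto first = cache.lower_bound(change_pos);
    const auto last = cache.lower_bound(stop);
    const auto erased = static_cast<std::size_t>(std::distance(first, last));
    cache.erase(first, last);
    return erased;
}
```

Entries cached for later statements survive the edit, so the expensive work of reapplying them is avoided.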

Anyhow, we should be able to have this implemented and the bug closed
pretty soon.

Thanks a bunch for your work on this.

-Ivan
