bison-patches

From: Paul Eggert
Subject: Re: Bison scanner patch to fix POSIX incompatibilities, etc.
Date: Tue, 5 Nov 2002 12:00:36 -0800 (PST)

> From: Akim Demaille <address@hidden>
> Date: 05 Nov 2002 09:30:47 +0100

> :)  Well, it is not written down, but if they consider C, then yes, I
> suppose they mean this weird thing too.

Perhaps I should file an interpretation request, since it does seem a
little weird now that you mention it.  Please see the proposal at the
end of this message, which tries to cover all the bases.

Does anyone know what the C++ rules are with respect to trigraphs,
digraphs, and backslash-newline?  Does C++ have trigraphs?

> I do not understand why we would want to have trigraphs.

We don't, and that's why I left them out.

> I'm referring to using `\' to split the two characters of */ and /*.
> I mean, the code is ``more correct'' than it was before, but we
> might have written code that will just never be used.

True.

> The question was: did you ever see a \ splitting tokens in real code

In hand-written code I have seen it only for strings (and in the
International Obfuscated C Code Contest -- I won a prize there once
:-).  I have heard of it occurring in machine-generated code, because
some programs that generate Standard C code worry about compilers with
silly line length limits (which the standard does allow).
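For anyone who has not run into these constructs, here is a short C
sketch of a backslash-newline splitting the two characters of `/*' and
`*/', next to the more common string-literal case.  The function names
`split_comment_demo' and `split_string_demo' are hypothetical, made up
for illustration.  A C compiler deletes each backslash-newline in
translation phase 2, before it recognizes comments or string literals,
so both functions behave as if the split lines were joined.  A Yacc
scanner that copies actions through to C presumably has to agree with
the compiler about where such a comment ends.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <string.h>

/* Hypothetical function: the statement "x = 2;" lies outside the
   comment, even though both comment delimiters are split by a
   backslash-newline, so this returns 2.  */
int split_comment_demo(void)
{
  int x = 1;
  /\
* Phase 2 deletes the backslash-newline between the slash and the
   star before comments are recognized, so this is an ordinary
   comment, and its terminator is split the same way. *\
/
  x = 2;
  return x;
}

/* Hypothetical function: a string literal continued with a
   backslash-newline, the form most often seen in hand-written code.
   It returns "hello, world".  */
const char *split_string_demo(void)
{
  return "hello, \
world";
}
```

With these definitions, `split_comment_demo ()' returns 2 and
`split_string_demo ()' returns "hello, world".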

> Paul> While we're on the subject of POSIX conformance, I notice that
> Paul> we're not handling C digraphs correctly.  I'll throw that in
> Paul> too; might as well, while I'm at it, since it's easy.
> 
> Really, I see no interest at all in making it that perfect.  I bet big
> money that it is not Yacc portable

You're probably right.  I bet that none of this is Yacc portable, come
to think of it.

> all this is free complications that might make things uselessly more
> complex when we will "parse" other languages with different lexical
> structures.

Languages with different lexical structures will need different
lexical regimes anyway.  You can't parse Scheme with a C lexer.

Or perhaps you were thinking we could get away with parsing C++ and/or
Java with a C lexer?  That is plausible.

We can certainly construct horrible examples of C statements that no
Yacc parser could reasonably be expected to parse.  For example:

  %{
  #define CLOSE_BRACE }
  %}
  %%
  start: 'x' { { $$ = 0; CLOSE_BRACE } ;

Or how about this one?

  %{
  #define PERCENT_CLOSE_BRACE %}
  %}
  %%
  start: 'x' ;

POSIX says that both these are valid input to Yacc!  Clearly this is a
bug in the standard, but the question is how far the bug extends.
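To make the brace problem concrete, here is a simplified C sketch of how
a Yacc scanner typically finds the end of an action: it counts `{' and
`}' while skipping comments, string literals, and character constants.
The function `action_end' is hypothetical and is not Bison's actual
scanner; it only illustrates why CLOSE_BRACE (or a digraph brace)
defeats any such scheme that does not run the C preprocessor.

```c
#include <assert.h>  /* for the checks in the usage example */

/* Hypothetical helper, not Bison's scanner.  TEXT must start with '{'.
   Return the offset just past the matching '}', or -1 if the action is
   unterminated.  Only the characters '{' and '}' are counted; C89-style
   comments, string literals and character constants are skipped.
   Macros are not expanded, so a brace spelled CLOSE_BRACE, or as the
   digraph "%>", is invisible here.  */
long action_end(const char *text)
{
  long depth = 0;
  const char *p = text;

  while (*p)
    {
      if (p[0] == '/' && p[1] == '*')
        {
          /* Skip a comment; if it never ends, neither does the action.  */
          p += 2;
          while (*p && !(p[0] == '*' && p[1] == '/'))
            p++;
          if (!*p)
            return -1;
          p += 2;
          continue;
        }
      if (*p == '"' || *p == '\'')
        {
          /* Skip a string literal or character constant, honoring
             backslash escapes such as \" and \'.  */
          char quote = *p++;
          while (*p && *p != quote)
            {
              if (*p == '\\' && p[1])
                p++;
              p++;
            }
          if (!*p)
            return -1;
          p++;
          continue;
        }
      if (*p == '{')
        depth++;
      else if (*p == '}' && --depth == 0)
        return (long) (p + 1 - text);
      p++;
    }
  return -1;
}
```

For instance, `action_end' returns 11 for `{ $$ = 0; }' and 12 for
`{ s = "}"; }', but -1 (unterminated) for the CLOSE_BRACE action in the
first example, because the macro's brace never reaches it.  An action
closed only by the digraph `%>' is likewise reported as unterminated,
even though a C compiler would accept it.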

How about if we propose the following changes to the POSIX standard:

 * The two characters `%' and `}' cannot appear adjacent within %{
   ... %}, other than inside a comment or string literal.

 * C-language code must have properly nested occurrences of "{" and
   "}".  If braces are spelled any other way (e.g., via a macro like
   CLOSE_BRACE or via a digraph) then the resulting behavior is
   undefined.

 * Backslash-newline preprocessing does not apply to Yacc comments or
   literals, or to occurrences of pseudovariables like `$$'.

 * Similarly, trigraph preprocessing does not apply to Yacc comments,
   literals, or pseudovariables.

 * If the removal of a backslash-newline within C-language code would
   change the boundary of the containing comment, string literal, or
   character constant, the resulting behavior is undefined.

 * Similarly, if the replacement of a trigraph by its corresponding
   single character in C-language code would change the boundary of
   the containing comment, string literal, or character constant, the
   resulting behavior is undefined.
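The last proposal is easiest to see with the trigraph `??/', which
stands for a backslash.  In the raw text `"??/"' a scanner that ignores
trigraphs sees a complete string literal, but after trigraph replacement
the text becomes `"\"', whose closing quote is escaped, so the literal
no longer ends where it did.  The C sketch below shows both views; the
helpers `replace_trigraphs' and `literal_end' are hypothetical and not
part of Bison.  It spells the trigraph as "?\?/" so that a compiler with
trigraphs enabled does not replace it in the sketch itself.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <string.h>

/* Hypothetical helper: copy IN to OUT (which needs room for
   strlen (IN) + 1 bytes), replacing each of the nine C trigraphs
   (two question marks followed by one of = ( / ) ' < ! > -) with
   the single character it stands for.  */
void replace_trigraphs(const char *in, char *out)
{
  static const char from[] = "=(/)'<!>-";
  static const char to[] = "#[\\]^{|}~";

  while (*in)
    {
      const char *hit;
      /* Check in[2] first: strchr would match the terminating NUL.  */
      if (in[0] == '?' && in[1] == '?' && in[2] != '\0'
          && (hit = strchr(from, in[2])) != NULL)
        {
          *out++ = to[hit - from];
          in += 3;
        }
      else
        *out++ = *in++;
    }
  *out = '\0';
}

/* Hypothetical helper: S starts with a double quote.  Return the offset
   just past the closing quote of the string literal, honoring backslash
   escapes, or -1 if the literal is unterminated.  */
long literal_end(const char *s)
{
  const char *p = s + 1;

  while (*p && *p != '"')
    {
      if (*p == '\\' && p[1])
        p++;
      p++;
    }
  return *p ? (long) (p + 1 - s) : -1;
}
```

Applied to the five characters `"??/"', `literal_end' returns 5 on the
raw text, while on the output of `replace_trigraphs' (the three
characters `"\"') it returns -1: the replacement has moved the boundary
of the literal, which is exactly the case the proposal leaves undefined.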

Would that be OK with you?
