help-bison

Re: C preprocessor


From: Christian Schoenebeck
Subject: Re: C preprocessor
Date: Fri, 14 Aug 2020 11:11:49 +0200

On Thursday, 13 August 2020 07:49:52 CEST Giacinto Cifelli wrote:
> Hi all,
> 
> I am wondering if it is possible to interpret a C preprocessor (the second
> preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> grammar through bison, and, if so, whether it has already been done.
> I think this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.

I think languages like C are commonly classified as "attributed context-free
languages", which puts them in Chomsky type 2.

If you just need to handle the preprocessor part, then all you need is a lexer
with a start-condition stack enabled. A parser (e.g. Bison) only becomes
relevant if you also need to process what comes after the preprocessor.
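
To illustrate why a stack is enough for this, here is a small C++ sketch
(plain C++, not Flex) that tracks nested #if/#endif blocks line by line and
reports which lines an editor could show as active. The function name
activeLines, the restriction to "#define <name> <body>", "#if <name>" and
"#endif", and the truth rule (a macro is true when it is defined with a body
other than empty or "0") are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for each input line, returns true if the line is
// active and false if it sits inside a block disabled by "#if <name>".
std::vector<bool> activeLines(const std::vector<std::string>& lines) {
    std::map<std::string, std::string> macros;  // macro name -> body
    std::vector<bool> stack;                    // one entry per open #if
    std::vector<bool> result;

    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string directive;
        in >> directive;  // first whitespace-delimited word, may be empty

        if (directive == "#if") {
            // A nested block is active only if its parent is active too.
            bool parent = stack.empty() || stack.back();
            std::string name;
            in >> name;
            auto it = macros.find(name);
            bool cond = it != macros.end() && !it->second.empty() &&
                        it->second != "0";
            result.push_back(parent);
            stack.push_back(parent && cond);
        } else if (directive == "#endif") {
            if (!stack.empty()) stack.pop_back();  // tolerate stray #endif
            result.push_back(stack.empty() || stack.back());
        } else {
            bool active = stack.empty() || stack.back();
            if (active && directive == "#define") {
                std::string name, body;
                in >> name;
                std::getline(in >> std::ws, body);  // rest of line, may be empty
                macros[name] = body;
            }
            result.push_back(active);
        }
    }
    assert(result.size() == lines.size());
    return result;
}
```

An editor could grey out every line whose flag is false. A real
implementation would also need #else, #elif, #ifdef and full expression
evaluation, which this sketch leaves out.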

> Any idea how to build something like an AST from it?
> 
> The purpose would be to use in a text editor, to know how to format for
> example a block between #if/#endif (according to the condition, for example
> could be greyed out if false),

Just to give you a basic idea of how this can be done, e.g. with Flex, *very*
roughly (i.e. you have to complete it yourself):


/* the yyscanner argument and yyextra used below require a reentrant scanner */
%option reentrant
/* enable functions yy_push_state(), yy_pop_state(), yy_top_state() */
%option stack

/* inclusive scanner conditions */
%s PREPROC_BODY_USE
/* exclusive scanner conditions */
%x PREPROC_DEFINE PREPROC_DEFINE_BODY PREPROC_IF PREPROC_BODY_EAT

DIGIT    [0-9]
ID       [a-zA-Z_][a-zA-Z0-9_]*

%%

 /* #define <name> <body> */

<*>"#define"[ \t]* {
        yy_push_state(PREPROC_DEFINE, yyscanner);
        yyextra->token = PreprocessorToken(yytext);
        return PREPROC_TOKEN_TYPE;
}

<PREPROC_DEFINE>{ID} {
        yy_pop_state(yyscanner);
        yy_push_state(PREPROC_DEFINE_BODY, yyscanner);
        yyextra->macro_name = yytext;
        yyextra->token = PreprocessorToken(yytext);
        return PREPROC_TOKEN_TYPE;
}

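 /* the body is the rest of the line; inside a character class '$' is a
    literal dollar sign, so the newline has to be excluded explicitly */
<PREPROC_DEFINE_BODY>[^\n]* {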
        yy_pop_state(yyscanner);
        yyextra->token = PreprocessorToken(yytext);
        yyextra->macro_table[yyextra->macro_name] = yytext;
        return PREPROC_TOKEN_TYPE;
}


 /*
    #if <condition>
         <body>
    #endif
 */

<*>"#if"[ \t]* {
        yy_push_state(PREPROC_IF, yyscanner);
        yyextra->token = PreprocessorToken(yytext);
        return PREPROC_TOKEN_TYPE;
}

<PREPROC_IF>{ID} {
        yy_pop_state(yyscanner);
        if (evaluate(yyextra->macro_table[yytext]))
                yy_push_state(PREPROC_BODY_USE, yyscanner);
        else
                yy_push_state(PREPROC_BODY_EAT, yyscanner);
        yyextra->token = PreprocessorToken(yytext);
        return PREPROC_TOKEN_TYPE;
}

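 /* #endif must be listed before the eat-up rule: when two rules match the
    same length, Flex picks the one that comes first */
<*>^[ \t]*"#endif".* {
        yy_pop_state(yyscanner);
        yyextra->token = PreprocessorToken(yytext);
        return PREPROC_TOKEN_TYPE;
}

<PREPROC_BODY_EAT>.*|\n { /* eat up code block filtered out by preprocessor */ }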

 /* Language keywords */

if|else|const|switch|case|int|unsigned {
        yyextra->token = KeywordToken(yytext);
        return KEYWORD_TOKEN_TYPE;
}

 /* String literal */

 /* allow backslash escapes such as \" inside the literal */
\"([^"\\\n]|\\.)*\" {
    yyextra->token = StringLiteralToken(yytext);
    return STRING_LITERAL_TYPE;
}

 /* Number literal */

{DIGIT}+("."{DIGIT}+)? {
    yyextra->token = NumberLiteralToken(yytext);
    return NUMBER_LITERAL_TYPE;
}

 /* Other tokens */

 /* '.' does not match a newline, so include it to avoid the default echo */
<*>.|\n {
    yyextra->token = OtherToken(yytext);
    return OTHER_TOKEN_TYPE;
}

%%
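
The rules above rely on a few things that are not shown: the yyextra fields
macro_name and macro_table, and the evaluate() function. One possible shape
for them in C++ is sketched below. The struct name ScannerExtra, the helper
defineMacro() and the rule that a macro body is true when it parses as a
non-zero integer are assumptions for illustration only; token handling is
left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical per-scanner state reachable through yyextra.
struct ScannerExtra {
    std::string macro_name;                          // name seen after #define
    std::map<std::string, std::string> macro_table;  // macro name -> body
};

// Hypothetical helper doing what the two #define rules do together.
void defineMacro(ScannerExtra& extra, const std::string& name,
                 const std::string& body) {
    assert(!name.empty());
    extra.macro_name = name;
    extra.macro_table[name] = body;
}

// Assumed semantics: a body is true if it starts with an integer that is
// not zero. An empty or non-numeric body is false; that includes undefined
// macros, because macro_table[name] yields "" for a missing key.
bool evaluate(const std::string& body) {
    char* end = nullptr;
    long value = std::strtol(body.c_str(), &end, 0);  // base 0: 10, 0x.., 0..
    return end != body.c_str() && value != 0;
}
```

With Flex's reentrant API such a struct would typically be made available
through %option extra-type="ScannerExtra *" and yylex_init_extra().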


Best regards,
Christian Schoenebeck




