Re: C preprocessor
From: Christian Schoenebeck
Subject: Re: C preprocessor
Date: Fri, 14 Aug 2020 11:11:49 +0200
On Thursday, 13 August 2020 07:49:52 CEST, Giacinto Cifelli wrote:
> Hi all,
>
> I am wondering if it is possible to interpret a C preprocessor (the second
> preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> grammar through bison, and, if so, whether it has already been done.
> I think this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.
The common classification of languages like C is, I think, "attributed
context-free language", which is Chomsky type 2.
If you just need to handle the preprocessor part, then all you need is a lexer
with a start-condition stack enabled. A parser (e.g. Bison) only becomes
relevant if you also need to process what comes after the preprocessor.
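To make the "lexer with a stack" idea concrete outside of Flex, here is a minimal C++ sketch. It is not taken from any existing tool; the name markActiveLines() is made up for illustration, and it assumes only "#if NAME" and "#endif" directives at the start of a line (no #else, no #elif, no real expressions). Each open #if pushes one entry onto a stack, each #endif pops one, and a line is live only if the top of the stack is live:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for each input line, report whether it is live (true)
// or filtered out by an enclosing false "#if NAME" (false).
// Simplification: NAME is everything after "#if " on that line.
std::vector<bool> markActiveLines(const std::string& source,
                                  const std::map<std::string, int>& macros) {
    std::vector<bool> active;  // one flag per input line
    std::vector<bool> stack;   // one entry per open #if: is that block live?
    std::size_t lineCount = 0;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        ++lineCount;
        // A line is live only if the innermost enclosing block is live.
        const bool live = stack.empty() || stack.back();
        if (line.rfind("#if ", 0) == 0) {            // line starts with "#if "
            const auto it = macros.find(line.substr(4));
            const bool cond = it != macros.end() && it->second != 0;
            stack.push_back(live && cond);           // push: enter the block
            active.push_back(live);                  // directive belongs to outer block
        } else if (line.rfind("#endif", 0) == 0) {   // line starts with "#endif"
            if (!stack.empty()) stack.pop_back();    // pop: leave the block
            active.push_back(stack.empty() || stack.back());
        } else {
            active.push_back(live);
        }
    }
    assert(active.size() == lineCount);  // exactly one flag per line
    return active;
}
```

An editor could grey out every line whose flag is false. Pushing `live && cond` rather than just `cond` is what keeps a true #if nested inside a false one greyed out.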
> Any idea how to build something like an AST from it?
>
> The purpose would be to use in a text editor, to know how to format for
> example a block between #if/#endif (according to the condition, for example
> could be greyed out if false),
Just to give you a basic idea of how this can be done, e.g. with Flex, *very*
roughly (i.e. you have to complete it yourself):
/* reentrant scanner: provides the yyscanner argument and yyextra used below */
%option reentrant
/* enable functions yy_push_state(), yy_pop_state(), yy_top_state() */
%option stack
/* inclusive scanner conditions */
%s PREPROC_BODY_USE
/* exclusive scanner conditions */
%x PREPROC_DEFINE PREPROC_DEFINE_BODY PREPROC_IF PREPROC_BODY_EAT
DIGIT [0-9]
ID [a-zA-Z_][a-zA-Z0-9_]*
%%
 /* #define <name> <body> */
<*>"#define"[ \t]* {
    yy_push_state(PREPROC_DEFINE, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}
<PREPROC_DEFINE>{ID} {
    yy_pop_state(yyscanner);
    yy_push_state(PREPROC_DEFINE_BODY, yyscanner);
    yyextra->macro_name = yytext;
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}
 /* macro body: the rest of the line ('$' is a literal inside [...]) */
<PREPROC_DEFINE_BODY>[^\n]* {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    yyextra->macro_table[yyextra->macro_name] = yytext;
    return PREPROC_TOKEN_TYPE;
}
 /*
    #if <condition>
    <body>
    #endif
 */
<*>"#if"[ \t]* {
    yy_push_state(PREPROC_IF, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}
<PREPROC_IF>{ID} {
    yy_pop_state(yyscanner);
    if (evaluate(yyextra->macro_table[yytext]))
        yy_push_state(PREPROC_BODY_USE, yyscanner);
    else
        yy_push_state(PREPROC_BODY_EAT, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}
 /* '.' never matches a newline, so line breaks need their own alternative */
<PREPROC_BODY_EAT>.*|\n    /* eat up code block filtered out by preprocessor */
<*>.*"#endif" {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}
 /* Language keywords */
if|else|const|switch|case|int|unsigned {
    yyextra->token = KeywordToken(yytext);
    return KEYWORD_TOKEN_TYPE;
}
 /* String literal */
\"[^"]*\" {
    yyextra->token = StringLiteralToken(yytext);
    return STRING_LITERAL_TYPE;
}
 /* Number literal */
{DIGIT}+("."{DIGIT}+)? {
    yyextra->token = NumberLiteralToken(yytext);
    return NUMBER_LITERAL_TYPE;
}
 /* Other tokens */
<*>. {
    yyextra->token = OtherToken(yytext);
    return OTHER_TOKEN_TYPE;
}
%%
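The rules above leave the type behind yyextra and the evaluate() function undefined. One possible shape for them, purely as an assumption on my side: ScannerContext and conditionHolds() are hypothetical names, the token is reduced to a plain string, and evaluate() only understands a body that is a single integer literal. With a reentrant Flex scanner such a struct would typically be attached via %option extra-type.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical per-scanner state that the actions reach through yyextra.
struct ScannerContext {
    std::string macro_name;                          // name seen after #define
    std::map<std::string, std::string> macro_table;  // name -> unexpanded body
    std::string token;                               // stand-in for a real token object
};

// Hypothetical evaluate(): true only if the macro body is one nonzero
// integer literal, optionally surrounded by blanks. Anything else
// (empty body, identifiers, operators) counts as false in this sketch.
bool evaluate(const std::string& body) {
    char* end = nullptr;
    const long value = std::strtol(body.c_str(), &end, 0);  // base 0: 10, 0x.., 0..
    if (end == body.c_str()) return false;                   // no digits consumed
    while (*end == ' ' || *end == '\t') ++end;               // allow trailing blanks
    return *end == '\0' && value != 0;
}

// Mirrors the decision taken in the <PREPROC_IF>{ID} action.
bool conditionHolds(const ScannerContext& ctx, const std::string& name) {
    assert(!name.empty());  // {ID} never matches the empty string
    const auto it = ctx.macro_table.find(name);
    return it != ctx.macro_table.end() && evaluate(it->second);
}
```

Using find() instead of operator[] for the lookup avoids silently inserting an empty entry into the table for every unknown name that appears in an #if.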
Best regards,
Christian Schoenebeck