A lex->flex porting nightmare
From: Hans-Bernhard Broeker
Subject: A lex->flex porting nightmare
Date: Thu, 5 Oct 2000 19:12:07 +0200 (MET DST)
Hello, everyone,
I have taken on a battle which I don't seem to be able to win.
The scanner in question is that of the now open-sourced classic Unix tool
'cscope'. It is supposed to take C files as input and spit out major
parts of them in a slightly different format, which makes up the
'cscope.out' database.
The scanner, given its origin, is written with AT&T lex in mind. It does
work in 'flex -l' mode, but it's unacceptably slow (a factor of 6 or so
slower than 'lex', on the same scanner.l).
I've got quite far in cutting out several things the 'flex -p' report
didn't like at all, but now I'm facing one that puzzles me completely, and
which I vaguely suspect to be due to a bug in 'flex': the same scanner
that works quite nicely in 'flex -B8' mode will overwrite the contents of
'yytext' saved by yymore() if it's built with 'flex -B8 -Cf' or '...-CF'.
The trashing of yytext[] seems to be connected to the scanner backing up,
or so the -d output seems to suggest.
The scanner is way too large to be posted here, but if anyone really wants
to look at it: it's available at SourceForge, in the 'cscope' project.
It does quite a lot of things that seem (to me) rather unusual, and may
well be very bad flex coding:
*) massive 'goto'ing from one action into the middle of some other
   action
*) almost all actions use a yymore() call, except the ones that
   matched a newline in the input. This is an attempt to use yytext[]
   as a buffer that the whole source line can be kept in, until the
   newline is found and a modified version of that line of code may
   have to be copied to the output file.
*) it assumes that the contents of yytext stay intact after a
   yymore(), even if the scanner return()ed a token in between.
*) it doesn't work at all if you use it in '%pointer' mode,
   but I haven't quite isolated the reason for that yet.
*) it overrides the input() method with its own, which is a hand-coded
   micro-scanner to erase all comments from the input. It does
   not override unput(), though, but calls it from various places.
*) In two cases, yyleng is modified and yymore() called in the same
   action. I know it shouldn't do this, according to the docs, but OTOH,
   the scanner failure happens even with input that never triggers
   either of those two actions.
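For comparison, the line-collecting job that all those yymore() calls are
meant to do could presumably also be done with an explicit buffer, so that
nothing depends on yytext[] surviving between actions. What follows is only
a minimal sketch under that assumption: line_append(), line_finish(),
line_buf and LINE_MAX_LEN are hypothetical names, none of them taken from
the cscope sources, and I have not tried this in the real scanner.

```c
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; nothing is taken from cscope. */
#define LINE_MAX_LEN 4096

static char   line_buf[LINE_MAX_LEN];  /* the source line collected so far */
static size_t line_len = 0;            /* bytes used, excluding the NUL    */

/* Append one matched lexeme to the current line.
 * In a flex action this would be called as line_append(yytext, yyleng).
 * Returns 0 on success, -1 if the lexeme does not fit. */
static int line_append(const char *text, size_t len)
{
    if (len >= LINE_MAX_LEN - line_len)
        return -1;                     /* no room for text plus the NUL */
    memcpy(line_buf + line_len, text, len);
    line_len += len;
    line_buf[line_len] = '\0';
    return 0;
}

/* Meant for the action that matches a newline: return the collected
 * line and start a new one. The returned pointer stays valid only
 * until the next call to line_append(). */
static const char *line_finish(size_t *len_out)
{
    line_buf[line_len] = '\0';
    if (len_out != NULL)
        *len_out = line_len;
    line_len = 0;
    return line_buf;
}
```

Each action that now calls yymore() would call line_append(yytext, yyleng)
instead, and the newline actions would call line_finish(). Whether the extra
copying costs less than what '-Cf' or '-CF' would gain is something that
would have to be measured.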
Any enlightenment you could provide? Or should I just give up on speeding
up this scanner?
Hans-Bernhard Broeker (address@hidden)
Even if all the snow were burnt, ashes would remain.