help-flex

A lex->flex porting nightmare


From: Hans-Bernhard Broeker
Subject: A lex->flex porting nightmare
Date: Thu, 5 Oct 2000 19:12:07 +0200 (MET DST)

Hello, everyone

I have taken on a battle which I seem not to be able to win.

The scanner is the one of the now open-sourced classical Unix tool
'cscope'. The scanner is supposed to take C files as input and spit out
major parts of them in a slightly different format that makes the
'cscope.out' database.

The scanner, given its origin, is written with AT&T lex in mind. It does
work in 'flex -l' mode, but it's unacceptably slow (about a factor of 6
slower than 'lex', on the same scanner.l).

I've got quite far in cutting out several things the 'flex -p' report
didn't like at all, but now I'm facing one that puzzles me completely, and
which I vaguely suspect to be due to a bug in 'flex': the same scanner
that works quite nicely in 'flex -B8' mode will overwrite the contents of
'yytext' saved by yymore(), if it's built with 'flex -B8 -Cf' or '...-CF'.
The trashing of yytext[] seems to be connected to the scanner backing up,
or so the -d output seems to suggest.

The scanner is way too large to be posted here, but if anyone really wants
to look at it: it's available at SourceForge, in the 'cscope' project.
It does quite a lot of things that seem (to me) rather unusual, and may
well be very bad flex coding:

*) massive 'goto'ing from one action into the middle of another
*) almost all actions use a yymore() call, except the ones that
   matched a newline in the input. This is an attempt to use yytext[]
   as a buffer that the whole source line can be kept in, until the
   newline is found and a modified version of that line of code may
   have to be copied to the output file.
*) it assumes that the contents of yytext stay intact after a
   yymore(), even if the scanner return()ed a token in between.
*) it doesn't work a bit if you use it in '%pointer' mode,
   but I haven't quite isolated the reason for that, yet.
*) it overrides the input() method with its own, which is a hand-coded
   micro-scanner to erase all comments from the input, but it does
   not override unput(), and still calls it from various places.
*) In two cases, yyleng is modified and yymore() called, in the same
   action. I know it shouldn't do this, according to the docs, but OTOH,
   the scanner failure happens even with input that doesn't ever trigger
   either of those two actions.
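
To make the yymore() points above a bit more concrete: what the scanner
really wants is a buffer that holds the whole source line until the
newline arrives. A minimal sketch in C of doing that without relying on
yytext[] surviving between matches could look like the following. Note
that 'struct linebuf', 'linebuf_reset', 'linebuf_append' and
'LINEBUF_SIZE' are hypothetical names made up for this illustration;
none of this is taken from the cscope sources, and it is only one
possible approach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size line accumulator. It plays the role that
   yytext[] plays in the scanner, but is owned by the caller. */
#define LINEBUF_SIZE 4096

struct linebuf {
    char   text[LINEBUF_SIZE];  /* always NUL-terminated */
    size_t len;                 /* bytes stored, excluding the NUL */
};

/* Start a new, empty line (what a newline action would do after
   copying the finished line to the output file). */
static void linebuf_reset(struct linebuf *lb)
{
    assert(lb != NULL);
    lb->len = 0;
    lb->text[0] = '\0';
}

/* Append one match, i.e. what an action sees as yytext/yyleng.
   Returns 0 on success, or -1 if the match would not fit, in which
   case the buffer is left unchanged. */
static int linebuf_append(struct linebuf *lb, const char *match,
                          size_t match_len)
{
    assert(lb != NULL && match != NULL);
    if (match_len >= LINEBUF_SIZE - lb->len)
        return -1;              /* no room for match plus the NUL */
    memcpy(lb->text + lb->len, match, match_len);
    lb->len += match_len;
    lb->text[lb->len] = '\0';
    return 0;
}
```

Each action would then call something like
linebuf_append(&line, yytext, yyleng) instead of yymore(), and the
newline action would write out line.text and call linebuf_reset().
Whether the extra copying costs more than the faster tables gain is
something I have not measured.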

Any enlightenment you could provide? Or should I just give up on speeding
up this scanner?

Hans-Bernhard Broeker (address@hidden)
Even if all the snow were burnt, ashes would remain.



