bug#17722: Makefile rule fix and cleanup patches


From: behoffski
Subject: bug#17722: Makefile rule fix and cleanup patches
Date: Sat, 07 Jun 2014 20:45:06 +0930
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 06/07/14 16:26, Paul Eggert wrote:
> Jim Meyering wrote:
>
>> why wouldn't we want to keep the build rules simple and the same
>> for everyone?
>
> We wouldn't want to do that if it entailed bloated and opaque executables on
> all platforms, or if it entailed too-complicated C programs on all platforms.
> We should be able to avoid both problems.
>
> Perhaps we could supply a different build recipe for platforms that lack a
> working shell.  Such platforms can't run shell scripts, after all, so maybe it
> would suffice to supply a text file or a comment containing build instructions
> for such platforms.



I've already anticipated at least part of this in my untangle'd code,
as the parser only knows about the lexer via two functions:

    lex ()      -- get the next lexical token; and
    exchange () -- exchange information between the lexer and the
                   parser, according to the opcodes defined in
                   proto_lexparse.h:

    typedef enum proto_lexparse_opcode_enum
    {
      /* Possible future opcode: PROTO_LEXPARSE_OP_GET_LOCALE, entire
         locale from uselocale/duplocale.  */
      PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_LOCALE,
      PROTO_LEXPARSE_OP_GET_REPMN_MIN,
      PROTO_LEXPARSE_OP_GET_REPMN_MAX,
      PROTO_LEXPARSE_OP_GET_WIDE_CHAR_LIST_MAX,
      PROTO_LEXPARSE_OP_GET_WIDE_CHARS,
      PROTO_LEXPARSE_OP_GET_DOTCLASS,
      PROTO_LEXPARSE_OP_GET_MBCSET,
    } proto_lexparse_opcode_t;

The addresses of the lex and exchange functions are not known at link
time; they are explicitly provided at runtime, via the function
fsaparse_lexer in fsaparse.h.  [The function description omits to
mention the exchange parameter; this is a glitch on my part.]

    /* Receive a lexer function, plus lexer instance context pointer, for use by
       the parser.  Although not needed initially, this plug-in architecture may
       be useful in the future, and it breaks up some of the intricate
       connections that made the original dfa.c code so daunting.  */
    extern void
    fsaparse_lexer (fsaparse_ctxt_t *parser,
                    void *lexer_context,
                    proto_lexparse_lex_fn_t *lex_fn,
                    proto_lexparse_exchange_fn_t *lex_exchange_fn);
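
A minimal sketch of this kind of run-time binding follows.  Every name
prefixed with sketch_ or dummy_ is hypothetical, and the two function
signatures are assumptions: the real typedefs live in proto_lexparse.h
and are not reproduced in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real opcode and token types.  */
typedef enum
{
  SKETCH_OP_GET_IS_MULTIBYTE_LOCALE,
  SKETCH_OP_GET_REPMN_MIN
} sketch_opcode_t;
typedef int sketch_token_t;

/* Assumed shapes of the two lexer entry points.  */
typedef sketch_token_t sketch_lex_fn_t (void *lexer_context);
typedef int sketch_exchange_fn_t (void *lexer_context,
                                  sketch_opcode_t opcode, void *param);

/* The parser sees the lexer only through these three fields.  */
typedef struct
{
  void *lexer_context;
  sketch_lex_fn_t *lex_fn;
  sketch_exchange_fn_t *exchange_fn;
} sketch_parser_t;

/* Bind a lexer to the parser at run time.  */
void
sketch_parser_lexer (sketch_parser_t *parser, void *lexer_context,
                     sketch_lex_fn_t *lex_fn,
                     sketch_exchange_fn_t *exchange_fn)
{
  assert (parser != NULL && lex_fn != NULL && exchange_fn != NULL);
  parser->lexer_context = lexer_context;
  parser->lex_fn = lex_fn;
  parser->exchange_fn = exchange_fn;
}

/* A trivial lexer used only to exercise the binding: each call
   returns the counter's current value, then increments it.  */
sketch_token_t
dummy_lex (void *lexer_context)
{
  int *counter = lexer_context;
  return (*counter)++;
}

/* Answer one opcode and reject the rest with -1.  */
int
dummy_exchange (void *lexer_context, sketch_opcode_t opcode, void *param)
{
  (void) lexer_context;
  (void) param;
  return opcode == SKETCH_OP_GET_IS_MULTIBYTE_LOCALE ? 0 : -1;
}
```

Because the parser calls only through the stored pointers, any module
that supplies a matching pair of functions can stand in for the lexer.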

It would be trivial to build a "fgrep-only" lexer module that
merely fetches the next character in the input pattern (perhaps
using mbrtowc), and returns it as either a WCHAR list, a
"self-token", or a charclass for case-folded unibyte chars.
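
As a rough illustration, such a lexer might look like the sketch below.
The token codes, the context structure and the function names are all
hypothetical, and only the WCHAR case is shown; the self-token and
charclass variants are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical token codes; the real lexer and parser use their own.  */
enum { SKETCH_TOK_END = -1, SKETCH_TOK_ERROR = -2, SKETCH_TOK_WCHAR = 1 };

typedef struct
{
  const char *next;   /* Next unread byte of the pattern.  */
  size_t left;        /* Bytes remaining in the pattern.  */
  mbstate_t mbs;      /* Conversion state for mbrtowc.  */
  wchar_t wc;         /* Character belonging to the last WCHAR token.  */
} sketch_fgrep_lexer_t;

void
sketch_fgrep_init (sketch_fgrep_lexer_t *lexer, const char *pattern,
                   size_t len)
{
  assert (lexer != NULL && pattern != NULL);
  lexer->next = pattern;
  lexer->left = len;
  memset (&lexer->mbs, 0, sizeof lexer->mbs);
  lexer->wc = L'\0';
}

/* Return the next token.  Every character stands for itself, so
   regexp metacharacters receive no special treatment.  */
int
sketch_fgrep_lex (void *lexer_context)
{
  sketch_fgrep_lexer_t *lexer = lexer_context;
  if (lexer->left == 0)
    return SKETCH_TOK_END;

  size_t n = mbrtowc (&lexer->wc, lexer->next, lexer->left, &lexer->mbs);
  if (n == (size_t) -1 || n == (size_t) -2)
    return SKETCH_TOK_ERROR;    /* Invalid or truncated sequence.  */
  if (n == 0)
    n = 1;                      /* Embedded NUL: assume one byte.  */

  lexer->next += n;
  lexer->left -= n;
  return SKETCH_TOK_WCHAR;
}
```

Feeding it the pattern "a.*" yields three WCHAR tokens ('a', '.' and
'*') followed by the end token.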

Some of the opcodes above (e.g. REPMN*, MBCSET) would become no-ops
in the fgrep lexer, as it would never emit those tokens.  At present,
case folding for wide characters is handled via the WCHAR token
allowing a list of equivalents to be specified alongside the
original; the parser ORs these together in the parse tree.
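
The OR-ing of equivalents can be pictured with the sketch below, which
emits the alternation as a postfix sequence.  The entry type and the
function name are hypothetical, and the postfix layout is an assumption
of this sketch rather than a description of the actual parser code.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical postfix entry: a wide character or an OR operator.  */
typedef struct
{
  int is_or;
  wchar_t wc;
} sketch_entry_t;

/* Write the alternation of chars[0..count-1] to out in postfix
   order: c0 c1 OR c2 OR ...  Return the number of entries written,
   or 0 if the list is empty or does not fit in capacity entries.  */
size_t
sketch_or_equivalents (const wchar_t *chars, size_t count,
                       sketch_entry_t *out, size_t capacity)
{
  assert (chars != NULL && out != NULL);
  if (count == 0 || 2 * count - 1 > capacity)
    return 0;

  size_t used = 0;
  for (size_t i = 0; i < count; i++)
    {
      out[used].is_or = 0;
      out[used].wc = chars[i];
      used++;
      if (i > 0)
        {
          /* OR the new character with everything emitted so far.  */
          out[used].is_or = 1;
          out[used].wc = L'\0';
          used++;
        }
    }
  return used;
}
```

For an original character with two case-folded equivalents, the output
is five entries: three characters and two OR operators.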

While the selection could be deferred until runtime (link both
versions and bind when the parser/lexer link is established), the
build could be tailored to exclude the larger lexer (fsalex) from the
fgrep version (which would also avoid e.g. the parse_bracket_exp
infrastructure).  This would lead to both easier-to-read code and a
smaller executable.
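
The two options can be contrasted in a small sketch.  The macro
SKETCH_FGREP_ONLY and all the names below are hypothetical; in a real
build each lexer would sit in its own object file, so that leaving one
out of the link removes its code from the executable.

```c
#include <assert.h>

/* Hypothetical lexer descriptor.  */
typedef struct
{
  const char *name;
  int (*lex) (void *lexer_context);
} sketch_lexer_t;

/* Small lexer: always part of the build.  */
int
sketch_small_lex (void *lexer_context)
{
  (void) lexer_context;
  return 0;                     /* Placeholder token.  */
}
const sketch_lexer_t sketch_small_lexer = { "fgrep", sketch_small_lex };

#ifndef SKETCH_FGREP_ONLY
/* Large lexer: compiled out of a dedicated fgrep build.  */
int
sketch_large_lex (void *lexer_context)
{
  (void) lexer_context;
  return 0;                     /* Placeholder token.  */
}
const sketch_lexer_t sketch_large_lexer = { "full", sketch_large_lex };
#endif

/* Pick the lexer to bind to the parser.  */
const sketch_lexer_t *
sketch_choose_lexer (int fixed_strings)
{
#ifdef SKETCH_FGREP_ONLY
  /* Dedicated build: nothing here refers to the large lexer.  */
  assert (fixed_strings);
  return &sketch_small_lexer;
#else
  /* Combined build: both lexers are linked, choice made at run time.  */
  return fixed_strings ? &sketch_small_lexer : &sketch_large_lexer;
#endif
}
```

Compiled without the macro, the choice is made at run time; compiled
with it, the large lexer never reaches the executable.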

In the distant future, new tokens, such as:
    STRING
    STRING_CASE_INSENSITIVE
    WCHAR_STRING
could be added as part of a rework of the token structure; with such
tokens, the parser would need fewer CAT nodes, and extracting "musts"
from the tree would take less work.  However, I'd prefer to do this
as part of a complete revamp of the token structure, not by tacking
more options onto an existing structure.

cheers,

behoffski
Programmer, Grouse Software





