help-flex

Re: Unicode Bison


From: Hans Aberg
Subject: Re: Unicode Bison
Date: Fri, 12 Apr 2002 15:58:53 +0200

At 11:13 +0200 2002/04/12, Akim Demaille wrote:
>Hans, don't you think it is a good property to limit the number of
>tools which require additional complexity to handle such a delicate
>problem?  In the current framework, i.e., scanner | parser, don't you
>think it is a desirable property for the reliability of the whole
>program, that Unicode does not spread all around the place?

I think it depends on how difficult the thing is to implement:

The first feature, moving the starting point of token numbering above the
Unicode range, seems simple enough.

As for admitting access to Unicode characters from within the grammar (.y)
file, that may not be difficult either: One puts the (Unicode) character
part of the current yytranslate[] table into an array of pairs (u, t),
sorted by first component, where u is the Unicode character, and t is the
yytranslate[] translation. Then one defines something like
  #define YYTRANSLATE(x) \
    ((unsigned)(x) <= unicode_max ? yyunicode_translate(x) : \
     (unsigned)(x) <= token_max ? yytranslate[(x) - unicode_max] : 21)
where yyunicode_translate(x) is a binary search in the Unicode character
translation table.
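
A minimal sketch of what such a lookup might look like, in C. All names
here (yyunicode_pair, yyunicode_table, YYUNDEF_SYMBOL) and the table
contents are made up for illustration; this is not what Bison generates.
The value 21 for an unknown character is simply taken from the macro above.

```c
#include <stddef.h>

/* Hypothetical pair (u, t): Unicode character and its translation. */
typedef struct {
    unsigned u;  /* Unicode code point */
    int t;       /* translated symbol number */
} yyunicode_pair;

/* Assumed "undefined" symbol number, matching the 21 in YYTRANSLATE. */
enum { YYUNDEF_SYMBOL = 21 };

/* Illustrative table; it must be sorted by ascending u. */
static const yyunicode_pair yyunicode_table[] = {
    { 0x0028, 3 },  /* '(' */
    { 0x0029, 4 },  /* ')' */
    { 0x002B, 5 },  /* '+' */
    { 0x03BB, 6 },  /* Greek small letter lambda */
    { 0x2200, 7 },  /* "for all" sign */
};

/* Binary search for x in the sorted table; unknown characters
   map to YYUNDEF_SYMBOL. */
static int yyunicode_translate(unsigned x)
{
    size_t lo = 0;
    size_t hi = sizeof yyunicode_table / sizeof yyunicode_table[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (yyunicode_table[mid].u < x)
            lo = mid + 1;
        else if (yyunicode_table[mid].u > x)
            hi = mid;
        else
            return yyunicode_table[mid].t;
    }
    return YYUNDEF_SYMBOL;
}
```

Only the characters the grammar actually mentions need an entry, so the
table stays small and a lookup costs a handful of comparisons.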

This scheme makes the yytranslate table more compact, as the character
entries are no longer present. It becomes slower, but with Unicode there
is no way to avoid that, and a binary search will most likely be quick
enough.

There will be other changes; how many are required can only be assessed by
trying. But the reasoning above suggests it does not require that much
work: If somebody gets curious, I think one might try it.

>As far as I'm concerned, I've said and explained it all, I don't plan
>to continue this discussion.

Sounds like a contradiction: By your lines, you have just started the
discussion. :-)


  Hans Aberg




