bison-patches

Re: rename variant (was: Rename variant and lex_symbols options)


From: Akim Demaille
Subject: Re: rename variant (was: Rename variant and lex_symbols options)
Date: Fri, 14 Dec 2012 16:28:33 +0100

Hi all!

On 11 Oct 2012, at 15:59, Akim Demaille <address@hidden> wrote:

> Previous discussions with Joel have established that they are not
> related, so let's treat them separately.  Simplest first: %define
> lex-symbols.

This continues the previous message: how to name the features
introduced by "%define variant".  The issues are to find a good name,
one general enough to cover what we might have in the future, and to
avoid focusing only on the case at hand (C++).

What is summarized below is also the result of discussion with friends
(Alexandre Duret-Lutz, Roland Levillain, and Théophile Ranquet).
Comments are *most* welcome!

Below, by "variant" I do not exactly mean what others call a
discriminated union, "sum type", tagged union, etc.: union types that
know the type of what they currently store.  In the case of Bison,
what we actually need is just a means to store objects: raw memory
for a fixed set of object types, since the nature of what is stored
there is known elsewhere (e.g., from the state number).  Think of it
simply as "unions on steroids for C++".
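To make the distinction concrete, here is a minimal C++17 sketch of such "unions on steroids": raw, suitably aligned storage sized for a fixed list of types, with no tag.  The class name raw_value and its build/as/destroy members are hypothetical, not Bison's actual implementation; the caller is assumed to know, at every access, which type is currently stored.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>    // std::size_t
#include <new>        // placement new, std::launder (C++17)
#include <string>
#include <utility>    // std::forward

// Hypothetical sketch: storage for exactly one object among Ts..., with
// no tag.  Unlike a discriminated union it does not record what it
// holds; the caller (e.g., a parser that knows its state) must name the
// right type at every access.
template <typename... Ts>
class raw_value
{
public:
  // Construct a T in place.  The storage must currently be empty.
  template <typename T, typename... Args>
  T& build (Args&&... args)
  {
    static_assert (sizeof (T) <= size_, "T does not fit");
    static_assert (alignof (T) <= align_, "T is over-aligned");
    return *::new (static_cast<void*> (buffer_))
      T (std::forward<Args> (args)...);
  }

  // Access the stored T.  Undefined behavior if another type is stored.
  template <typename T>
  T& as ()
  {
    return *std::launder (reinterpret_cast<T*> (buffer_));
  }

  // Destroy the stored T, leaving the storage empty again.
  template <typename T>
  void destroy ()
  {
    as<T> ().~T ();
  }

private:
  static constexpr std::size_t size_ = std::max ({sizeof (Ts)...});
  static constexpr std::size_t align_ = std::max ({alignof (Ts)...});
  alignas (align_) unsigned char buffer_[size_];
};
```

Since no tag is stored, each slot costs no more than the largest listed type (up to alignment padding), which suits a parser stack where the state already determines the type.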

As of today, if you write "%define variant", then instead of

  %union { int ival; };
  %token <ival> NUMBER

you can write

  %token <int> NUMBER

and the C++ skeleton then uses a variant.  The #1 advantage is that it
allows using any kind of C++ object:

  %token <std::string> STRING

while C++ forbids things like

  %union { std::string sval; }

Behind %define variant, there are actually several orthogonal issues:

i. type tag vs type name

  whether %token <…> and %type <…> are provided with type tags (ival,
  the usual approach), or type names (int), and

ii. how values are stored

  whether YYSTYPE is a union or a variant.

  Actually it could also be a struct: sometimes in C++, when I have to
  use an old Bison, I do use a struct, because I don't mind wasting a
  few bytes and copying more than needed, while I'm very happy to be
  able to store a genuine std::string, std::list, etc.  Sometimes I
  also want to pass additional information with every symbol: if
  Bison did not support locations, a struct holding both the location
  and the value would do very well.

iii. type definition

  whether the type is synthesized by Bison (e.g., from %union), or
  provided by the user (today this is achieved by #define YYSTYPE).
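To illustrate point ii, here is a C++ sketch contrasting a union with a struct as the value type.  The struct can hold a genuine std::string and even a location, at the cost of space and copying: its members add up instead of overlapping.  The types location, value_union, pod_struct, and value_struct are hypothetical names, not Bison's.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a location type.
struct location
{
  int line = 1;
  int column = 1;
};

// Union: members share storage, so only trivial types fit in practice.
union value_union
{
  int ival;
  const char* sval;
};

// Same members as value_union, but laid out side by side.
struct pod_struct
{
  int ival;
  const char* sval;
};

// On common ABIs the union is smaller: members overlap.
static_assert (sizeof (value_union) <= sizeof (pod_struct),
               "union members share storage");

// Struct: every member has its own storage, so genuine C++ objects fit,
// and extra per-symbol data (here a location) can travel along.
struct value_struct
{
  location loc;
  int ival = 0;
  std::string sval;
};
```

Copying a value_struct copies every member, whether or not it is meaningful for the current symbol; that is the "few bytes" trade-off.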


So in C we could perfectly well have

  %token <int> NUMBER
  %token <const char*> STRING

and still use a union, leaving it to Bison to generate the union
YYSTYPE.
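Since the user gives types rather than tags, Bison would have to invent the member names itself.  The sketch below shows one possible output, assuming, purely for illustration, that each member is named after its token in lower case; the helpers make_number and make_string are hypothetical scanner utilities.  The union declaration is also valid C, shown here as C++ for consistency with the other sketches.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical output for
//   %token <int> NUMBER
//   %token <const char*> STRING
// One member per token; the naming scheme is an assumption.
typedef union YYSTYPE
{
  int number;
  const char* string;
} YYSTYPE;

// Hypothetical scanner helpers: fill the member of the token returned.
YYSTYPE make_number (int n)
{
  YYSTYPE v;
  v.number = n;
  return v;
}

YYSTYPE make_string (const char* s)
{
  YYSTYPE v;
  v.string = s;
  return v;
}
```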

Recently, with Paul, we also realized that users cannot tell Bison
that there are no semantic values in their grammar.  They can use
whatever type they want with #define YYSTYPE, but "#define YYSTYPE
void" will not work to specify that there are no expected values.
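A small C++ check illustrates the problem: void is not an object type, so the declarations a parser needs cannot be formed with it, and the closest substitute, an empty struct, still occupies storage in every value-stack slot.  The name no_value is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// With "#define YYSTYPE void", declarations a parser needs, such as
//   YYSTYPE yylval;       // variable of type void
//   YYSTYPE yyvs[200];    // array of void
// are ill-formed, because void is not an object type.
static_assert (!std::is_object<void>::value, "void cannot hold a value");

// An empty struct is a legal stand-in, but every complete C++ object
// has a non-zero size, so each value-stack slot still costs storage.
struct no_value {};
static_assert (sizeof (no_value) >= 1, "empty objects are not free");
```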

So to summarize, it would be nice to have a means to
- leave the definition of YYSTYPE completely to the user, including
  an empty type
- provide a means to require an implementation as a variant or a union.

We could have two different variables to control the type: one to
specify its "kind" or "support" (is it a union/struct/variant
generated by Bison, is it empty, or is it user provided), and another
to specify its name.

E.g.

  %define api.value.kind  (""|union|variant|struct|type)

and if api.value.kind is "type" (or "user", or "custom"), then
api.value.type would be the type name.

Alternatively we can try to use a single variable, but with special
values (variant|union|%union|"") to distinguish the special cases.
Java uses "stype" to denote the user-provided root for all the
semantic values.  In Bison we now prefer longer, qualified names
(which should not be a problem for Java users :), so I'd rather use
"api.value.type" for symmetry with "api.location.type".

api.value.type = "%union"
-------------------------

The user uses %union to define the type tags, and Bison generates
YYSTYPE.

  // Original interface.
  // by default: %define api.value.type "%union"
  %union
  {
    int ival;
    char* sval;
  };
  %token <ival> NUMBER
  %token <sval> STRING

api.value.type = "union"
------------------------

The user uses <> to specify types (not type tags), and Bison
generates YYSTYPE as a union.

  // We provide types, Bison implements the union.
  %define api.value.type union
  %token <int> NUMBER
  %token <char *> STRING


api.value.type = "variant"
--------------------------

The user uses <> to specify types (not type tags), and Bison
generates YYSTYPE as a variant (C++ only; in C it is a synonym for
"union").

  // We provide types, Bison implements the variant.
  %define api.value.type variant
  %token <int> NUMBER
  %token <char *> STRING


api.value.type = ""
-------------------

There are no semantic values at all in the grammar.

api.value.type = other
----------------------

The user uses type tags in <>, and Bison defines YYSTYPE as the
provided name ("other" here).

  // Let the user define her value type.
  %code requires
  {
    union values // (or struct, or whatever).
    {
      int ival;
      char *sval;
    };
  };
  %define api.value.type "values"
  %token <ival> NUMBER
  %token <sval> STRING

In this more complex example, the user wants to build a richer
semantic value type:

  // A more complex example of user provided type.
  %code requires
  {
    struct value_type
    {
      char* original; // For instance every token features the
                      // string it was scanned from.
      union
      {
        int ival;
        char *sval;
      } u;
    };
  };
  %define api.value.type value_type
  %token <u.ival> NUMBER
  %token <u.sval> STRING
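With such a type, a tag like <u.ival> presumably acts as a member path rather than a single member name, so a NUMBER's value would be reached as something like yylval.u.ival.  The sketch below reuses the value_type from the example and adds a hypothetical scanner helper, make_number, that fills both the shared original field and the active union member.

```cpp
#include <cassert>
#include <cstdlib>  // std::atoi

// The user-provided type from the example above.
struct value_type
{
  char* original;  // the string the token was scanned from
  union
  {
    int ival;
    char* sval;
  } u;
};

// Hypothetical scanner helper: record the lexeme, then store its value
// in the member that the tag <u.ival> designates.
value_type make_number (char* text)
{
  value_type v;
  v.original = text;
  v.u.ival = std::atoi (text);
  return v;
}
```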

I would have liked to be able to cover the last example by leaving it
to Bison to generate the inner type (that of "u"), so that it would be
equivalent to write:

  %token <int> NUMBER
  %token <char*> STRING

but I have not found a proper API for this.  Besides, since there has
been no demand for this feature so far, I have not tried harder.



Alexandre suggested that we don't need "variant": in C++ mode, we
would simply map "%define api.value.type union" to variants.  This is
a nice idea, as it simplifies the user interface: in C++, if you
really want to use unions, just use %union as usual.  I like that
idea.  It's also true that it is not exactly a genuine variant, just
"better unions for C++".

I'd very much appreciate comments and/or suggestions.

Cheers!



