help-bison

Proposals for various changes to the Java parser


From: Paolo Bonzini
Subject: Proposals for various changes to the Java parser
Date: Tue, 28 Oct 2008 18:59:16 +0100
User-agent: Thunderbird 2.0.0.17 (Macintosh/20080914)

> Ideas for changing the generated Java parsers.  I can implement most of
> these.  Comments on the interface and semantics would be appreciated.

This is great material, thanks.

> 1. The parser class can be declared public and/or abstract by using the
> ``%define public'' and ``%define abstract'' directives.  I can add the
> other Java class modifiers with ``%define final'' and ``%define strictfp''
> and ``%define annotations "@..."'' directives (plural, since %define's
> are not combined, but must be specified in the same %define) to be complete.
> Or, we can have a single ``%define parser_class_modifiers "..."'' to
> specify all modifiers together.  Or both.
> Implemented ``%define final/strictfp''.
> 3. Add ``%define extends "Super"'' and ``%define implements "Interfaces"''.
> Implemented.
> 4. Add ``%define lex_throws "Exceptions"'' to parse() and not use it as
> the default of ``%define throws "Exceptions"''.

I already have your patch for these, right?  For (1), Annotations would
be nice, but not high priority of course.

> 2. The parser class name currently defaults to ``YYParser'' (actually,
> ``b4_prefixParser'', but I already submitted a patch for that bug).
> Do people prefer to make it match the Java file name instead?  Of course,
> characters not allowed in Java names must be removed or replaced by ``_''.

Yeah, that's nice since the filename is known at m4 time.

> 5. Work around Java's ``code too large'' limitation for large parser tables.
> http://lists.gnu.org/archive/html/help-bison/2008-10/msg00005.html
> Bison could try to estimate how much bytecode is needed and
> generate code accordingly, but it depends on the actual Java compiler
> and the amount of user static initialization, so that's not a good idea.
> The syntax in the text (not appendix) of the ``Java Language Specification,
> Second Edition'' is just at the limit, depending on how it's converted from
> EBNF and conflicts removed.  The awk, cim, and pic grammars from
> tests/existing.at are all under the limit.
> Implemented ``%define parser_tables "small/medium/large"'' except docs
> and tests.  Only need changes to the Java skeleton.

I think switching unconditionally to one-initializer-per-table is the
easiest approach by far, and gives 99% of the benefit.
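
As a rough sketch of the one-initializer-per-table idea (the class name, table names and table contents below are made up for illustration and are not taken from the skeleton): each table is filled in by its own static method, so the 64 KB bytecode limit applies to each method separately instead of to one shared static initializer.

```java
// Sketch only: names and values are hypothetical, not the skeleton's output.
final class TablesSketch {
    // Each table is built by its own static method.  The JVM limits the
    // bytecode of a single method to 64 KB, and javac reports "code too
    // large" when one method (including the class initializer) exceeds it.
    static final short[] yypact_ = yypactInit();

    private static short[] yypactInit() {
        return new short[] { 3, -1, 7, 0 };
    }

    static final short[] yydefact_ = yydefactInit();

    private static short[] yydefactInit() {
        return new short[] { 0, 2, 1, 0 };
    }
}
```

A single table that is itself too large would still hit the limit; that is the remaining case the small/medium/large proposal is aimed at.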

> 6. Currently ``%union'' is silently ignored

I thought it gave an error. :-)

> , and Java types are used as
> the TYPE in ``$<TYPE>'', ``%token<TYPE> ...'' and ``%type<TYPE> ...''.
> I propose to interpret these ``<TYPE>'' as a field name in ``%union'',
> interpreting it as a Java type if no such field name exists.
> First, this matches the behavior of C/C++ parsers, even though Java doesn't
> actually have union types.  Also makes it easier to convert from C/C++.

It is a bit of a waste of memory...

> Second, this allows the use of generic types since ``<TYPE>'' does not
> allow ``>'' in TYPE.  [...]
> Bison doesn't play nice with generics: '>' is not allowed
> in ``<TYPE>'' by the syntax, and using a generic ``%define stype'' gives
> a ``generic array creation'' error, and at least one place needs to
> m4-quote commas in ``<TYPE>'' (fails with Map/*String,String*/).

... and if the Bison lexer was adjusted to allow balanced angle-brackets
in TYPE instead, it would be possible to define the token stack as
Object[], right?  But I did not understand the comment about m4-quoting
commas.
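
To make the Object[] idea concrete, here is a minimal sketch, assuming a hand-written stack class (ValueStackSketch and its methods are hypothetical names, not the skeleton's). An array of a parameterized type such as `new List<String>[16]` is rejected with "generic array creation", while an Object[] store with a cast on read works for any semantic type.

```java
import java.util.Arrays;

// Sketch only: a hypothetical value stack, not the generated one.
final class ValueStackSketch {
    // Object[] avoids the "generic array creation" error that an array
    // of a parameterized element type would trigger.
    private Object[] values = new Object[16];
    private int height = 0;

    void push(Object value) {
        if (height == values.length) {
            values = Arrays.copyOf(values, height * 2); // grow when full
        }
        values[height++] = value;
    }

    // depth 0 is the top of the stack; the caller casts to the real type.
    Object valueAt(int depth) {
        return values[height - 1 - depth];
    }
}
```

The generated action code would then cast each read to the declared type, at the cost of an unchecked-cast warning for generic types.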

> 7. Should we make sure ``$$'' has the right type?  For example:

Yes, would be nice.

> 8. Remove ``public static final boolean bison = true;''  This corresponds
> to ``#define YYBISON 1'' in C parsers, which can be used for conditional
> compilation.  There is no conditional compilation in Java, though you
> probably can use reflection.  Might as well use ``bisonVersion'' anyway.
> Or keep it for compatibility, and document as ``public'' interface.

Removing it is okay for me.  It was meant for reflection, but
bisonVersion seems good enough.

> 9. Document ``bisonVersion'' and ``bisonSkeleton'' as part of the ``public''
> interface.

Yes, thanks.

> 10. If ``%error-verbose'' is not used, do not generate code for it.

Not top priority, but I would not oppose this.  I thought javac could in
principle elide it, but maybe it does not because native methods can set
final and private fields.

> Document that ``errorVerbose'' can be changed given ``%error-verbose''.
> Or make it ``yyErrorVerbose'' and provide getter and setter like ``%debug''.

Latter option is definitely better.
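
A minimal sketch of the getter/setter variant, assuming accessor names modelled on the existing debug-level accessors (the class, the accessor names and the message format are assumptions for illustration):

```java
// Sketch only: hypothetical class and method names.
class ErrorVerboseSketch {
    // Private field, reachable only through the accessors below.
    private boolean yyErrorVerbose = true;

    public final boolean getErrorVerbose() {
        return yyErrorVerbose;
    }

    public final void setErrorVerbose(boolean verbose) {
        yyErrorVerbose = verbose;
    }

    // Illustrative message builder: the verbose form names the token.
    String syntaxError(String unexpected) {
        return yyErrorVerbose
            ? "syntax error, unexpected " + unexpected
            : "syntax error";
    }
}
```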

> 11. If ``%debug'' or -t/--debug is not used, do not generate code for it.
> How to turn debugging on and off is already documented.

Makes sense as well.

> 12. Don't generate token names when not needed.  If ``%token-table''
> or -k/--token-table is used, also generate the following function:
> 
> 
> /** Returns the token number (for returning from yylex) for NAME.
>     NAME does not have to be quoted, but when unquoted, it is first
>     matched with a double-quoted literal string token, then a single-quoted
>     character token type, then a named token type.  */
> public int getTokenNumber(String name);
> 
> 
> /** Returns the token number (for returning from yylex) for NAME.
>     NAME does not have to be quoted, and only matches a double-quoted
>     literal string token from the grammar.  */
> public int getStringTokenNumber(String name);
> 
> 
> By the way, the example in the ``Interface / Lexical / Calling Convention''
> node of the manual is wrong.  It gives the internal token number, not the
> ones returned by yylex.  It needs to pass through ``yytoknum''.
> C/C++ probably should define these functions too.

Ok.
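
For reference, the two lookups could be sketched roughly as follows. The tables are made-up stand-ins for ``yytname'' and ``yytoknum'', the methods are static only to keep the example short, and returning -1 for an unknown name is an assumption; the proposal does not say what happens in that case.

```java
// Sketch only: hypothetical tables and error convention.
final class TokenLookupSketch {
    // Symbol names as spelled in the grammar, indexed by internal number.
    private static final String[] yytname_ = {
        "$end", "error", "$undefined", "NUM", "'+'", "\"<=\""
    };
    // Internal symbol number -> token number as returned by yylex.
    private static final int[] yytoknum_ = { 0, 256, 257, 258, 43, 259 };

    private static int find(String spelled) {
        for (int i = 0; i < yytname_.length; i++) {
            if (yytname_[i].equals(spelled)) {
                return yytoknum_[i]; // external number, via yytoknum
            }
        }
        return -1; // assumption: -1 means "no such token"
    }

    public static int getTokenNumber(String name) {
        if (name.startsWith("\"") || name.startsWith("'")) {
            return find(name); // already quoted: match as written
        }
        int n = find("\"" + name + "\"");     // 1. string literal token
        if (n < 0) n = find("'" + name + "'"); // 2. character token
        if (n < 0) n = find(name);             // 3. named token
        return n;
    }

    public static int getStringTokenNumber(String name) {
        return find(name.startsWith("\"") ? name : "\"" + name + "\"");
    }
}
```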

> 13. Make the yyerror functions public.  Otherwise, it's a member function
> of the lexer, which is not available if defined with ``%code lexer {...}''.
> Even without ``%code lexer {...}'', users shouldn't have to save a reference
> to the lexer just to call its yyerror when it's already saved in the parser.

Ok.

> 14. Allow user-defined location class.  Currently, you can only change its
> name by using ``%define location_type''.  We need this to print abbreviated
> ranges, where common file names and line numbers are not printed.  Or to
> use encoded int or long for Position.
> Perhaps ``%define location_type'' can be changed to mean that the default
> class should not be generated.  Or use ``%define no_default_location_type''.

So, the location_type would be either "Location" (with the default code)
or an external user-defined type (and likewise for position_type).
Seems indeed better, but in that case I would change the default
implementations' names to YYLocation and YYPosition.

> 15. Define a default position class with line and column, as in C, unless
> ``%define position_type'' is used.  This is not backward compatible though.
> To be fully backward compatible, we need ``%define default_position_type''
> and ``%define no_default_location_type''.  Ugly.

Don't worry (too much) about backwards compatibility.
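
To illustrate the kind of user-defined classes items 14 and 15 have in mind, here is a hedged sketch of a line/column position and a location that prints abbreviated ranges. The class names MyPosition and MyLocation and the "line.column" output format are assumptions for the example, not something the skeleton defines.

```java
// Sketch only: hypothetical user-supplied position and location classes.
final class MyPosition {
    final int line;
    final int column;

    MyPosition(int line, int column) {
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return line + "." + column;
    }
}

final class MyLocation {
    final MyPosition begin;
    final MyPosition end;

    MyLocation(MyPosition begin, MyPosition end) {
        this.begin = begin;
        this.end = end;
    }

    // Abbreviated range: the line number is not repeated when both ends
    // are on the same line, and a single point is printed only once.
    @Override
    public String toString() {
        if (begin.line == end.line) {
            return begin.column == end.column
                ? begin.toString()
                : begin + "-" + end.column;
        }
        return begin + "-" + end;
    }
}
```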

> 16. Support ``%printer''.  Even though the virtual toString() method is
> used to print symbols, symbols of the same semantic type may have to be
> displayed differently (for example, see Bison's grammar of itself) and
> we shouldn't have to define new (sub)classes just to print differently.
> Also, we may want something different from the ``natural'' toString(),
> for example, quoted characters and strings (can't even override the
> ``toString()'' of Character and String because they're final classes).

%printer was a bit in flux, IIRC, while I was writing the skeleton.
Overriding toString() is a good idea, the printer would be just an
expression returning String, right (the default being "$$.toString()"
hence)?
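
One way to picture a %printer that is "just an expression returning String": the generated code dispatches on the symbol and falls back to the toString() default. Everything below (class name, method name, symbol numbers) is hypothetical and only meant to show the shape.

```java
// Sketch only: hypothetical generated dispatch for per-symbol printers.
final class PrinterSketch {
    static final int STRING = 3; // made-up symbol numbers
    static final int CHAR = 4;

    static String yysymbolToString(int yytype, Object yyvalue) {
        switch (yytype) {
            case STRING:
                return "\"" + yyvalue + "\""; // user printer: quoted string
            case CHAR:
                return "'" + yyvalue + "'";   // user printer: quoted char
            default:
                // Default printer, equivalent to $$.toString() but null-safe.
                return String.valueOf(yyvalue);
        }
    }
}
```

This gives quoted output for String and Character values without subclassing, which is not possible for those final classes anyway.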

> 17. Allow gcj version < 4.3 to be used in the testsuite.  Currently
> (in gnulib/m4/javacomp.m4), for gcj version < 4.3, only
> ``-source 1.4 && -target 1.4'' and ``-source 1.3 && -target 1.4''
> are allowed.  According to the comments there, these gcj versions really do
> target 1.4.  I guess we can relax that, or use -source 1.3 -target 1.4
> for bison's configure.ac ``gt_JAVACOMP([1.3], [1.4])''.
> I checked that ``javac -source 1.3 -target 1.4'' works with JDK 1.6.
> By the way, gcj 3.4.4 is the ``system'' compiler on Cygwin.

Good.

> 19. On Cygwin, ``make check'' fails when using ``java'' because it generates
> CR/LF output while autotest uses LF. Maybe do a ``sed -e 's/\r$//''' on the
> output?

Yes.

Thanks,

Paolo



