help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Proposal for the Java code too large problem


From: Di-an JAN
Subject: Proposal for the Java code too large problem
Date: Mon, 6 Oct 2008 07:30:48 -0700 (PDT)

In Java, each function is limited to 64K of bytecode, arrays are initalized
by using bytecodes to assign to each element, and all static initialization
are done by a single function, so with a complex grammar, it's possible
for all the parser tables to exceed the 64K static initializer limit
and get a "code too large" error from javac.

The solution depends on how big the grammar is:

Level 1: Use static initializers, limited to 64K bytecode total.
Most efficient for most grammars.  Currently used by Bison.

Level 2: Use one function for each array initialization, limited to 64K
bytecode per array.  Costs a few extra function calls at initialization.
Used by Byacc/J.

Level 3: Load tables from string literals or a file, in binary or text
format, limited to 2GB elements per array, 4GB UTF8 bytes per string
literal, and available memory.  Using a text format is inefficient
but should requires no changes outside the Java parser skeleton.
Using string literals instead of files makes the generated parser
self-contained, but may require encoding into 16-bit characters.
Beaver uses a single encoded string for its parser tables.

Which of the following solutions do people prefer?  Or something else?
I think they're all fairly easy and I'm willing to try implementing them.

Solution 2: Goto Level 2 unconditionally.

Solution 3: Select strategy the the following interface:

%define parser_tables "small" // Level 1, default
%define parser_tables "medium"        // Level 2

Optional:

%define parser_tables "large" // Level 3 in string literals
%define external_parser_tables "[FILENAME]"
%define yypact_table "..."    // per table strategy
...

(At first I wanted to choose names that are short, descriptive, and fairly
easy to remember, but then, small/medium/large is even easier to remember
and allows the underlying implementation to change.  Also, it's like the
old x86 memory model: small is one 64K code segment, medium is multiple
64K code segments, (compact is multiple 64K data segments, large is multiple
64K code and data segments) and huge is larger-than-64K data segments.)

Comments?

Di-an Jan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]