[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug-gawk] Grammar improvement
From: |
Valentin Tolmer |
Subject: |
[bug-gawk] Grammar improvement |
Date: |
Tue, 10 Sep 2013 19:24:28 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130804 Thunderbird/17.0.8 |
Hello,
My name is Valentin Tolmer, and with the Google Summer of Code, I've
worked on precedence and associativity in Bison grammars. More
precisely, my work consisted in introducing a new syntax to declare
partial order precedence relationships (instead of the total order that
is the custom for now). Although my patch hasn't been accepted yet, the
modifications introduced lead to particularly interesting improvements
in gawk's grammar, in the organization of the precedence relationships.
Here's a description of what it does and how it affects gawk's grammar,
so that you might tell me what you think and maybe suggest some
improvements.
The basic idea of the improvement is to limit the over specification of
precedence relationships, that might lead to unwanted conflicts
resolutions. For example, if a new arithmetics operators is introduced
and prioritized, but it has another meaning (see for example < in the
description below), the other meaning will be prioritized too, maybe
accidentally. By reducing the amount of useless precedence relationships
declared, the grammar will be safer.
The code for this improvement can be found on bison's git repo, in the
partialorder branch.
When analyzing gawk's grammar with Bison (a modified bison that outputs
the used precedence relationships), it is possible to single out
basically 3 groups of prioritized tokens in the gawk grammar :
- The arithmetics operators (+, -, /, *, etc...)
- The comparison operators (<, >, RELOP, MATCH_OP, etc...)
- '(' and LEX_LENGTH, that have no link with any other token
There are two notable exceptions :
- < is used not only as a comparison operator, but also as an input
redirection, which leads to it being compared to almost every
arithmetics operator.
- CONCAT_OP is used only in conjunction with +, - and /, because they
are the only three operators that can start an expression (e. g. +7, -2
or regexp /[abc]*).
These precedence relationships can be very well expressed with the new
system I introduced to declare precedence relationships. The updated
grammar looks like this (only the token precedence declaration part is
shown, as it is the only one changed) :
%gprec comp_op {
%precedence ASSIGNOP
%right '?' ':'
%left LEX_OR
%left LEX_AND
%precedence LEX_IN
%left MATCHOP
%nonassoc RELOP '<' '>'
}
%gprec arith {
%left '+' '-'
%left '*' '/' '%'
%precedence UNARY
%right '^'
}
%gprec {
%precedence CONCAT_OP
}
%gprec {
%precedence LEX_LENGTH
%precedence '('
}
%precr '<' < arith
%precr CONCAT_OP < '-' '+' '/'
Here, %gprec starts a group of token precedence, in which the order is
total (every token can be compared to every other, as was the case with
the previous system), but there is no link between tokens outside the group.
Then additional precedence is defined with %precr statements. For
example, '<' < arith means that the token '<' is of lower precedence
than every token of the group arith.
If you weren't aware of this, a few versions ago, the keyword
%precedence has been introduced to enable the declaration of a token
without associativity, but with precedence. Every useless associativity
has been removed from this grammar (another new feature of bison).
In the attachments you'll find the bison output of which precedence
links are declared in this grammar: in black those that are declared and
used to resolve conflicts, and in red those that are declared but useless.
Needless to say, the generated parser is the same, the only information
removed was some useless one.
Please get back to me with any comment, suggestion or remark. Keep me in
the CC field, as I am not subscribed to the newsletter.
Valentin Tolmer
prec_report.pdf
Description: Adobe PDF document
- [bug-gawk] Grammar improvement,
Valentin Tolmer <=