bison-patches

From: Akim Demaille
Subject: Re: several messages
Date: Wed, 16 Sep 2009 16:32:16 +0200


On 8 Sep 2009, at 01:42, Joel E. Denny wrote:

My grammar has:
- 185 rules
- 104 terminals
- 58 nonterminals

How many S/R and R/R conflicts? I'm also interested in conflicts that are
resolved by prec/assoc declarations.

There are no remaining conflicts. The updated log has the figures on solved conflicts:

- 1143 in LALR/IELR
- 64650 in full LR

It's good to see that the IELR computation time is barely worse than LALR.
Because IELR didn't split any states, the output files are of course
exactly the same. However, that also means there may not have been much
for IELR to do, and so the timings may not reveal much about IELR's
complexity. The number of conflicts would provide some insight here...
but maybe I'm getting off topic a little.

(By the way, I've been meaning to add a test case for Bison that makes
sure the LALR and IELR parser tables are exactly the same for
parse-gram.y.  If they ever become different, we'll need to fix
parse-gram.y or start using IELR.)

Good point :)
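
For concreteness, the three benchmark labels below ("canonical-lr-all",
"canonical-lr-consistent", "canonical-lr-accepting") correspond to
grammar-file settings along these lines; the exact spellings of these
%define values were still in flux at this point, so treat it as a sketch:

%define lr.type canonical-lr
/* ...plus one of: */
%define lr.default-reductions all         /* -> canonical-lr-all */
%define lr.default-reductions consistent  /* -> canonical-lr-consistent */
%define lr.default-reductions accepting   /* -> canonical-lr-accepting */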


canonical-lr-all

Execution times (seconds)
IELR(1) Phase 3       :   0.33 (10%) usr   0.01 (10%) sys   0.00 ( 0%) wall
IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
parser action tables  :   2.19 (65%) usr   0.01 (10%) sys 128.00 (100%) wall
outputing parser      :   0.16 ( 5%) usr   0.02 (20%) sys   0.00 ( 0%) wall
running m4            :   0.65 (19%) usr   0.05 (50%) sys   0.00 ( 0%) wall
freeing               :   0.01 ( 0%) usr   0.01 (10%) sys   0.00 ( 0%) wall
TOTAL                 :   3.36             0.10           128.00
Bison 3.49s real  3.37s user  0.11s system
G++   7.85s real  6.76s user  0.76s system
state 12941
2.7M /tmp/ugrammar.cc
52K /tmp/ugrammar.hh
1.6M /tmp/ugrammar.o
25M /tmp/ugrammar.output

I'm glad to see that most of the extra time does not fall in the original parser table computation. It just takes a really long time to compress
and transform those huge tables into the output code.

Yes, indeed.

canonical-lr-consistent

Execution times (seconds)
IELR(1) Phase 3       :   0.36 ( 2%) usr   0.01 ( 4%) sys   0.00 ( 0%) wall
IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
parser action tables  :  16.38 (91%) usr   0.05 (18%) sys   0.00 ( 0%) wall
outputing parser      :   0.34 ( 2%) usr   0.08 (29%) sys   0.00 ( 0%) wall
running m4            :   0.89 ( 5%) usr   0.13 (46%) sys   0.00 ( 0%) wall
freeing               :   0.02 ( 0%) usr   0.01 ( 4%) sys   0.00 ( 0%) wall
TOTAL                 :  18.01             0.28             0.00
Bison 18.64s real  18.02s user  0.30s system
G++   10.63s real  9.15s user  0.96s system
state 12941
5.6M /tmp/ugrammar.cc
52K /tmp/ugrammar.hh
2.3M /tmp/ugrammar.o
27M /tmp/ugrammar.output

Wow. What a jump. I can see why you don't want this. I have to admit that I haven't spent much time exploring the combination of canonical-lr
and different values of lr.default-reductions.  Maybe it's bad that I
defaulted it to "accepting"? I just didn't want anyone to argue that they
specified canonical LR but didn't get it.

I have no strong opinion about this. After all, someone reading the documentation about lr.type should be ready to read the section about lr.default-reductions. We have to warn them. My figures are public; if you mean to use them in some form in the documentation, that would be fine with me.


canonical-lr-accepting

Execution times (seconds)
reader                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
IELR(1) Phase 3       :   0.37 ( 1%) usr   0.01 ( 2%) sys   0.00 ( 0%) wall
IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
parser action tables  :  46.08 (95%) usr   0.21 (34%) sys   0.00 ( 0%) wall
outputing parser      :   0.61 ( 1%) usr   0.12 (20%) sys   0.00 ( 0%) wall
running m4            :   1.28 ( 3%) usr   0.26 (43%) sys   0.00 ( 0%) wall
freeing               :   0.01 ( 0%) usr   0.01 ( 2%) sys   0.00 ( 0%) wall
TOTAL                 :  48.38             0.61             0.00
Bison 49.65s real  48.39s user  0.63s system
G++   13.04s real  11.08s user  1.03s system
state 12941
8.5M /tmp/ugrammar.cc
52K /tmp/ugrammar.hh
3.2M /tmp/ugrammar.o
31M /tmp/ugrammar.output

Ouch.  Well, more motivation to steer clear of canonical LR.

:)


It would be nice to extend the *.output file with various metrics
about the grammar.

Agreed.

Will add this to TODO.

The syntax error on "1 1" behaves as expected with both accepting and
consistent (i.e., mute since there are too many possibilities ;).

If you could unmute it (by adding a few more possibilities in the
skeleton), it would be interesting to see if canonical LR gives the same
expected-token list.

I was interested in extending it temporarily to compare the expected
tokens for your specific grammar. Of course, I also would like to see it
fixed in general...

I'll keep this as another TODO item :)
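
For the curious: the "muting" comes from the verbose-error code in yacc.c,
which scans the action row of the current state and gives up once the list
of expected tokens exceeds a small cap. Roughly, from memory, so details
may be off:

/* Collect the expected tokens for the current state; mute the list when
   it exceeds the cap (YYERROR_VERBOSE_ARGS_MAXIMUM, i.e. 5).  */
int yyn = yypact[yystate];
if (YYPACT_NINF < yyn && yyn <= YYLAST)
  {
    int yyxbegin = yyn < 0 ? -yyn : 0;
    int yychecklim = YYLAST - yyn + 1;
    int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;
    int yyx, yycount = 0;
    for (yyx = yyxbegin; yyx < yyxend; ++yyx)
      if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR)
        {
          if (++yycount > YYERROR_VERBOSE_ARGS_MAXIMUM)
            {
              yycount = 0;      /* too many: report a bare "syntax error" */
              break;
            }
          /* otherwise remember yytname[yyx] for the message */
        }
  }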


One way out of the fixed arity would be to use "note: " lines, as Alex
suggested (and I'm slowly changing my mind in this regard).

I was thinking we were both only opposed to dropping the warning label. I like the idea of a special function for formatting submessages. I'm not
so sure about the "note: " prefix though.

I changed my mind when I saw how current GCCs talk to me.

$ cat /tmp/foo.cc
void foo (float*) {}
void foo (int*){}

int
main()
{
  foo(1);
}
$ g++ -Wall /tmp/foo.cc
/tmp/foo.cc: In function 'int main()':
/tmp/foo.cc:7: error: call of overloaded 'foo(int)' is ambiguous
/tmp/foo.cc:1: note: candidates are: void foo(float*) <near match>
/tmp/foo.cc:2: note:                 void foo(int*) <near match>

and Emacs correctly understands this as a single error.


However, for the generated parser rather than for Bison itself, strings fed
to yyerror currently don't contain newlines. I'm not sure how important or
how long-standing that convention has been. If we move to a %error-report
or something like that, that might be a good opportunity to break the
convention... or to let the user take control.

I think we should still keep the action of building the error message separate from that of "sending" it. But indeed, maybe that's wrong. It's clearly "nicer" and more modular, but on the other hand it may push onto the user needless memory-allocation issues that would not even exist if she used fprintf directly in her %error-report.

Still...  I feel uncomfortable with fusing the two.



My suspicion would have been that disabling default reductions almost always does more good than harm, but I think we need more exploration.

So if we can't trust static tables, should we actually try each
possible token, one after the other, and see if one can be shifted at
some point? Of course, without touching the stack, that can be quite a
pain to write :( But it should not be very different from our
error-recovery code.

You're reading my mind.  :)  I think we've realized for a while that
%error-verbose is not perfect.

Yes, I agree.

However, there's one problem with your suggestion. By the time the syntax
error is reported, it's too late: the parser may have already performed
some default reductions.

Ah!  Of course!

Thus, immediately upon fetching the token (it's fine if the fetch comes after performing some default reductions in consistent states), the parser needs to determine whether the token can eventually be shifted. That is,
the parser first tries the parse without performing semantic actions,
without manipulating the value and location stacks, and without losing the original state stack. If it successfully reaches a shift, it then repeats
the parse on all the normal stacks.  If unsuccessful instead, it
immediately reports a syntax error and then repeats the exploration for every token, as you suggested, so it can report what tokens were expected. During each such exploration (during parsing or at error recovery), the parser need not duplicate the normal parser state stack in order to avoid
losing the original context.  Instead, it keeps a temporary stack that
records only the exploratory states that are not already recorded on the normal parser stack. It also keeps a pointer into the normal parser stack in order to record the base of the temporary stack. That pointer may move
during exploratory reductions.  I've implemented all this already in
yacc.c, and it functions correctly so far.  I'm just tidying up memory
management at the moment, and I need to do more testing.
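
To make the exploration concrete, here is a rough sketch of such a check.
Every helper name below is hypothetical; the real yacc.c code additionally
reallocates the temporary stack and works through the actual yypact/yytable
encoding:

/* Can TOKEN eventually be shifted from the state on top of the real
   stack, without modifying that stack?  */
enum yy_kind { YY_SHIFT, YY_REDUCE, YY_FAIL, YY_ACCEPT };
struct yy_act { enum yy_kind kind; int value; };     /* value: state or rule */
struct yy_act yy_lookup (int state, int token);      /* action table */
int yy_goto_state (int state, int lhs);              /* goto table */
int yy_rule_len (int rule), yy_rule_lhs (int rule);  /* rule metadata */

static int
yy_lac (const int *stack, int depth, int token)
{
  int tmp[200];                        /* exploratory states only */
  int ntmp = 0;
  const int *top = stack + depth - 1;  /* may move down; stack is read-only */

  for (;;)
    {
      int state = ntmp ? tmp[ntmp - 1] : *top;
      struct yy_act a = yy_lookup (state, token);
      int len, below;

      if (a.kind == YY_SHIFT || a.kind == YY_ACCEPT)
        return 1;                      /* the token is viable */
      if (a.kind == YY_FAIL)
        return 0;                      /* it can never be shifted here */

      /* Reduction: pop the RHS, first from the temporary stack, then by
         walking down the real stack; then push the goto state.  */
      len = yy_rule_len (a.value);
      while (len > 0 && ntmp > 0)
        { --ntmp; --len; }
      top -= len;
      below = ntmp ? tmp[ntmp - 1] : *top;
      if (ntmp == (int) (sizeof tmp / sizeof *tmp))
        return 0;                      /* out of room; real code reallocates */
      tmp[ntmp++] = yy_goto_state (below, yy_rule_lhs (a.value));
    }
}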

That sounds exciting!


At worst, LAC should no more than double the parsing time. That is, even
though it's similar to GLR or backtracking, it tries only one possible
parse on any token up to its shift before performing the parse for real or
reporting a syntax error.  (Also unlike GLR and backtracking, we still
resolve conflicts statically.  The parser is still deterministic.)  My
suspicion is that, given the time for file I/O, lexical analysis, and
semantic actions, the total run time of the parser will not be affected
severely by this doubling.  Of course, building a syntax error message
will take longer than double, but I'm not as worried about that. When I
finish implementing, I'll try collecting some statistics.

Now I see what you're using bench.pl for :)


Another interesting point is that, with LAC, IELR should always perform exactly the same semantic actions as canonical LR even for syntactically
incorrect sentences.

Nice feature!

Previously, IELR could only claim that for
syntactically correct sentences. (The same can be said for LALR as long as there are no mysterious conflicts, but, unless there are no conflicts
at all, that's hard to verify without an IELR implementation anyway.)
One can even imagine using LAC in conjunction with a push parser to
implement, for example, tab completions.

I also had this in mind.
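
With the existing push interface one can already approximate this, minus
LAC's no-side-effects guarantee, by replaying the token prefix once per
candidate and keeping the candidates the parser does not reject.
yypstate_new, yypush_parse, yypstate_delete, and YYPUSH_MORE are the real
push API; the header name and the rest are made up:

/* Could CANDIDATE come next after PREFIX?  Requires a parser generated
   with %define api.push-pull push.  Caveat: this replays semantic
   actions, which a LAC-based version would avoid.  */
#include "parser.h"    /* hypothetical generated header */

static int
viable_next (const int *prefix, int nprefix, int candidate)
{
  yypstate *ps = yypstate_new ();
  YYSTYPE lval = {0};  /* dummy semantic value */
  int status = YYPUSH_MORE;
  int i;

  for (i = 0; i < nprefix && status == YYPUSH_MORE; ++i)
    status = yypush_parse (ps, prefix[i], &lval);
  if (status == YYPUSH_MORE)           /* the prefix itself parsed fine */
    status = yypush_parse (ps, candidate, &lval);
  yypstate_delete (ps);

  /* YYPUSH_MORE: wants more input; 0: accepted; anything else: error.  */
  return status == YYPUSH_MORE || status == 0;
}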


Keep in mind that, with LAC, the only possible difference between
lr.default-reductions=all and =consistent for any fixed value of lr.type
should be parser table sizes and performance (that is, possibly many more
reductions to explore before permitting normal parsing), which would be
interesting to collect statistics on.

It would also be nice to explore the runtime merits of the table compression. I remember that some other tools (maybe Yacc++) offer this feature. Space is probably no longer an issue, so maybe raw tables would buy some added runtime performance.
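
For reference, the classic yacc compression overlaps all the state rows
into a single vector, so each lookup pays an offset computation plus an
ownership check where a raw table pays a single two-dimensional index.
A sketch; yypact/yytable/yycheck/yydefact are the traditional names,
NSTATES/NTOKENS are made up:

/* Raw tables: one indexed load per action lookup.  */
static int
lookup_raw (const int raw[NSTATES][NTOKENS], int state, int token)
{
  return raw[state][token];
}

/* Compressed tables: offset the state's row into the shared vector,
   then check that the entry really belongs to this state.  */
static int
lookup_compressed (int state, int token)
{
  int n = yypact[state] + token;
  if (0 <= n && n <= YYLAST && yycheck[n] == token)
    return yytable[n];
  return yydefact[state];   /* default-reduction fallback */
}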


BTW, I've not yet done any homework to see if any of this has been done before. Now that I write it down, it seems like such a simple premise, so maybe it has been done many times. I should probably have researched it before banging out an implementation. Maybe there's a better way already
discovered.

It seems that indeed many, many results have been obtained about LR parsers, but a lot of them are now somewhat forgotten. Yet there are so many papers out there that I don't know the right way to fetch only the relevant ones.

Do I also understand that we have neither a superset nor a subset,
just some fuzzy set?

For expected tokens reported in syntax error messages, that's correct.
It's unsettling, isn't it?

Yes :)

I can send ugrammar.y privately, if you wish.

I'd like to see that.  Thanks!

Will do right afterwards. Please, no dirty jokes about my grammar :) I wish it were nicer in many places...


