Re: [Grammatica-users] Trying to disambiguate a grammar
From: Per Cederberg
Subject: Re: [Grammatica-users] Trying to disambiguate a grammar
Date: Sun, 18 Feb 2007 08:55:17 +0100
User-agent: Thunderbird 1.5.0.9 (Macintosh/20061207)
Hi Craig,
Most of the grammar below lends itself well to
tokenization with regular expressions. Consider
the following tokens:
BIN_NUMBER = <<0(B|b)[0-1]+>>
OCT_NUMBER = <<0[0-7]*>>
DEC_NUMBER = <<[1-9][0-9]*>>
HEX_NUMBER = <<0(x|X)[0-9A-Fa-f]+>>
MINUS = "~"
DOT = "."
E = <<(e|E)>>
With the help of these you can rewrite the rest of
the grammar:
Int = ["~"] NumberToken ;
NumberToken = BIN_NUMBER
| OCT_NUMBER
| DEC_NUMBER
| HEX_NUMBER ;
Float = ["~"] DEC_NUMBER "." [DEC_NUMBER] [Exponent] ;
Exponent = E ["~"] DEC_NUMBER ;
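To see which inputs each number token accepts, the patterns can be tried outside Grammatica. The following is a minimal sketch in Python: the regular expressions are copied from the token list above, while the `classify` helper and the dictionary around it are purely illustrative and not part of Grammatica.

```python
import re

# Token patterns copied from the definitions above; the Python harness
# around them is an illustrative assumption, not part of Grammatica.
TOKEN_PATTERNS = {
    "BIN_NUMBER": r"0(B|b)[0-1]+",
    "OCT_NUMBER": r"0[0-7]*",
    "DEC_NUMBER": r"[1-9][0-9]*",
    "HEX_NUMBER": r"0(x|X)[0-9A-Fa-f]+",
}


def classify(text):
    """Return the names of all token patterns that match the whole text."""
    return [name for name, pattern in TOKEN_PATTERNS.items()
            if re.fullmatch(pattern, text)]
```

Each well-formed literal should match exactly one pattern, so for example `classify("0x1F")` gives a one-element list. A string such as `"09"` matches none of them as a whole, and a bare `"0"` falls under OCT_NUMBER because that pattern allows zero further digits.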
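If you are working to expand this into a full
programming language grammar, you'll run into
issues with the E token. As the tokenizer is
not context sensitive, it always returns the
longest matching token, whatever the parser
expects at that point. A lone "e" or "E" in the
input can therefore come out as an E token even
where it was not meant as an exponent marker.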
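The effect can be sketched with a toy longest-match tokenizer in Python. The E pattern is taken from the token list above. The IDENTIFIER token is hypothetical (the kind of token a full language grammar would add), and the rule that ties go to the earlier definition is an assumption of this sketch, not a statement about Grammatica's behaviour.

```python
import re

# E comes from the token list above. IDENTIFIER is a hypothetical token
# of the kind a full programming language grammar would add.
TOKENS = [
    ("E", r"(e|E)"),
    ("IDENTIFIER", r"[A-Za-z]+"),
]


def next_token(text, pos=0):
    """Return (name, lexeme) for the longest match at pos, or None.

    Ties go to the earlier entry in TOKENS; that tie-break is an
    assumption of this sketch.
    """
    best = None
    for name, pattern in TOKENS:
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

With these definitions `next_token("exit")` yields an IDENTIFIER because that match is longer, while a variable named just `e` comes back as an E token. The tokenizer has no way of knowing which one the parser wanted.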
Also, in many grammars the definitions of float
and integer decimal numbers are both built into
the same DEC_NUMBER token. For a full language
that is probably the better solution, leaving
some validation controls to the analyzer stage.
Here I opted for something more similar to your
original grammar.
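As a rough illustration of the single-token alternative, here is a sketch in Python. The combined NUMBER pattern and the `analyze_number` function are hypothetical and only stand in for what a tokenizer plus an analyzer callback might do. The check that an exponent needs a fraction part follows the float rule in the original grammar, which requires a ".".

```python
import re

# Hypothetical combined token: integer part, optional fraction,
# optional exponent with an optional "~" sign.
NUMBER = re.compile(r"[0-9]+(\.[0-9]*)?((e|E)~?[0-9]+)?")


def analyze_number(lexeme):
    """Classify a NUMBER lexeme as 'int' or 'float', rejecting bad forms."""
    m = NUMBER.fullmatch(lexeme)
    if m is None:
        raise ValueError(f"not a number: {lexeme!r}")
    has_fraction = m.group(1) is not None
    has_exponent = m.group(2) is not None
    if has_exponent and not has_fraction:
        # The original float rule requires a '.', so "1e5" is rejected.
        raise ValueError(f"exponent without fraction part: {lexeme!r}")
    return "float" if has_fraction else "int"
```

The tokenizer accepts a deliberately loose pattern, and the stricter rules are applied afterwards, where a useful error message can be produced.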
Cheers,
/Per
Craig Ugoretz wrote:
Hello,
I am new to grammatica (and parsers in general) and I have a
grammar that I am trying to disambiguate. Can anyone lend any advice?
Hopefully, this should get me on the right track with the rest of my
work... I apologize for the notation - it is EBNF, but nonstandard (and
non-grammatica).
<int> ::= ['~'] <nzdigit> { <digit> }
| ['~'] 0 { <octdigit> }+
| ['~'] ('0x' | '0X') { <hexdigit> }+
| ['~'] ('0b' | '0B') { <bindigit> }+
<float> ::= ['~'] { <digit> }+ '.' { <digit> } [ ('e' | 'E') ['~'] { <digit> }+ ]
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<nzdigit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<octdigit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
<hexdigit> ::= <digit> | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B' |
'C' | 'D' | 'E' | 'F'
<bindigit> ::= 0 | 1
Can proper tokenization alone with regular expressions lend itself to
disambiguating the grammar? This was a tactic that I tried, but was not
familiar enough with regular expressions to make progress.