Re: [Grammatica-users] More Grammatica Help
From: Per Cederberg
Subject: Re: [Grammatica-users] More Grammatica Help
Date: Tue, 23 Mar 2004 12:51:24 +0100
On Tue, 2004-03-23 at 08:31, Brandon Silverstein wrote:
> Per,
>
> Thanks for your help; your explanations cleared up the main problem I
> was having. Now I am running into another problem:
>
>     inherent ambiguity in production 'CStyleComment' at position 2
>     starting with token "*"
>
> I have a feeling that my grammar file is not formatted correctly or
> that you might have explained what to do in the previous e-mail. In
> any case, I have attached the current file I am working with to see if
> anything sticks out as being wrong to anyone. How do I solve these
> inherent ambiguities? Any help is appreciated.
An inherent ambiguity is reported by Grammatica whenever two
look-ahead sets contain unresolvable overlaps. That is, when a
production has two alternatives, Grammatica must be able to
choose between them based on a limited number of look-ahead
tokens. Imagine a case like this:
    A = "b"* B
      | "b"* C ;
There is no way Grammatica can calculate a limited look-ahead
set for each of the alternatives here. It is impossible to know
beforehand the number of repetitions of the "b" token. The way
to resolve these ambiguities is to rewrite the production, like
this:
    A = "b"* BOrC ;
    BOrC = B
         | C ;
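To see why no fixed look-ahead helps in the first version, here is a
small Python sketch (an illustration only, not how Grammatica computes
its look-ahead sets). It assumes, hypothetically, that B matches the
single token "x" and C the single token "y", and lists the k-token
prefixes each alternative can start with:

```python
def lookahead_prefixes(tail, k):
    """Return every k-token prefix of the sentences "b"^n + tail, n >= 0."""
    prefixes = set()
    # n > k adds nothing new: the first k tokens are then all "b".
    for n in range(k + 1):
        sentence = ["b"] * n + tail
        prefixes.add(tuple(sentence[:k]))
    return prefixes

# Hypothetical stand-ins: B matches the token "x", C matches the token "y".
B, C = ["x"], ["y"]

# Original grammar: the two alternatives share a prefix for every k.
for k in range(1, 5):
    overlap = lookahead_prefixes(B, k) & lookahead_prefixes(C, k)
    print(k, sorted(overlap))  # the shared prefix is the run of k "b" tokens

# Factored grammar: BOrC is chosen on one token, and the sets are disjoint.
print({tuple(B[:1])} & {tuple(C[:1])})
```

However large k is made, a run of k "b" tokens starts both
alternatives, so the overlap never goes away. Once the "b"* is
factored out, a single token is enough to pick B or C.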
In your case, what Grammatica is trying to tell you is that the
CStyleComment contains such an ambiguity at position 2:
    CStyleComment = "/*" [ CStyleBody ] CStyleEnd ;
                         ^
                         here
The problem is that CStyleBody can start with an unlimited number
of "*" tokens, while the following CStyleEnd can also match the
same tokens. It is thus impossible for Grammatica to know whether
the next tokens belong to the optional CStyleBody or to the
CStyleEnd production.
The simple resolution in this case would be to redefine
CStyleEnd like this:
    CStyleEnd = "*/" ;
BUT, this is not a good way to use Grammatica. Instead,
comments, identifiers and similar are MUCH BETTER represented
as tokens. This is how I'd define the comment tokens, for
example (a bit tricky to read, but anyway):
    DOC_COMMENT = <</\*\*([^*]|\*[^/])*\*/>>
    C_COMMENT = <</\*([^*]|\*[^/])*\*/>>
    CPP_COMMENT = <<//.*>>
Note that DOC_COMMENT must precede C_COMMENT, as both will
match the documentation comments (and in that case order is
important). For the same reason, string tokens should ALWAYS be
placed before regular expression tokens, so that a regular
expression token does not take precedence over them.
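The effect of token order can be tried out with a rough Python sketch.
It copies the three patterns from this message into Python's re module
and picks the longest match, with ties going to the token defined
first. The first_token helper and that tie-breaking rule are
assumptions made for illustration; this is not Grammatica's actual
tokenizer code:

```python
import re

# The three comment patterns from this message, in definition order.
TOKENS = [
    ("DOC_COMMENT", r"/\*\*([^*]|\*[^/])*\*/"),
    ("C_COMMENT", r"/\*([^*]|\*[^/])*\*/"),
    ("CPP_COMMENT", r"//.*"),
]

def first_token(text, tokens=TOKENS):
    """Return (name, lexeme) for the longest match at the start of text.

    On a tie the earlier definition wins (assumed rule); None if no match.
    """
    best = None
    for name, pattern in tokens:
        m = re.match(pattern, text)
        # Strictly longer only, so an earlier token keeps a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(first_token("/** doc */ x"))                          # DOC_COMMENT wins the tie
print(first_token("/** doc */ x", list(reversed(TOKENS))))  # now C_COMMENT wins
print(first_token("/* plain */"))
print(first_token("// line\nnext"))
```

With the order reversed, the documentation comment is reported as a
plain C_COMMENT, which is exactly the problem the ordering avoids. The
sketch also shows that the patterns as written do not match a comment
that closes with an even number of stars, such as "/* a **/"; a
variant like /\*([^*]|\*+[^*/])*\*+/ accepts it in Python's re (not
checked against Grammatica itself).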
Some other things I noted in the jml.grammar:
* The "nonAtPlusStar" token matches long pieces of text
(including newlines and whitespace) which would almost
always be the longest match (and thus the token found).
* The "@since" and similar tags inside the documentation
comment are better parsed separately. It is possible to
write a grammar to do it all, but it would not look very
pretty.
If you need more examples, have a look at the various grammar
files distributed with Grammatica (in src/grammar and
test/src/grammar). Also try the "<grammar> --tokenize <file>"
command to check that your grammar splits the input into the
expected tokens.
Cheers,
/Per