Re: AW: [Grammatica-users] Tokenizer problem
From: Per Cederberg
Subject: Re: AW: [Grammatica-users] Tokenizer problem
Date: Fri, 01 Jul 2005 17:43:27 +0200

Some types of validation are better performed outside of the
grammar, and I think this is one of those cases. So I'd solve
this with the following grammar:

  %tokens%

  SPACE  = " "
  STRING = <<[A-Z]+>>

  %productions%

  Input = StationId " " Phenomens ;

  StationId = STRING ;

  Phenomens = STRING ;

Then I'd perform the various semantic validations in the
Analyzer:

  public class MyAnalyser extends WhateverAnalyzer {

      protected Node exitStationId(Production node)
          throws ParseException {

          // The only child is the STRING token; check its length here
          String value = ((Token) node.getChildAt(0)).getImage();
          if (value.length() != 4) {
              throw new ParseException(ParseException.ANALYSIS_ERROR,
                                       "station id must be 4 chars",
                                       node.getStartLine(),
                                       node.getStartColumn());
          }
          node.addValue(value);
          return node;
      }
      ...
  }

It just makes more sense to move this type of domain
knowledge out of the grammar, as you can then add new
station IDs without having to change the grammar itself.
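
The same kind of check could be applied to the Phenomens string.
What follows is only a minimal sketch, not part of Grammatica: the
class name PhenomenaValidator and the method split() are made up
for illustration, and the set of codes contains just the three
quoted in this thread (the real list is said to have 22 items).
It cuts the token image into two-letter codes and rejects anything
that is not a known code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Grammatica class.
class PhenomenaValidator {

    // Assumption: only the three codes quoted in this thread.
    private static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("DZ", "RA", "SN"));

    // Splits e.g. "DZRASN" into [DZ, RA, SN], or throws if the
    // image is not made up of 1 to 3 known two-letter codes.
    static List<String> split(String image) {
        if (image.length() < 2 || image.length() > 6
                || image.length() % 2 != 0) {
            throw new IllegalArgumentException(
                "expected 1 to 3 two-letter codes: " + image);
        }
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < image.length(); i += 2) {
            String code = image.substring(i, i + 2);
            if (!KNOWN.contains(code)) {
                throw new IllegalArgumentException(
                    "unknown phenomenon: " + code);
            }
            codes.add(code);
        }
        return codes;
    }
}
```

An exitPhenomens() callback in the analyzer could call such a
helper and turn the IllegalArgumentException into a
ParseException, in the same way as for the station id.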

If all you need to parse are simple strings like these, you
might also consider just using a regular expression (Grammatica
is really better suited to more complex grammars):

  [A-Z]{4} [A-Z]{2}([A-Z]{2}([A-Z]{2})?)?
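
For illustration, here is a small sketch of how that expression
might be used with java.util.regex. The class name ReportMatcher
and the method parse() are hypothetical, and the two outer
capturing groups are an addition to the expression above so that
the station id and the phenomena string can be pulled out
separately.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the regular expression above.
class ReportMatcher {

    // Group 1: station id, group 2: one to three two-letter codes.
    private static final Pattern REPORT =
        Pattern.compile("([A-Z]{4}) ([A-Z]{2}([A-Z]{2}([A-Z]{2})?)?)");

    // Returns {stationId, phenomena}, or null if the whole input
    // does not have the expected shape.
    static String[] parse(String input) {
        Matcher m = REPORT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Note that this only checks the shape of the input; whether each
two-letter code is a known phenomenon would still have to be
checked separately.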

Cheers,

/Per

On Fri, 2005-07-01 at 16:04 +0200, HECKHAUSEN Ralf wrote:
> Well, I now understand how the problem is caused. Please give me a hint how
> to solve the following:
>
> %header%
> GRAMMARTYPE = "LL"
> %tokens%
> STATION_ID = <<[A-Z]{4}>>
> SPACE = " "
> DZ = "DZ"
> RA = "RA"
> SN = "SN"
> %productions%
> INPUT = STATION_ID SPACE PHENOMEN [PHENOMEN [PHENOMEN]];
> PHENOMEN = DZ | RA | SN; // real list has 22 items
>
> ABCD DZRASN is not parsed correctly, because DZRA is returned as STATION_ID
> token.
> Defining
> "PHENOMEN = DZ | RA | SN | STATION_ID;"
> is not a solution in this case, as it would allow invalid input.
>
> Defining STATION_ID as LETTER LETTER LETTER LETTER would fail on stations
> containing one of the phenomena.
>
> Cheers.
> Ralf