[Grammatica-users] novice question: when more than one token match?
From: malcolm macaulay
Subject: [Grammatica-users] novice question: when more than one token match?
Date: Mon, 12 Apr 2004 21:04:35 +0100
Hi there,
I hope someone can help me with this and I apologize if this is a dumb
question.
I have a grammar in which more than one token can match the same input
(i.e. the strings matched by one or more regex tokens are a subset of
those matched by another regex token). If I am reading the C# code
correctly, Grammatica returns the *longest* token that can be matched,
and if that token does not fit the production it reports an error. This
does not make sense to me, as there may be a shorter token match which
is the correct one according to the grammar.
The document I want to parse is the first part of a configuration file
for a digital power protection relay:
[DEVICE INFORMATION]
DEVICE NAME=750
COMMENT=some words
VERSION=500
My grammar:
%tokens%
WHITESPACE = <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING = <<\[DEVICE INFORMATION\]>>
DEVICE_NAME = <<DEVICE NAME>>
EQUALS = "="
NUMBER = <<[0-9]+>>
DOT_STAR = <<.*>>
%productions%
Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER DOT_STAR;
When I test this grammar against the document I get:
Expression(2001)
DEVICE_INFORMATION_HEADING(1002): "[DEVICE INFORMATION]". Line: 1, col: 1
Error: in test.txt line 2:
Unexpected token "DEVICE NAME=750" <DOT_STAR>, expected <DEVICE_NAME>
I have read the C# code and I can see that it is matching both
<DEVICE_NAME> and <DOT_STAR>, but returning the <DOT_STAR> match as this
has the longer string of the two matches. This does not seem right;
surely it should return all matches and then refer to the productions to
determine which to use?
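To make the behaviour I think I am seeing concrete, here is a small
Python sketch of longest-match tokenization over the same token set.
It is only an illustration of my reading of the code, not Grammatica's
actual implementation: the names TOKEN_PATTERNS and longest_match are
made up for this sketch, and the tie-breaking rule (earlier-listed token
wins) is an assumption.

```python
import re

# Hypothetical token table mirroring the grammar above (not Grammatica's API).
TOKEN_PATTERNS = [
    ("DEVICE_INFORMATION_HEADING", re.compile(r"\[DEVICE INFORMATION\]")),
    ("DEVICE_NAME", re.compile(r"DEVICE NAME")),
    ("EQUALS", re.compile(r"=")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("DOT_STAR", re.compile(r".*")),
]


def longest_match(text, pos):
    """Return (name, lexeme) for the longest pattern matching at pos, or None."""
    best = None
    for name, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        # Skip empty matches. A strictly longer match wins; on a tie the
        # earlier-listed pattern is kept (an assumption of this sketch).
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# The tokenizer never consults the productions, so the catch-all wins here
# even though the grammar expects DEVICE_NAME at this point.
print(longest_match("DEVICE NAME=750", 0))  # -> ('DOT_STAR', 'DEVICE NAME=750')
```

In this sketch DEVICE_NAME does match the first 11 characters, but the
15-character DOT_STAR match is the only one handed on, which is the same
result as in the error message above.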
When I parse the document I am only interested in getting hold of
DEVICE_NAME and its value. I don't care about the remainder of the
document, hence the <DOT_STAR>.
If I remove DOT_STAR from the grammar:
%tokens%
WHITESPACE = <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING = <<\[DEVICE INFORMATION\]>>
DEVICE_NAME = <<DEVICE NAME>>
EQUALS = "="
NUMBER = <<[0-9]+>>
%productions%
Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER;
Then it correctly reads <DEVICE_NAME> <EQUALS> <NUMBER>, and then
reports an error on the third line, as you would expect.
Any help would be greatly appreciated.
Per, thanks for making this parser generator available.
cheers
Malcolm Macaulay