Re: [Grammatica-users] Regex token order
From: Per Cederberg
Subject: Re: [Grammatica-users] Regex token order
Date: Mon, 28 Feb 2011 07:50:13 +0100
It works like this (the same as flex):
1. The longest matching token wins.
2. On equal length, the token defined first in the grammar wins.
So, in your case, don't make your TEXT token a repeating regexp; <<.+>>
will always give the longest match on the line. Match a single char only.
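
The two rules can be tried out in a small Python sketch. This is only a simulation of the rules as stated above, not Grammatica's actual tokenizer; the `tokenize` helper and the `COMMON` list are hypothetical names made up for illustration, and the token patterns are copied from the grammar quoted below.

```python
import re

def tokenize(text, token_defs):
    """Simulate the two rules: longest match wins, ties go to the
    token defined first. token_defs is a list of (name, pattern)."""
    compiled = [(name, re.compile(pattern)) for name, pattern in token_defs]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier definition on equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Token patterns taken from the grammar quoted in this thread.
COMMON = [
    ("RCARET", r">"),
    ("LCARET", r"<"),
    ("ITEM_NAME", r"[a-zA-Z][a-zA-Z0-9]+"),
]
line = ">email< Enter your email address:"

# TEXT as a repeating regexp: it swallows the whole line in one token.
greedy = tokenize(line, COMMON + [("TEXT", r".+")])

# TEXT as a single char: the more specific tokens get a chance.
single = tokenize(line, COMMON + [("TEXT", r".")])

print(greedy)
print(single[:4])
```

In this sketch the single-char variant starts with RCARET, ITEM_NAME, LCARET, but words later in the line (such as "Enter") also come out as ITEM_NAME, so the productions would presumably need to accept ITEM_NAME as well as TEXT in running text.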
Cheers,
/Per
On Monday, February 28, 2011, Drew Vogel <address@hidden> wrote:
> I would expect the token definition order to matter, based on my experience
> with similar tools like flex. I must be doing something wrong.
> This is the test file I am trying to parse:
> --------------------------------------------------
> >email< Enter your email address:
>
> This is my test grammar:
> --------------------------------------------------
> %header%
> GRAMMARTYPE = "LL"
>
> %tokens%
> RCARET = ">"
> LCARET = "<"
> ITEM_NAME = <<[a-zA-Z][a-zA-Z0-9]+>>
> TEXT = <<.+>>
>
> %productions%
> Item = ItemDecl TEXT;
> ItemDecl = RCARET ITEM_NAME LCARET ;
>
> This is the error I get from grammatica:
> --------------------------------------------------
> java -jar grammatica-1.5.jar Q.grammar --parse test.q
> Parse tree from test.q:
> Error: in test.q: line 1: unexpected token ">email<" <TEXT>, expected ">"
>
> If I remove the TEXT token definition and the reference in the Item
> production, the remaining grammar does properly match the first line and I
> get a parse error at the new line character (as expected). Why does the
> introduction of my TEXT token override those previously-matching tokens, even
> though it is listed last in the %tokens% section?
>
> On Sun, Feb 27, 2011 at 11:49 PM, Oliver Bock <address@hidden> wrote:
>
> I had to do a similar thing, but putting the more specific tokens
> first in %tokens% worked for me. From my grammar:
>
> ON = "ON"
> VARNAME = <<address@hidden(address@hidden@])?>>
>
> The text "ON" could match both these tokens, but for me ON matches,
> not VARNAME. I suggest you cut your example down into a very simple
> grammar (like the above).
>
>
> Oliver
>
> On 28/02/2011 4:37 PM, Drew Vogel wrote:
> If I have two regex tokens A and B and A is a subset
> of B, how do I disambiguate them such that A will always be tried
> before B? The order they appear in the %tokens% section does not
> seem to affect this and I did not see an example of this in the
> documentation.
>
>
>
> The parser I am trying to construct is for a template-like
> language with commands embedded in text. Thus I have a "text"
> token regex <<.+>> to match everything not otherwise
> matched as a command, but I only want to match it after all
> other token regex patterns have been tried.
>
>
>
> Drew Vogel
>
>
>
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users