Re: [Grammatica-users] Regex token order

grammatica-users

From:	Per Cederberg
Subject:	Re: [Grammatica-users] Regex token order
Date:	Mon, 28 Feb 2011 07:50:13 +0100

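It works like this (the same as flex):

1. The longest matching token wins.

2. On equal length, the token defined earliest in the grammar wins.

So, in your case, don't make your TEXT token a repeating regexp. A
pattern like <<.+>> will always produce the longest match, so the
definition order never comes into play. Match a single char only.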
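
The two rules above can be sketched in a few lines of Python. This is only an illustration of the longest-match/first-defined rule, not Grammatica's actual tokenizer: the helper names (next_token, tokenize) are made up for the example, and Python's re module is a backtracking engine that happens to give the longest match for these particular patterns.

```python
import re

def next_token(token_defs, text, pos):
    """Return (name, lexeme) for the winning token at pos, or None."""
    best = None
    for name, pattern in token_defs:  # iterate in definition order
        m = re.compile(pattern).match(text, pos)
        if m is None or m.end() == pos:
            continue
        # Rule 1: a strictly longer match wins.
        # Rule 2: on a tie the earlier definition is kept (no replacement).
        if best is None or m.end() > best[1]:
            best = (name, m.end())
    if best is None:
        return None
    return best[0], text[pos:best[1]]

def tokenize(token_defs, text):
    """Split text into (name, lexeme) pairs using next_token repeatedly."""
    pos, out = 0, []
    while pos < len(text):
        tok = next_token(token_defs, text, pos)
        if tok is None:
            raise ValueError(f"no token matches at offset {pos}")
        out.append(tok)
        pos += len(tok[1])
    return out

# Token names and patterns taken from the grammar quoted below.
COMMON = [
    ("RCARET", r">"),
    ("LCARET", r"<"),
    ("ITEM_NAME", r"[a-zA-Z][a-zA-Z0-9]+"),
]
GREEDY = COMMON + [("TEXT", r".+")]  # repeating TEXT, as in the question
SINGLE = COMMON + [("TEXT", r".")]   # single-character TEXT, as suggested

line = ">email<  Enter your email address:"
greedy_tokens = tokenize(GREEDY, line)
single_tokens = tokenize(SINGLE, line)

print(greedy_tokens[0][0])  # name of the first token with the repeating TEXT
print(single_tokens[:3])    # first three tokens with the single-char TEXT
```

In this sketch the repeating TEXT pattern swallows the whole line as one token, while the single-character version lets RCARET win the tie on ">" and lets ITEM_NAME win on "email" because it is longer. Note that with a single-character TEXT the words after the "<" would also come out as ITEM_NAME tokens, so the productions would need to allow for that.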

Cheers,

/Per

On Monday, February 28, 2011, Drew Vogel <address@hidden> wrote:
> I would expect the token definition order to matter, based on my experience 
> with similar tools like flex. I must be doing something wrong.
> This is the test file I am trying to parse:
> --------------------------------------------------
> >email<  Enter your email address:
>
> This is my test grammar:
> --------------------------------------------------
> %header%
> GRAMMARTYPE = "LL"
> %tokens%
> RCARET = ">"
> LCARET = "<"
> ITEM_NAME = <<[a-zA-Z][a-zA-Z0-9]+>>
> TEXT = <<.+>>
> %productions%
> Item = ItemDecl TEXT;
> ItemDecl = RCARET ITEM_NAME LCARET ;
>
> This is the error I get from grammatica:
> --------------------------------------------------
> java -jar grammatica-1.5.jar Q.grammar --parse test.q
> Parse tree from test.q:
> Error: in test.q: line 1:    unexpected token ">email<" <TEXT>, expected ">"
>
> If I remove the TEXT token definition and the reference in the Item 
> production, the remaining grammar does properly match the first line and I 
> get a parse error at the new line character (as expected). Why does the 
> introduction of my TEXT token override those previously-matching tokens, even 
> though it is listed last in the %tokens% section?
>
>
>
> On Sun, Feb 27, 2011 at 11:49 PM, Oliver Bock <address@hidden> wrote:
>
>     I had to do a similar thing, but putting the more specific tokens
>     first in %tokens% worked for me.  From my grammar:
>
>     ON = "ON"
>     VARNAME = <<address@hidden(address@hidden@])?>>
>
>     The text "ON" could match both these tokens, but for me ON matches,
>     not VARNAME.  I suggest you cut your example down into a very simple
>     grammar (like the above).
>
>
>       Oliver
>
>     On 28/02/2011 4:37 PM, Drew Vogel wrote:
>     If I have two regex tokens A and B and A is a subset
>       of B, how do I disambiguate them such that A will always be tried
>       before B? The order they appear in the %tokens% section does not
>       seem to affect this and I did not see an example of this in the
>       documentation.
>
>
>
>       The parser I am trying to construct is for a template-like
>         language with commands embedded in text. Thus I have a "text"
>         token regex <<.+>> to match everything not otherwise
>         matched as a command, but I only want to match it after all
>         other token regex patterns have been tried.
>
>
>
>           Drew Vogel
>
>
>
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
