Re: [Grammatica-users] multiline in token definition

grammatica-users

From:	Per Cederberg
Subject:	Re: [Grammatica-users] multiline in token definition
Date:	Thu, 3 Nov 2011 18:16:12 +0100

Note that the tokenizer in an LL parser like Grammatica is context-free, i.e. tokenization is done separately from the grammar rules. You can run Grammatica in tokenize-only mode to show just the stream of tokens read.

The order of the token definitions matters, and the tokenizer has always picked:

1. The longest matching token
2. The first defined token (if several have the same length)

Hence, do like this:

- Keep tokens short (without whitespace).
- Place string tokens above regex tokens.
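
The two rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not Grammatica's actual tokenizer. The token table is hypothetical and is modelled on the MIN, MAX, LEFT_PAREN, RIGHT_PAREN and DOT tokens in the quoted mail. The identifier regex is widened to `[a-zA-Z0-9_]+` so that `Person` matches. At each position every pattern is tried, and only a strictly longer match replaces the current best, so on a tie the earlier definition wins. Tokens flagged as ignored (like `%ignore%` whitespace) are consumed but not emitted.

```python
import re

# Hypothetical token table modelled on the tokens in this thread:
# (name, regex, ignored). List order decides ties between equal-length matches.
# re.escape() turns the string tokens into literal patterns, like "..." tokens.
TOKENS = [
    ("MIN", re.escape("min"), False),
    ("MAX", re.escape("max"), False),
    ("LEFT_PAREN", re.escape("("), False),
    ("RIGHT_PAREN", re.escape(")"), False),
    ("DOT", re.escape("."), False),
    ("IDENTIFIER", r"[a-zA-Z0-9_]+", False),
    ("WHITESPACE", r"[ \t\n\r]+", True),  # consumed but not emitted, like %ignore%
]

# Same tokens, but with the general regex moved above the string tokens.
IDENT_FIRST = [t for t in TOKENS if t[0] == "IDENTIFIER"] + [
    t for t in TOKENS if t[0] != "IDENTIFIER"
]


def tokenize(text, tokens=TOKENS):
    """Split text into (name, lexeme) pairs: longest match, then first defined."""
    compiled = [(name, re.compile(pattern), ignored) for name, pattern, ignored in tokens]
    result = []
    pos = 0
    while pos < len(text):
        best = None  # (length, name, ignored) of the best match so far
        for name, regex, ignored in compiled:
            match = regex.match(text, pos)
            if match is None:
                continue
            length = match.end() - pos
            # Only a strictly longer match replaces the current best, so among
            # equally long matches the earliest definition wins.
            if length > 0 and (best is None or length > best[0]):
                best = (length, name, ignored)
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        length, name, ignored = best
        if not ignored:
            result.append((name, text[pos:pos + length]))
        pos += length
    return result


if __name__ == "__main__":
    print(tokenize("min (Person.Salary)"))
    print(tokenize("minxxxx"))
    print(tokenize("min", IDENT_FIRST))
```

With the string tokens listed first, `min (Person.Salary)` becomes MIN, LEFT_PAREN, IDENTIFIER, DOT, IDENTIFIER, RIGHT_PAREN, so MIN does not need to swallow the whitespace or the parenthesis. `minxxxx` is still a single IDENTIFIER because it is the longer match. With IDENT_FIRST, a bare `min` comes out as IDENTIFIER, which is what happens when the general regex is placed above the string tokens.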

Cheers,

/Per

On Thursday, November 3, 2011, Steffen Gaede <address@hidden> wrote:
>
>> Hi Steffen,
>>
>>> I'm writing the grammar for a Select-from-where string.
>>> The main problem is that the regexp in a token definition cannot
>>> exclude words (like command words):
>>>
>>> e.g.:
>>> FUNCTION_U =
>>> <<(^((from|where|distinct|unique|or|and|not|group|order|by|having|as|asc|desc|on|like|between|in|count|min|max|avg|sum)[ ]*\())[a-z0-9]+[ ]*\(>>
>>> doesn't work: it then only accepts a FUNCTION_U consisting of an
>>> excluded word plus a suffix, like "minxxxx" or "havingbbb", but not "murx".
>>
>> (Note: The reason it fails is that "^" works like "NOT" only inside of "[...]".)
>
> OK, that explains something.
>
>>
>>> OK, then I decided to build a combination that contains all the
>>> literal combinations.
>>
>> I believe that either 1.5 or 1.6 has been modified such that the order in which tokens are given represents priority. (To be exact, the first of all longest matches will be chosen.) I.e., if you define
>>
>> KW_from = "from"
>> KW_where = "where"
>> ...
>> FUNCTION_U = <<[a-z0-9]+>>
>>
>> then each keyword would be matched by both its specific definition and the general one, but since the specific one comes first, it will be chosen, and you are fine.
>>
>> (Also, please look at documentation and examples how whitespace should be handled, e.g., "WHITESPACE = <<[ \t\n\r]+>> %ignore%".)
>
> The whitespace in the function is necessary, because if I have an
> expression defined as:
> expression = <<[a-z0-9_]+>>
> and a function with
> MIN = "min"
> and a symbol with
> LEFT_PAREN = "("
> RIGHT_PAREN = ")"
> DOT = "."
>
> and my SFW statement contains "select min(Person.Salary) ...",
> it searches for "min(" instead of "min" followed by LEFT_PAREN, although I've defined:
> FUNCTION1 = MIN|MAX;
> S1 = FUNCTION1 "(" expression ["." expression] ")";
>
> (The same happens with "min (Person.Salary) ...": it searches for "min (".)
> So it only works with:
> MIN = <<min[ ]*\(>>
> and
> S1 = FUNCTION1 expression ["." expression] ")";
>
>
>> Please check whether the above works. If you have 1.5 and I am wrong and the new priority determination is only included in 1.6, then the old rule should apply, which says that literals ("...") are preferred over regexes (<<...>>), so it should work in your case, too.
> That was my first problem, before I tried the full-word exclusion.
>
> Steffen.
>
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/grammatica-users
>
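
Oliver's note about `^` can be checked with Python's `re` module. This sketch only illustrates standard regex semantics and is not Grammatica's regex engine. The keyword list is a hypothetical subset of the quoted FUNCTION_U alternatives. The negative lookahead `(?!...)` used to exclude whole words is a Python feature, and Grammatica's `<<...>>` tokens are not assumed to support it.

```python
import re

# Hypothetical subset of the keywords from the quoted FUNCTION_U attempt.
KEYWORDS = ["from", "where", "min", "max", "having"]

# Outside brackets, ^ is an anchor ("start of input"), not a negation.
START_WITH_MIN = re.compile(r"^min")

# Inside brackets, ^ negates a single character: any one char except "(".
NOT_PAREN = re.compile(r"[^(]")

# Python's re can exclude whole words with a negative lookahead. The \b makes
# "min" excluded while "minxxxx" is still allowed. Python-only; not assumed
# to work in Grammatica's <<...>> token regexes.
NON_KEYWORD = re.compile(r"(?!(?:" + "|".join(KEYWORDS) + r")\b)[a-z0-9]+")


def is_function_name(word):
    """True if word is lowercase alphanumeric and not one of KEYWORDS."""
    return NON_KEYWORD.fullmatch(word) is not None


if __name__ == "__main__":
    print(bool(START_WITH_MIN.match("minxxxx")))  # anchored prefix match
    print(bool(NOT_PAREN.fullmatch("x")), bool(NOT_PAREN.fullmatch("(")))
    for w in ["murx", "min", "minxxxx", "having"]:
        print(w, is_function_name(w))
```

In Grammatica itself, the approach suggested in this thread avoids word exclusion in the regex altogether: each keyword is listed as a string token ahead of the general identifier regex.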

Current Thread

[Grammatica-users] multiline in token definition, Steffen Gaede, 2011/11/03
- Re: [Grammatica-users] multiline in token definition, Oliver Gramberg, 2011/11/03
  - Re: [Grammatica-users] multiline in token definition, Steffen Gaede, 2011/11/03
    - Re: [Grammatica-users] multiline in token definition, Per Cederberg, 2011/11/03 (this message)
    - Re: [Grammatica-users] multiline in token definition, Steffen Gaede, 2011/11/04
