
[Grammatica-users] Fuzzy tokenizer.


From: Matti Katila
Subject: [Grammatica-users] Fuzzy tokenizer.
Date: Wed, 29 Jun 2005 22:13:33 +0300 (EEST)

Hello,

First, let me introduce a simple grammar:

%tokens%
letter = <<[a-z]>>
digit   = <<[0-9]>>
b = "b"

%productions%
S = letter ( letter | digit )*


First question: how do I define a production for an empty file? It seems
that the input needs at least one character, or an error is thrown, since
an empty production is illegal. Also, was it documented that the first
pattern is used as the startPattern?

Back to the example grammar. With the input "acd" everything works fine,
but with the input "abc" an error is thrown, since the tokenizer reports
"b" as the b token, which S does not expect. You may of course wonder why
one would define such a "b" token at all, but if we add another production
like:

bXb = "b" letter "b",

we gain readability, which I have already said is a high concern of mine.
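
The failure described above can be reproduced with a minimal sketch. It is
written in Python and is not Grammatica's code: the names tokenize and
matches_S are hypothetical, and the tie-breaking rule (a literal token beats
a regex token of the same length) is an assumption chosen only because it is
consistent with the error reported for "abc".

```python
import re

# Token definitions from the grammar above: (name, pattern, is_literal).
TOKENS = [
    ("letter", re.compile(r"[a-z]"), False),
    ("digit", re.compile(r"[0-9]"), False),
    ("b", re.compile(re.escape("b")), True),
]

def tokenize(text):
    """Context-blind tokenizer: the longest match wins; on a tie a literal
    token beats a regex token (an assumption, not confirmed behaviour)."""
    pos, names = 0, []
    while pos < len(text):
        best = None  # (length, is_literal, name)
        for name, pattern, is_literal in TOKENS:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                candidate = (m.end() - pos, is_literal, name)
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        names.append(best[2])
        pos += best[0]
    return names

def matches_S(names):
    # S = letter ( letter | digit )*
    return (len(names) >= 1 and names[0] == "letter"
            and all(n in ("letter", "digit") for n in names[1:]))

print(tokenize("acd"), matches_S(tokenize("acd")))
print(tokenize("abc"), matches_S(tokenize("abc")))
```

The tokenizer never asks what S expects, so the "b" in "abc" comes out as
the b token and the production check fails.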

To make Grammatica even slower, I will branch the current version into a
darcs repository and contribute a patch which fixes this tokenizer design
"flaw". The current design doesn't care which tokens the current production
is looking for; it just tries its best to produce the sequence of tokens
that the tokenizer itself considers best. Darcs supports distributed
development, so it is easy for others to pick up my patches for Grammatica
if you are interested in them.
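
One way to picture a tokenizer that does care what the production is looking
for is the sketch below. It is Python, not the proposed patch, and every
name in it (next_token, parse_S, parse_bXb) is hypothetical. The only idea
it illustrates is that the parser passes the set of token names it currently
expects, and the tokenizer tries only those.

```python
import re

# Token definitions from the example grammar.
TOKENS = {
    "letter": re.compile(r"[a-z]"),
    "digit": re.compile(r"[0-9]"),
    "b": re.compile(re.escape("b")),
}

def next_token(text, pos, expected):
    """Return (name, end) for the longest match among `expected`, or None."""
    best = None
    for name in sorted(expected):
        m = TOKENS[name].match(text, pos)
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best

def parse_S(text):
    # S = letter ( letter | digit )*
    tok = next_token(text, 0, {"letter"})
    if tok is None:
        return None
    names, pos = [tok[0]], tok[1]
    while pos < len(text):
        tok = next_token(text, pos, {"letter", "digit"})
        if tok is None:
            return None
        names.append(tok[0])
        pos = tok[1]
    return names

def parse_bXb(text):
    # bXb = "b" letter "b"
    names, pos = [], 0
    for expected in ({"b"}, {"letter"}, {"b"}):
        tok = next_token(text, pos, expected)
        if tok is None:
            return None
        names.append(tok[0])
        pos = tok[1]
    return names if pos == len(text) else None

print(parse_S("abc"))
print(parse_bXb("bbb"))
```

With the expected set supplied by the production, "abc" is accepted by S,
and the middle character of "bbb" is read as a letter inside bXb. The extra
lookup per token is also where the additional cost would come from.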


   -Matti



