Re: [Grammatica-users] Fuzzy tokenizer.
From: Per Cederberg
Subject: Re: [Grammatica-users] Fuzzy tokenizer.
Date: Fri, 01 Jul 2005 15:02:30 +0200
On Fri, 2005-07-01 at 11:45 +0300, Matti Katila wrote:
> I'm wondering whether it would be useful to return a special EOF token when
> the end is reached. Then a start production could be written as:
>
> S = atoms* EOF;
Well, that is an option. We would have to watch out for namespace
clashes though, as the user might have already defined a token
"EOF". I have basically dodged this whole issue, as one can always
work around it if it is sufficiently important (after all, not
that many people parse empty input in the first place).
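A minimal sketch of the EOF idea, assuming a hypothetical `TokenSource` interface whose `next()` returns `null` once the input is exhausted. The names `Token`, `TokenSource`, `ListTokenSource`, `EofAppendingSource` and `EOF_ID` are illustrative only, not Grammatica's API. The wrapper hands out exactly one synthetic end-of-input token with a reserved negative id, so it cannot clash with a user-defined token that happens to be named "EOF".

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical token: numeric id, token name, matched text. */
final class Token {
    final int id;
    final String name;
    final String image;

    Token(int id, String name, String image) {
        this.id = id;
        this.name = name;
        this.image = image;
    }
}

/** Hypothetical token source; next() returns null when input is exhausted. */
interface TokenSource {
    Token next();
}

/** Demo source backed by a fixed list of tokens. */
class ListTokenSource implements TokenSource {
    private final Iterator<Token> it;

    ListTokenSource(List<Token> tokens) { this.it = tokens.iterator(); }

    public Token next() { return it.hasNext() ? it.next() : null; }
}

/**
 * Wraps a source and appends one synthetic end-of-input token.
 * Assumption: user token ids are >= 0, so a negative id is reserved
 * and never collides with a user token, even one named "EOF".
 */
class EofAppendingSource implements TokenSource {
    static final int EOF_ID = -1;
    private final TokenSource inner;
    private boolean eofSent = false;

    EofAppendingSource(TokenSource inner) { this.inner = inner; }

    public Token next() {
        Token t = inner.next();
        if (t != null) {
            return t;                       // pass real tokens through unchanged
        }
        if (!eofSent) {
            eofSent = true;
            return new Token(EOF_ID, "<EOF>", "");  // emitted exactly once
        }
        return null;                        // afterwards, behave like the inner source
    }
}
```

With such a wrapper, a start production like `S = atoms* EOF;` would match the reserved token rather than any user token named `EOF`; how a grammar would actually refer to the reserved token is left open here.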
> 48 hours later, a first working version is at hand. Basically, I made big
> modifications to RecursiveDescentParser and Tokenizer. Hmm, and
> slightly modified Parser too. One new class was created as a
> workaround for ignore tokens, which were a somewhat troublesome problem.
Cool!
> I won't give you the URI yet, since the files need some cleaning, and
> that's why I'm writing now. So, how could Grammatica be made to use
> different implementations, e.g., by declaring 'TOKEN_CONTEXT_SENSITIVE =
> "true"' in your grammar?
Actually, this is what the GRAMMARTYPE parameter should be used
for. So you just have to invent a good name for this type of
grammar, like "CONTEXTSENSITIVE" or some cryptic abbreviation:
GRAMMARTYPE = "CS"
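One way a code generator could act on the GRAMMARTYPE value is sketched below: it maps the header value to the base class that the generated parser extends. This is a hypothetical helper, not Grammatica's generator code; `ParserBaseSelector` is an invented name, and the "CS" entry and `ContextSensitiveRecursiveDescentParser` simply reuse the names proposed in this thread.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical generator helper: maps a GRAMMARTYPE header value to the
 * base class of the generated parser. The "CS" mapping is illustrative.
 */
final class ParserBaseSelector {
    private static final Map<String, String> BASE_CLASSES = Map.of(
        "LL", "RecursiveDescentParser",
        "CS", "ContextSensitiveRecursiveDescentParser");

    /** Looks up the base class, rejecting unknown grammar types. */
    static String baseClassFor(String grammarType) {
        String base = BASE_CLASSES.get(grammarType.toUpperCase(Locale.ROOT));
        if (base == null) {
            throw new IllegalArgumentException(
                "unsupported GRAMMARTYPE: " + grammarType);
        }
        return base;
    }

    /** Produces the opening line of the generated parser class. */
    static String classHeader(String parserName, String grammarType) {
        return "class " + parserName + " extends "
            + baseClassFor(grammarType) + " {";
    }
}
```

Failing on unknown values, rather than falling back to a default, keeps a typo in the grammar header from silently producing a plain LL parser.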
> Currently, generated parsers extend RecursiveDescentParser. It would be
> straightforward to change the generated parser to extend another Parser
> implementation. Another option I can see is to write and extend a
> DelegateParser, which would delegate all method invocations to the delegate
> parser, but that's a lot of work and smells like glue.
Extending RecursiveDescentParser seems reasonable to me (without
having seen the code).
> Second question: what would be the proper package for the new type of
> parser? Perhaps it would make sense to do some refactoring. For example,
> like this:
> ...
> create abstract Tokenizer
I think Tokenizer would be better off as an interface.
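As a rough illustration of Tokenizer as an interface, the sketch below gives two strategies behind one contract: a longest-match tokenizer that tries every pattern, and an expect-driven one that only tries the tokens the parser currently expects. The class names come from the proposal above, but the `match` signature, the use of `java.util.regex`, and the behavior are assumptions made for this example, not the code under discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical contract: name of the token matching at pos, or null. */
interface Tokenizer {
    String match(CharSequence input, int pos, Set<String> expected);
}

/** Tries every known pattern and keeps the longest (non-empty) match. */
class LongestFindTokenizer implements Tokenizer {
    protected final Map<String, Pattern> patterns;

    LongestFindTokenizer(Map<String, Pattern> patterns) { this.patterns = patterns; }

    public String match(CharSequence input, int pos, Set<String> expected) {
        return longest(input, pos, patterns.keySet());  // ignores 'expected'
    }

    /** Longest match among the given token names; ties keep the first seen. */
    protected String longest(CharSequence input, int pos, Set<String> names) {
        String best = null;
        int bestLen = 0;
        for (String name : names) {
            Pattern p = patterns.get(name);
            if (p == null) {
                continue;                    // unknown token name: skip
            }
            Matcher m = p.matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt() && m.end() - pos > bestLen) {
                best = name;
                bestLen = m.end() - pos;
            }
        }
        return best;
    }
}

/** Context-sensitive variant: only the expected tokens are considered. */
class ExpectTokenizer extends LongestFindTokenizer {
    ExpectTokenizer(Map<String, Pattern> patterns) { super(patterns); }

    @Override
    public String match(CharSequence input, int pos, Set<String> expected) {
        return longest(input, pos, expected);
    }
}
```

For input `ifx` with patterns `if` and `[a-z]+`, the longest-match strategy picks the identifier, while the expect-driven one returns the keyword when only the keyword is expected. Because the parser depends only on the interface, either strategy can be plugged in.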
>
> net.percederberg.grammatica.parser.ll:
> perhaps create abstract RecursiveDescentParser which only checks
> production rules.
>
> LookAheadRecursiveDescentParser.java
> LongestFindTokenizer.java
>
> new context sensitive classes:
> ContextSensitiveRecursiveDescentParser.java
> ExpectTokenizer.java
> TokenStack.java - ignore token problem workaround.
Or just have them all in the same package for now. There
aren't that many classes after all.
> Hmm, actually I don't like this, since newcomers would find it hard to look
> through all the classes in the parser.ll package, so a better option from
> that point of view might be:
>
> net.percederberg.grammatica.parser.ll:
> create abstract RecursiveDescentParser which only checks
> production rules.
>
> net.percederberg.grammatica.parser.ll.traditional:
> LookAheadRecursiveDescentParser.java
> LongestFindTokenizer.java
>
> net.percederberg.grammatica.parser.ll.sensitive:
> new context sensitive classes:
> ContextSensitiveRecursiveDescentParser.java
> ExpectTokenizer.java
> TokenStack.java - ignore token problem workaround.
Or we could just move some more shared functionality up into the
Parser class, with additional protected methods that subclasses
can call if they are interested.
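A small sketch of that idea, assuming a simplified `Parser` that works on a list of token names: the shared bookkeeping lives in the base class as protected helpers, and each parsing strategy only supplies its own entry point. The helpers `peekToken`, `accept` and `expectEnd`, and the `AtomListParser` subclass for `S = ATOM*`, are hypothetical and do not reflect Grammatica's actual `Parser` class.

```java
import java.util.List;

/**
 * Hypothetical base class: shared token bookkeeping is exposed as
 * protected helpers that subclasses may call if they need them.
 */
abstract class Parser {
    private final List<String> tokens;  // token names only, for simplicity
    private int pos = 0;

    protected Parser(List<String> tokens) { this.tokens = tokens; }

    /** Current token name without consuming it, or null at end of input. */
    protected String peekToken() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    /** Consumes the current token if it has the given name. */
    protected boolean accept(String name) {
        if (name.equals(peekToken())) {
            pos++;
            return true;
        }
        return false;
    }

    /** Fails unless all input has been consumed. */
    protected void expectEnd() {
        if (peekToken() != null) {
            throw new IllegalStateException("unexpected token: " + peekToken());
        }
    }

    /** Each parsing strategy supplies its own entry point. */
    public abstract int parse();
}

/** Example strategy for S = ATOM* ; returns the number of atoms read. */
class AtomListParser extends Parser {
    AtomListParser(List<String> tokens) { super(tokens); }

    @Override
    public int parse() {
        int count = 0;
        while (accept("ATOM")) {
            count++;
        }
        expectEnd();
        return count;
    }
}
```

Keeping the helpers protected lets a context-sensitive subclass reuse them without the base class committing to any one parsing strategy.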
> So far it has been a pleasure to read and modify your source code. It's
> very well written and follows a nice programming style. What kind of
> development tools are you using? And I didn't expect almost everything
> to be documented in the source =)
Thanks! I try my best to follow a consistent code style. That's
why I'm using Eclipse with a style check plugin to remind me of
all the details.
/Per