[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Grammatica-users] Trying to create a (G) LR parser with support for
From: |
Per Cederberg |
Subject: |
Re: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities.. |
Date: |
Mon, 24 Oct 2005 08:09:04 +0200 |
The slowness in preparing some Grammatica parsers is a known
issue. There was a one-liner patch that may or may not remedy
this for your grammar, but the correct solution is to change
the look-ahead data structure. Currently Grammatica is using
a simple array of arrays of token id:s, which starts to be
really slow on large look-ahead sets.
I started out on a new tree-based structure this spring but
lost the spirit a bit when I realized I had made a mistake
and would be forced to go back and redo everything again...
Regarding your additions to the Grammar format, it sounds
interesting. Maybe Grammatica should support "unparsed" or
user-defined sections before, between or after the standard
%header%, %tokens% and %productions% separators. Then your
grammar format wouldn't be incompatible with others (except
for being LR, of course).
/Per
On sun, 2005-10-23 at 22:36 -0300, Andres Hohendahl wrote:
> HI
>
> I have been testing...grammatica parsers under C#
> Have ported the main part of Gramática (the whole parser) successfully
> towards C# (will send the source back asap.)
> Noted some awfulness (slowness) in its use...hmmm.
>
> I did this:
> Created a EBNF parser using grammatica (1.4, sorry!) then used THIS parser
> to create the parser object to feed into the grammar. Ported the Grammar
> Class, and the First and SecondPass Parser's, that's it. And It works!
>
> Then I load the grammar file (Spanish) from disk, and create a Parser of it,
> then I can create an analyzer and a tokenizer for the Spanish phrase (to be
> analyzed) and voilá! it parses ok!..but
>
> PROBLEMS
> First, when the parser itself is created out of the BNF grammar file), it
> builds a parse-tree calculating the lookaheads.. this is a VEEEEERY slow!
> action..hmmm, and for only 18 productions (simple Spanish I've sent down in
> my last mail) this has a 5-10 second delay (P4/2.4G/512K/WinXP)!! Too slow
> to be useful.
> I've tried to create a Full Parser (as a library itself) and.. the time is
> the same (it also creates the in-memory look-ahead structure).
> The (generated) bnf loader it very fast! so the most time is spent in
> preparing the parser.
>
> SOLUTIONS??
>
> TODO (in theory):
> I will (try to) write a simple LR parser (shift-reduce) and then modify the
> tokenizer to allow ambiguities to be handled, and then modify the node class
> to allow "lateral" nodes to build "parse-forest's" (multiple trees) and then
> may be include a (simil-context) constrain as a separate part of the syntax
> (aka grammar) like %constrain% inside I want to put simple rules to allow to
> disambiguate the grammar in run time.
> Those rules should be applied at the reduce phase, and will (should) quickly
> kill inadequate ambiguities in the production-parser forest (on the fly)
> (hope so) this will be a new "strange" kind of parser I guess, hope it
> works...
>
> This new Tokenizer will parse input strings with normal parsing methods
> (regex, etc.), then give them into some external procedures (using
> callback/events) to "tag" them externally (using my affix/morpho-syntactical
> / dictionary libraries) and feed tem back along with the newly added
> multiple tags (which will be part of the %constrain-tag% tags)
>
> My proposal on this new "layer" is something like:
> --------------------------------
> &header% bla blah..
> %token%
> noun = <<\w+\.nn>>
> article = <<\w+\.ar>>
> verb = = <<\w+\.ve>>
> adverb = <<\w+\.av>>
> pronoun = <<\w+\.pr>>
> preposition = <<\w+\.pp>>
>
> %production%
> sentence = [subject] predicate;
> subject = [article] noun [adjective];
> predicate = verb [adverb]* [complement]*;
> complement = preposition ( noun | adverb | sentence );
>
> %constrain-tag%
> gender = M|F|N
> number = singular | plural | dual
> person = first | second | third | impersonal
>
> %constrains%
> subject: article(gender,number) = noun(gender,number) =
> adjective(gender,number);
> sentence: subject(person) = verb(person)
>
> <*---------------------------------------------------*>
> Q: is this a new stuff, or am I reinventing the wheel?
> <*---------------------------------------------------*>
>
> Any comments are welcome ;)
>
> Andrés Hohendahl
>
> -----Mensaje original-----
> De: address@hidden
> [mailto:address@hidden En
> nombre de Per Cederberg
> Enviado el: Thursday, October 20, 2005 2:54 PM
> Para: address@hidden
> Asunto: RE: [Grammatica-users] Lookahead feature?
>
> Hi Andres,
>
> Before you rush ahead with your project, I suggest that you take
> a look at the current development version of Grammatica:
>
> http://grammatica.percederberg.net/download/development/index.html
>
> The thing is, that Grammatica has already been ported to .NET
> once by Adrian Moore. I have so far only included parts of his
> efforts, due to my general slowness. The code generator parts
> for .NET are already ported, but have been launched in a
> separate project -- SourceFlow (www.sourceflow.org).
>
> So currently only the Grammatica application remains to be
> ported. If you can contribute to that, I'm all ears. Also, if
> you have any other improvements I'd appreciate patches relative
> to the latest development snapshot, as it is otherwise very slow
> and tedious for me to merge the changes. Also, I try to maintain
> the same coding style that is currently used in all C# parts of
> Grammatica (regarding comments etc).
>
> Regarding a conversion tool between different grammar formats,
> that is an excellent idea! Hadn't thought about that before,
> but having such a tool would definitely rock!
>
> Cheers,
>
> /Per
>
> On thu, 2005-10-20 at 13:52 -0300, Andres Hohendahl wrote:
> > Hi there!
> >
> > I am now working on a modification of the Grammatica Parser-Generator
> under
> > C# to allow this:
> > Get (read) a Full EBNF grammar file (on the fly) then parse it (with a
> > EBNF-grammatica generated parsed) and generate a Parse Tree.
> >
> > NOTE:
> > I've successfully ported the Grammatica-Core Parse's towards C# (.net)
> > Also I modified the Run-time library to be more C# friendly while
> > maintaining full compatibility towards the grammatical generated code. I
> > think I've improved readability and may be some performance.
> > (Per: do you want me to send in the source's)?
> >
> > ...
> > Then take this Parser, and scan an input text (implementing this grammar)
> > and generate an interpreted representation of this input fed from the
> > grammar-parser tree into a new hierarchical tree to represent the
> > "parsed/compiled/acquired" data from the input.
> >
> > This idea came from the fact that many tokens/words (in natural languages)
> > has multiple possible functions, and thus generate ambiguous grammar's.
> > So this is the point: let's generate a tree of the input tokens, with all
> > the (possibilities) as branches.
> > Then generate a (recursive) method of:
> > 1 Taking all possible token-sequence's and feeding them into the parser's
> > tree, to see if they succeed. After each success, the system will generate
> a
> > "possible representation list of token's" along with a "ponderation" in
> > which each word who has several "functions" has a value associated to the
> > each function (creating a most-used hierarchy). Then all "values are
> added,
> > and the final result will be a ordered (probabilistically) list of the
> > possible input token's values (functions) thet responds to thei
> > deterministic grammar.
> >
> > Thoughts are welcome
> >
> > Andres
> > ----------------
> > By the Way, Per:
> >
> > Is there any grammar translation tool to-from EBNF(grammatica's) <->
> > EBNF(std) / YACC / BRISON / etc. (standard)
> > Because there are a lot of tools out there (IE. The Gold-Parsed) to
> > debug/test your grammars graphically... but they don’t use this EBNF
> syntax,
> > and the conversion must be made by hand (quite tedious)!
> >
> > url-Links are welcome..too!
> >
> >
> ---------------------------------------------------------------------------
> > Here is a grammar file in which I have been wonkin in, and the "words"
> will
> > be fed into pre-tagged as follows, for the phrase "this word is red"
> >
> > This.AR word.SN is.VB red.AD
> >
> > I have some problems in representing this ambiguous (Spanish) grammar, can
> > anyone help me out?
> >
> > HERE IT IS:
> >
> > -------------------------------
> >
> > /*
> > * Gramática Sintaxis Española
> > */
> > %header%
> > GRAMMARTYPE = "LL"
> > DESCRIPTION = "Spanish grammar for SINTACTIC-spanish recognition"
> > AUTHOR = "AndySoft"
> > VERSION = "1.0"
> > DATE = "19 Oct 2005"
> > LICENSE = ".."
> > COPYRIGHT = "All rights.. hmmm.. reserved.?"
> >
> > %tokens%
> >
> > LEFT_BRACKET = "["
> > RIGHT_BRACKET = "]"
> > WHITESPACE = <<[ \t\n\r]+>> %ignore%
> >
> > // POS (Part Of Speech )
> > // Auxiliary Verb (haber, ser, estar) ' Copulative ' no Direct/Indirect
> > Object
> > VX = <<\w+\.VX\w*>>
> > // Verb Transitive (need direct object)
> > VT = <<\w+\.VT\w*>>
> > // Verb Intransitive (Reflexive) don't need direct object
> > VI = <<\w+\.VI\w*>>
> > // Verb (Transitive or Intransitive)
> > VB = <<\w+\.VB\w*>>
> > // Verb Participio
> > VP = <<\w+\.VP\w*>>
> > // Verb Gerund
> > VG = <<\w+\.VG\w*>>
> > // Verb Suplemento
> > VS = <<\w+\.VS\w*>>
> > // Adverb
> > AV = <<\w+\.AV\w*>>
> > // Adverbio de Cantidad
> > AC = <<\w+\.AC\w*>>
> > // Adjective
> > AJ = <<\w+\.AJ\w*>>
> > // Noun singular or Mass
> > NN = <<\w+\.NN\w*>>
> > // Proper Noun, singular
> > SN = <<\w+\.SN\w*>>
> > // Noun, Numeral
> > NU = <<\w+\.NU\w*>>
> > // Article
> > AR = <<\w+\.AR\w*>>
> > // Personal Pronoun
> > PP = <<\w+\.PP\w*>>
> > // Possessive Pronoun
> > PO = <<\w+\.PO\w*>>
> > // Demonstrative Pronoun
> > PD = <<\w+\.PD\w*>>
> > // Interrogative Pronoun
> > PI = <<\w+\.PI\w*>>
> > // Preposition
> > PR = <<\w+\.PR\w*>>
> > // Conjunction
> > CJ = <<\w+\.CJ\w*>>
> > // Interjection
> > IJ = <<\w+\.IJ\w*>>
> > // Pronominal
> > PL = <<\w+\.PL\w*>>
> > // Un-Known
> > UK = <<\w+\.UK\w*>>
> > // Special reflexive particle "se"
> > SE = <<\w+\.SE>>
> >
> > // SINTAGMAS (construcciones)
> > %productions%
> >
> > // Frase
> > // Fras = Orac | Orel | SNom ;
> > // Oración
> > Orac = [ SNom ] SVer ;
> > // Oracíon de Relativo
> > Orel = Orac ;
> > // Oración Interrogativa
> > // Oint = Orac ;
> >
> > // Sintagma Nominal
> > SNom = [ Det ] SNuc ;
> > // Nucleo Nominal
> > SNuc = ( NN | SN | NU ) [ CNuc ] ;
> > // un nombre común
> > // un nombre propio (y entonces no aparece ningún determinante en Det)
> > // un pronombre personal (y entonces no aparece ningún determinante en
> > Det)
> > // cualquier elemento sustantivado (infinitivos aislados, adjetivos,
> > oraciones subordinadas // sustantivas, adverbios...), que generalmente
> > irá precedido por un artículo.
> > // Determinante
> > Det = AR ;
> > // artículo
> > // demostrativo
> > // indefinido
> > // numeral
> > // posesivo
> > // distributivo
> > // interrogativo
> > // exclamativo
> > // Complemento del Núcleo
> > CNuc = ( SAdj | SPre | SNom | Orel ) [ CNuc ] ;
> > // un sintagma adjetival
> > // un sintagma preposicional
> > // un SN en aposición
> > // una oración de relativo
> >
> >
> > // Sintagma Adjetival
> > SAdj = [ AC ] NAdj [ CAdj ] ;
> > // Núcleo Adjetival
> > NAdj = AJ ;
> > // Complemento del Adjetivo
> > CAdj = SPre ;
> >
> >
> > // Sintagma Verbal
> > SVer = ( VTr CDir ) | ( VI CInd ) | ( VS CCir ) ;
> > VTr = VT | VT VI ;
> > // Si el verbo es transitivo: un SNom, un SPre o una oración en función de
> > Complemento Directo.
> > CDir = SNom | SPre | Orac ;
> > // Si el verbo es copulativo: cualquier sintagma (salvo SVer) o una
> oración
> > en función de Atributo.
> > CInd = ( SNom | SAdv | SPre | Orac ) [ CInd ] ;
> > // Si el verbo es de suplemento: un SP (que contenga un SNom o una
> oración)
> > en función de Suplemento
> > CCir = [ PR ] ( SNom | Orac ) ;
> > // Verbo Compuesto
> > VComp = [ VI ] VT [VI] ;
> >
> > // Sintagma Adverbial
> > SAdv = [ AC ] NAdv ;
> >
> > // Núcleo (Adverbio/forma Adverbial)
> > NAdv = AV [ CAdv ] ;
> > // Complemento del Adverbio
> > CAdv = SPre ;
> >
> > // Sintagma Preposicional
> > SPre = [ PR ] ( SNom | SAdv | Orac | SPre ) ;
> >
> >
> >
> > _______________________________________________
> > Grammatica-users mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/grammatica-users
> >
>
>
>
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
>
>
>
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
>