Re: [Grammatica-users] Trying to create a (G) LR parser with support for

grammatica-users
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Grammatica-users] Trying to create a (G) LR parser with support for

From:	Per Cederberg
Subject:	Re: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities..
Date:	Mon, 24 Oct 2005 08:09:04 +0200
The slowness in preparing some Grammatica parsers is a known
issue. There was a one-liner patch that may or may not remedy
this for your grammar, but the correct solution is to change
the look-ahead data structure. Currently Grammatica is using
a simple array of arrays of token id:s, which starts to be
really slow on large look-ahead sets.

I started out on a new tree-based structure this spring but
lost the spirit a bit when I realized I had made a mistake
and would be forced to go back and redo everything again...

Regarding your additions to the Grammar format, it sounds
interesting. Maybe Grammatica should support "unparsed" or
user-defined sections before, between or after the standard
%header%, %tokens% and %productions% separators. Then your
grammar format wouldn't be incompatible with others (except
for being LR, of course).

/Per

On sun, 2005-10-23 at 22:36 -0300, Andres Hohendahl wrote:
> HI 
> 
> I have been testing...grammatica parsers under C#
> Have ported the main part of Gramática (the whole parser) successfully
> towards C# (will send the source back asap.) 
> Noted some awfulness (slowness) in its use...hmmm.
> 
> I did this:
> Created a EBNF parser using grammatica (1.4, sorry!) then used THIS parser
> to create the parser object to feed into the grammar. Ported the Grammar
> Class, and the First and SecondPass Parser's, that's it. And It works!
> 
> Then I load the grammar file (Spanish) from disk, and create a Parser of it,
> then I can create an analyzer and a tokenizer for the Spanish phrase (to be
> analyzed) and voilá! it parses ok!..but
> 
> PROBLEMS
> First, when the parser itself is created out of the BNF grammar file), it
> builds a parse-tree calculating the lookaheads.. this is a VEEEEERY slow!
> action..hmmm, and for only 18 productions (simple Spanish I've sent down in
> my last mail) this has a 5-10 second delay (P4/2.4G/512K/WinXP)!! Too slow
> to be useful. 
> I've tried to create a Full Parser (as a library itself) and.. the time is
> the same (it also creates the in-memory look-ahead structure).
> The (generated) bnf loader it very fast! so the most time is spent in
> preparing the parser.
> 
> SOLUTIONS??
> 
> TODO (in theory):
> I will (try to) write a simple LR parser (shift-reduce) and then modify the
> tokenizer to allow ambiguities to be handled, and then modify the node class
> to allow "lateral" nodes to build "parse-forest's" (multiple trees) and then
> may be include a (simil-context) constrain as a separate part of the syntax
> (aka grammar) like %constrain% inside I want to put simple rules to allow to
> disambiguate the grammar in run time.
> Those rules should be applied at the reduce phase, and will (should) quickly
> kill inadequate ambiguities in the production-parser forest (on the fly)
> (hope so) this will be a new "strange" kind of parser I guess, hope it
> works...
> 
> This new Tokenizer will parse input strings with normal parsing methods
> (regex, etc.), then give them into some external procedures (using
> callback/events) to "tag" them externally (using my affix/morpho-syntactical
> / dictionary libraries) and feed tem back along with the newly added
> multiple tags (which will be part of the %constrain-tag% tags)
> 
> My proposal on this new "layer" is something like:
> --------------------------------
> &header% bla blah..
> %token%
> noun = <<\w+\.nn>>
> article = <<\w+\.ar>>
> verb = = <<\w+\.ve>>
> adverb = <<\w+\.av>>
> pronoun = <<\w+\.pr>>
> preposition = <<\w+\.pp>>
> 
> %production%
> sentence = [subject] predicate;
> subject = [article] noun [adjective];
> predicate = verb [adverb]* [complement]*;
> complement = preposition ( noun | adverb | sentence );
> 
> %constrain-tag%
> gender = M|F|N
> number = singular | plural | dual
> person = first | second | third | impersonal
> 
> %constrains%
> subject:  article(gender,number) = noun(gender,number) =
> adjective(gender,number);
> sentence: subject(person) = verb(person)
> 
> <*---------------------------------------------------*>
> Q: is this a new stuff, or am I reinventing the wheel?
> <*---------------------------------------------------*>
> 
> Any comments are welcome ;)
> 
> Andrés Hohendahl
> 
> -----Mensaje original-----
> De: address@hidden
> [mailto:address@hidden En
> nombre de Per Cederberg
> Enviado el: Thursday, October 20, 2005 2:54 PM
> Para: address@hidden
> Asunto: RE: [Grammatica-users] Lookahead feature?
> 
> Hi Andres,
> 
> Before you rush ahead with your project, I suggest that you take
> a look at the current development version of Grammatica:
> 
> http://grammatica.percederberg.net/download/development/index.html
> 
> The thing is, that Grammatica has already been ported to .NET
> once by Adrian Moore. I have so far only included parts of his
> efforts, due to my general slowness. The code generator parts
> for .NET are already ported, but have been launched in a
> separate project -- SourceFlow (www.sourceflow.org).
> 
> So currently only the Grammatica application remains to be
> ported. If you can contribute to that, I'm all ears. Also, if
> you have any other improvements I'd appreciate patches relative
> to the latest development snapshot, as it is otherwise very slow
> and tedious for me to merge the changes. Also, I try to maintain
> the same coding style that is currently used in all C# parts of
> Grammatica (regarding comments etc).
> 
> Regarding a conversion tool between different grammar formats,
> that is an excellent idea! Hadn't thought about that before,
> but having such a tool would definitely rock!
> 
> Cheers,
> 
> /Per
> 
> On thu, 2005-10-20 at 13:52 -0300, Andres Hohendahl wrote:
> > Hi there!
> > 
> > I am now working on a modification of the Grammatica Parser-Generator
> under
> > C# to allow this:
> > Get (read) a Full EBNF grammar file (on the fly) then parse it (with a
> > EBNF-grammatica generated parsed) and generate a Parse Tree.
> > 
> > NOTE:
> > I've successfully ported the Grammatica-Core Parse's towards C# (.net)
> > Also I modified the Run-time library to be more C# friendly while
> > maintaining full compatibility towards the grammatical generated code. I
> > think I've improved readability and may be some performance.
> > (Per: do you want me to send in the source's)?
> > 
> > ...
> > Then take this Parser, and scan an input text (implementing this grammar)
> > and generate an interpreted representation of this input fed from the
> > grammar-parser tree into a new hierarchical tree to represent the
> > "parsed/compiled/acquired" data from the input.
> > 
> > This idea came from the fact that many tokens/words (in natural languages)
> > has multiple possible functions, and thus generate ambiguous grammar's.
> > So this is the point: let's generate a tree of the input tokens, with all
> > the (possibilities) as branches.
> > Then generate a (recursive) method of: 
> > 1 Taking all possible token-sequence's and feeding them into the parser's
> > tree, to see if they succeed. After each success, the system will generate
> a
> > "possible representation list of token's" along with a "ponderation" in
> > which each word who has several "functions" has a value associated to the
> > each function (creating a most-used hierarchy). Then all "values are
> added,
> > and the final result will be a ordered (probabilistically) list of the
> > possible input token's values (functions) thet responds to thei
> > deterministic grammar.
> > 
> > Thoughts are welcome
> > 
> > Andres
> > ----------------
> > By the Way, Per:
> > 
> > Is there any grammar translation tool to-from EBNF(grammatica's) <->
> > EBNF(std) / YACC / BRISON / etc. (standard) 
> > Because there are a lot of tools out there (IE. The Gold-Parsed) to
> > debug/test your grammars graphically... but they don’t use this EBNF
> syntax,
> > and the conversion must be made by hand (quite tedious)!
> > 
> > url-Links are welcome..too!
> > 
> >
> ---------------------------------------------------------------------------
> > Here is a grammar file in which I have been wonkin in, and the "words"
> will
> > be fed into pre-tagged as follows, for the phrase "this word is red"
> > 
> > This.AR word.SN is.VB red.AD
> > 
> > I have some problems in representing this ambiguous (Spanish) grammar, can
> > anyone help me out?
> > 
> > HERE IT IS:
> > 
> > -------------------------------
> > 
> > /*
> >  * Gramática Sintaxis Española 
> >  */
> > %header%
> > GRAMMARTYPE = "LL"
> > DESCRIPTION = "Spanish grammar for SINTACTIC-spanish recognition"
> > AUTHOR      = "AndySoft"
> > VERSION     = "1.0"
> > DATE        = "19 Oct 2005"
> > LICENSE     = ".."
> > COPYRIGHT   = "All rights.. hmmm.. reserved.?"
> > 
> > %tokens%
> > 
> > LEFT_BRACKET = "["
> > RIGHT_BRACKET = "]"
> > WHITESPACE = <<[ \t\n\r]+>> %ignore%
> > 
> > // POS (Part Of Speech )
> > //  Auxiliary Verb (haber, ser, estar) ' Copulative ' no Direct/Indirect
> > Object
> > VX =  <<\w+\.VX\w*>>
> > //  Verb Transitive (need direct object)
> > VT =  <<\w+\.VT\w*>>
> > //  Verb Intransitive (Reflexive) don't need direct object
> > VI =  <<\w+\.VI\w*>>
> > //  Verb (Transitive or Intransitive)
> > VB =  <<\w+\.VB\w*>>
> > //  Verb Participio
> > VP =  <<\w+\.VP\w*>>
> > //  Verb Gerund
> > VG =  <<\w+\.VG\w*>>
> > //  Verb Suplemento
> > VS =  <<\w+\.VS\w*>>
> > //  Adverb
> > AV =  <<\w+\.AV\w*>>
> > //  Adverbio de Cantidad
> > AC = <<\w+\.AC\w*>>
> > //  Adjective
> > AJ =  <<\w+\.AJ\w*>>
> > //  Noun singular or Mass
> > NN =  <<\w+\.NN\w*>>
> > //  Proper Noun, singular
> > SN =  <<\w+\.SN\w*>>
> > //  Noun, Numeral
> > NU = <<\w+\.NU\w*>>
> > //  Article
> > AR = <<\w+\.AR\w*>>
> > //  Personal Pronoun
> > PP = <<\w+\.PP\w*>>
> > //  Possessive Pronoun
> > PO = <<\w+\.PO\w*>>
> > //  Demonstrative Pronoun
> > PD = <<\w+\.PD\w*>>
> > //  Interrogative Pronoun
> > PI = <<\w+\.PI\w*>>
> > //  Preposition
> > PR = <<\w+\.PR\w*>>
> > //  Conjunction
> > CJ = <<\w+\.CJ\w*>>
> > //  Interjection
> > IJ = <<\w+\.IJ\w*>>
> > //  Pronominal
> > PL = <<\w+\.PL\w*>>
> > //  Un-Known
> > UK = <<\w+\.UK\w*>>
> > // Special reflexive particle "se"
> > SE = <<\w+\.SE>>
> > 
> > // SINTAGMAS (construcciones)
> > %productions%
> > 
> > // Frase
> > // Fras     = Orac | Orel | SNom ;
> > // Oración
> > Orac        = [ SNom ]  SVer ;
> > // Oracíon de Relativo
> > Orel        =  Orac ;
> > // Oración Interrogativa
> > // Oint     = Orac ;
> > 
> > // Sintagma Nominal
> > SNom        = [ Det ]  SNuc ;
> > // Nucleo Nominal
> > SNuc        =  ( NN | SN | NU )  [ CNuc ] ;
> > //  un nombre común 
> > //  un nombre propio (y entonces no aparece ningún determinante en Det) 
> > //  un pronombre personal (y entonces no aparece ningún determinante en
> > Det) 
> > //  cualquier elemento sustantivado (infinitivos aislados, adjetivos,
> > oraciones subordinadas //   sustantivas, adverbios...), que generalmente
> > irá precedido por un artículo.
> > // Determinante
> > Det =   AR ;
> > //  artículo 
> > //  demostrativo 
> > //  indefinido 
> > //  numeral 
> > //  posesivo 
> > //  distributivo 
> > //  interrogativo 
> > //  exclamativo
> > //  Complemento del Núcleo
> > CNuc        =  (  SAdj | SPre |  SNom |  Orel ) [ CNuc ] ;
> > //  un sintagma adjetival 
> > //  un sintagma preposicional 
> > //  un SN en aposición 
> > //  una oración de relativo
> > 
> > 
> > // Sintagma Adjetival
> > SAdj        = [ AC ] NAdj [ CAdj ] ;
> > // Núcleo Adjetival
> > NAdj        =   AJ  ;
> > // Complemento del Adjetivo
> > CAdj        =  SPre ;
> > 
> > 
> > // Sintagma Verbal
> > SVer        =  ( VTr CDir ) | ( VI CInd ) | ( VS CCir ) ;
> > VTr = VT | VT VI ;
> > // Si el verbo es transitivo: un SNom, un SPre o una oración en función de
> > Complemento Directo. 
> > CDir        = SNom | SPre | Orac ;
> > // Si el verbo es copulativo: cualquier sintagma (salvo SVer) o una
> oración
> > en función de Atributo.
> > CInd        = ( SNom | SAdv | SPre | Orac ) [ CInd ] ;
> > // Si el verbo es de suplemento: un SP (que contenga un SNom o una
> oración)
> > en función de Suplemento 
> > CCir        = [ PR ]  ( SNom | Orac ) ;
> > // Verbo Compuesto
> > VComp = [ VI ] VT [VI] ;
> > 
> > // Sintagma Adverbial
> > SAdv        = [ AC ] NAdv ;
> > 
> > // Núcleo (Adverbio/forma Adverbial)
> > NAdv        =  AV  [ CAdv ] ;
> > // Complemento del Adverbio
> > CAdv        = SPre ;
> > 
> > // Sintagma Preposicional  
> > SPre        = [ PR ]  ( SNom | SAdv |  Orac | SPre ) ;
> > 
> > 
> > 
> > _______________________________________________
> > Grammatica-users mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/grammatica-users
> > 
> 
> 
> 
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
> 
> 
> 
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
>
[Prev in Thread]
Current Thread
[Next in Thread]
RE: [Grammatica-users] Lookahead feature?, Andres Hohendahl, 2005/10/20
- RE: [Grammatica-users] Lookahead feature?, Per Cederberg, 2005/10/20
  - RE: [Grammatica-users] Lookahead feature?, Andres Hohendahl, 2005/10/20
  - [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities.., Andres Hohendahl, 2005/10/23
    - Re: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities.., Per Cederberg <=
- RE: [Grammatica-users] Lookahead feature?, Matti Katila, 2005/10/21
Prev by Date: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities..
Previous by thread: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities..
Next by thread: RE: [Grammatica-users] Lookahead feature?
Index(es):
- Date
- Thread