chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] backtracking in abnf/lexgen


From: Nathaniel Rudavsky-Brody
Subject: Re: [Chicken-users] backtracking in abnf/lexgen
Date: Mon, 18 Sep 2017 13:32:21 +0200

Hi Ivan,

Thanks a lot for your explanation. It helped me find the key I'd been missing: "The SPARQL grammar is LL(1) when the rules with uppercased names are used as terminals." So the uppercased names *do* need some kind of backtracking.

My example with the parentheses was a bit silly (indeed I'd misread the grammar slightly), but I was trying to illustrate the basic problem which still holds: PN_PREFIX and PN_LOCAL can't end with ".":

PN_PREFIX  ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?

PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX) )?

(from https://www.w3.org/TR/sparql11-query/#sparqlGrammar)

As I mentioned, I was able to write a simple backtracking star to parse these, but it didn't compose well with bar, since bar doesn't send back the error which is needed to trigger the backtracking. The best I've been able to get working is a "sandbox" combinator (code below) that limits the scope of the backtracking, so I don't have to modify anything else in abnf/lexgen or my parser. Does it look correct to you?

I hope that since these are the only two places in the grammar that require backtracking, and I'm able to isolate it, that the performance hit won't be to big...?

Nathaniel






(define (bstar p)
  (lambda (sk fk strm)
    (let ((try
           (lambda (s)
             (let ((ss (sk s)))             
               (if (equal? ss '(error)) #f
                   ss)))))
      (p (lambda (strm1)
           (or
            ((bstar p) try try strm1)
            (sk strm)))
         sk
         strm))))

(define (sandbox p)
  (lambda (sk fk strm)
    (let ((s (p values err strm)))
      (if (equal? s '(error))
          (fk strm)
          (sk s)))))

(define PN_PREFIX
  (concatenation
   PN_CHARS_BASE
   (optional-sequence
    (sandbox
     (concatenation
      (bstar
       (alternatives
        (char-list/lit ".")
        PN_CHARS))
      PN_CHARS)))))




On Fri, Sep 15, 2017 at 5:42 PM, Ivan Raikov <address@hidden> wrote:
Hi Nathaniel,

    No problem, thanks for trying to use lexgen and abnf. The excerpt
from the grammar does not include the rules for PNAME, or PNAME_NS,
but if I understand correctly, you are suggesting that PrefixedName
could include parentheses which could cause the closing parenthesis of
DataBlockValue to be consumed incorrectly. This would indeed require
general backtracking, but it is a bit surprising to me, because
usually ABNF grammars avoid backtracking as it can be very costly in
terms of memory and computation time. For detailed explanation see
e.g. here: http://www.artima.com/pins1ed/combinator-parsing.html#31.10

   Parser combinator libraries such as lexgen tend to favor LL
grammars and avoid backtracking because the latter requires keeping
multiple copies of the input stream, and traversing back one token at
a time until a rule succeeds, which usually ends up being
prohibitively slow. I suppose a naive backtracking combinator in
lexgen  could be implemented using something analogous to the
bind/rebind combinators, but instead the argument combinator would be
applied to the previous copy of the input stream in case of failure.
But I suspect this would be too slow to be of any practical use.

  Are you absolutely certain that the parentheses in PrefixedName can
occur as something other than in escaped characters? Again, requiring
left recursion and general backtracking is very unusual for an ABNF
grammar, so better to make sure you understand the various PN_ rules.
I am happy to help with that, just let me know.

-Ivan


On Fri, Sep 15, 2017 at 1:45 AM, Nathaniel Rudavsky-Brody
<address@hidden> wrote:
> Hi Ivan,
>
> Thanks for replying, and sorry for being unclear. By different levels I just
> mean that I need a non-greedy * for a low-level rule that gets combined in
> different ways higher up. I'm able to write a correct parser by "flattening
> out" the rules, but since it's a big grammar (173 rules) that quickly gets
> messy.
>
> Here's an excerpt from the real rules. Practically speaking, the problem is
> that by itself, "ex:abc)" is a valid PrefixedName, but in the _expression_
> "(?var) { (ex:abc) }" only "ex:abc" should be parsed as a PrefixedName, and
> the ")" as part of the InlineDataFull.
>
>
> PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '('
> | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
>
> PLX  ::=  PERCENT | PN_LOCAL_ESC
>
> PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
> PLX)* (PN_CHARS | ':' | PLX) )?
>
> PNAME_LN  ::=  PNAME
>
> PrefixedName  ::=  PNAME_LN | PNAME_NS
>
> iri  ::=  IRIREF | PrefixedName
>
> DataBlockValue  ::=  iri | RDFLiteral | NumericLiteral | BooleanLiteral |
> 'UNDEF'
>
> InlineDataFull  ::=  ( NIL | '(' Var* ')' ) '{' ( '(' DataBlockValue* ')' |
> NIL )* '}'
>
>
>
> With my limited understanding of lexgen's internals, I can write a * that
> backtracks if it catches an error from a later success continuation, but
> this doesn't compose with something like opt that doesn't signal an error:
>
> (define (bstar p)
>   (lambda (sk fk strm)
>     (let ((try
>            (lambda (s)
>              (let ((ss (sk s)))
>                (if (equal? ss '(error)) #f
>                    ss)))))
>       (p (lambda (strm1)
>            (or
>             ((bstar p) try sk strm1)
>             (sk strm)))
>          sk
>          strm))))
>
>
> (lex (seq (bstar (lit "a")) (lit "a")) err "aaaaab")) ; => ((a a a a a) (b))
>
> but
>
> (lex (opt (seq (bstar (lit a)) (lit a))) err "aaaaab") ; =>  (() (a a a a a
> b))
>
>
> Does this make more sense?
>
> Nathaniel
>
>
> On Thu, Sep 14, 2017 at 9:54 PM, Ivan Raikov <address@hidden>
> wrote:
>>
>> Hi Nathaniel,
>>
>>     I don't understand what you mean by "different levels" here. Can
>> you express your grammar in ABNF notation?
>>
>>    -Ivan
>>
>>
>> On Thu, Sep 14, 2017 at 4:06 AM, Nathaniel Rudavsky-Brody
>> <address@hidden> wrote:
>> > Hi,
>> >
>> > I was wondering if anyone might have some advice for a beginner's
>> > question
>> > with abnf/lexgen. Clearly it has something to do with backtracking, and
>> > I
>> > hope I'm just missing something simple -- 1500 lines into my parser, I'd
>> > hate to have to start from scratch with another approach!
>> >
>> >
>> > For matching a string of a's and b's that ends with an "a", clearly
>> >
>> >
>> > (lex (seq (star (bar (lit "a") (lit "b")))
>> >                (lit "a"))
>> >     error "ababa")  ; => error
>> >
>> >
>> > For patterns on the same level, I can do manual backtracking:
>> >
>> >
>> > (define-syntax vac
>> >   (syntax-rules ()
>> >     ((_ fn) (lambda args (apply fn args)))))
>> >
>> > (define (left-star-seq p1 p2)   ; p1*p2
>> >   (vac
>> >    (bar
>> >     (seq p1 (left-star-seq p1 p2))
>> >     p2)))
>> >
>> > (lex (left-star-seq
>> >           (bar (lit "a") (lit "b"))
>> >           (lit "a"))
>> >   error "ababa")   ; =>  ((#\a #\b #\a #\b #\a) ())
>> >
>> >
>> > but what if p1 and p2 are on completely different levels, something
>> > like:
>> >
>> >
>> > (define p1 (star (bar (lit "a") (lit "b"))))
>> > (define p2 (lit "a"))
>> > ...
>> > (define q2 (seq q1 p1))
>> > (define q4 (seq q3 q2))
>> > (define Word (seq q4 p2))
>> >
>> >
>> > Is there way to do "composable" backtracking with these tools? (and is
>> > the
>> > question even the right one?)
>> >
>> > Thanks for any ideas,
>> >
>> > Nathaniel
>> >
>> > _______________________________________________
>> > Chicken-users mailing list
>> > address@hidden
>> > https://lists.nongnu.org/mailman/listinfo/chicken-users
>> >
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]