chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] backtracking in abnf/lexgen


From: Ivan Raikov
Subject: Re: [Chicken-users] backtracking in abnf/lexgen
Date: Mon, 18 Sep 2017 14:06:51 -0700

Hi Nathaniel,

It seems to me that the rules PN_PREFIX and PN_LOCAL already exclude
strings ending with a '.'; e.g. in the case of PN_PREFIX, it must end
with PN_CHARS_BASE or PN_CHARS. I think that the remark about LL1
means that the various PN rules are just regular expressions. So in
the case of PN_PREFIX I would think that this would work:

(define PN_PREFIX
(concatenation
 PN_CHARS_BASE
 (optional-sequence
  (concatenation
   (repetition
    (alternatives PN_CHARS (char-list/list ".")))
   PN_CHARS))
 ))

In other words, a PN_PREFIX either consists of PN_CHARS_BASE, or
PN_CHARSE_BASE followed by an arbitrary combination of PN_CHARS and
'.', followed by PN_CHARS.
Is there a counterexample to this?

On Mon, Sep 18, 2017 at 4:32 AM, Nathaniel Rudavsky-Brody
<address@hidden> wrote:
> Hi Ivan,
>
> Thanks a lot for your explanation. It helped me find the key I'd been
> missing: "The SPARQL grammar is LL(1) when the rules with uppercased names
> are used as terminals." So the uppercased names *do* need some kind of
> backtracking.
>
> My example with the parentheses was a bit silly (indeed I'd misread the
> grammar slightly), but I was trying to illustrate the basic problem which
> still holds: PN_PREFIX and PN_LOCAL can't end with ".":
>
> PN_PREFIX  ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
>
> PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
> PLX)* (PN_CHARS | ':' | PLX) )?
>
> (from https://www.w3.org/TR/sparql11-query/#sparqlGrammar)
>
> As I mentioned, I was able to write a simple backtracking star to parse
> these, but it didn't compose well with bar, since bar doesn't send back the
> error which is needed to trigger the backtracking. The best I've been able
> to get working is a "sandbox" combinator (code below) that limits the scope
> of the backtracking, so I don't have to modify anything else in abnf/lexgen
> or my parser. Does it look correct to you?
>
> I hope that since these are the only two places in the grammar that require
> backtracking, and I'm able to isolate it, that the performance hit won't be
> to big...?
>
> Nathaniel
>
>
>
>
>
>
> (define (bstar p)
>   (lambda (sk fk strm)
>     (let ((try
>            (lambda (s)
>              (let ((ss (sk s)))
>                (if (equal? ss '(error)) #f
>                    ss)))))
>       (p (lambda (strm1)
>            (or
>             ((bstar p) try try strm1)
>             (sk strm)))
>          sk
>          strm))))
>
> (define (sandbox p)
>   (lambda (sk fk strm)
>     (let ((s (p values err strm)))
>       (if (equal? s '(error))
>           (fk strm)
>           (sk s)))))
>
> (define PN_PREFIX
>   (concatenation
>    PN_CHARS_BASE
>    (optional-sequence
>     (sandbox
>      (concatenation
>       (bstar
>        (alternatives
>         (char-list/lit ".")
>         PN_CHARS))
>       PN_CHARS)))))
>
>
>
>
> On Fri, Sep 15, 2017 at 5:42 PM, Ivan Raikov <address@hidden>
> wrote:
>>
>> Hi Nathaniel,
>>
>>     No problem, thanks for trying to use lexgen and abnf. The excerpt
>> from the grammar does not include the rules for PNAME, or PNAME_NS,
>> but if I understand correctly, you are suggesting that PrefixedName
>> could include parentheses which could cause the closing parenthesis of
>> DataBlockValue to be consumed incorrectly. This would indeed require
>> general backtracking, but it is a bit surprising to me, because
>> usually ABNF grammars avoid backtracking as it can be very costly in
>> terms of memory and computation time. For detailed explanation see
>> e.g. here: http://www.artima.com/pins1ed/combinator-parsing.html#31.10
>>
>>    Parser combinator libraries such as lexgen tend to favor LL
>> grammars and avoid backtracking because the latter requires keeping
>> multiple copies of the input stream, and traversing back one token at
>> a time until a rule succeeds, which usually ends up being
>> prohibitively slow. I suppose a naive backtracking combinator in
>> lexgen  could be implemented using something analogous to the
>> bind/rebind combinators, but instead the argument combinator would be
>> applied to the previous copy of the input stream in case of failure.
>> But I suspect this would be too slow to be of any practical use.
>>
>>   Are you absolutely certain that the parentheses in PrefixedName can
>> occur as something other than in escaped characters? Again, requiring
>> left recursion and general backtracking is very unusual for an ABNF
>> grammar, so better to make sure you understand the various PN_ rules.
>> I am happy to help with that, just let me know.
>>
>> -Ivan
>>
>>
>> On Fri, Sep 15, 2017 at 1:45 AM, Nathaniel Rudavsky-Brody
>> <address@hidden> wrote:
>> > Hi Ivan,
>> >
>> > Thanks for replying, and sorry for being unclear. By different levels I
>> > just
>> > mean that I need a non-greedy * for a low-level rule that gets combined
>> > in
>> > different ways higher up. I'm able to write a correct parser by
>> > "flattening
>> > out" the rules, but since it's a big grammar (173 rules) that quickly
>> > gets
>> > messy.
>> >
>> > Here's an excerpt from the real rules. Practically speaking, the problem
>> > is
>> > that by itself, "ex:abc)" is a valid PrefixedName, but in the expression
>> > "(?var) { (ex:abc) }" only "ex:abc" should be parsed as a PrefixedName,
>> > and
>> > the ")" as part of the InlineDataFull.
>> >
>> >
>> > PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" |
>> > '('
>> > | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
>> >
>> > PLX  ::=  PERCENT | PN_LOCAL_ESC
>> >
>> > PN_LOCAL ::=  (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
>> > PLX)* (PN_CHARS | ':' | PLX) )?
>> >
>> > PNAME_LN  ::=  PNAME
>> >
>> > PrefixedName  ::=  PNAME_LN | PNAME_NS
>> >
>> > iri  ::=  IRIREF | PrefixedName
>> >
>> > DataBlockValue  ::=  iri | RDFLiteral | NumericLiteral | BooleanLiteral
>> > |
>> > 'UNDEF'
>> >
>> > InlineDataFull  ::=  ( NIL | '(' Var* ')' ) '{' ( '(' DataBlockValue*
>> > ')' |
>> > NIL )* '}'
>> >
>> >
>> >
>> > With my limited understanding of lexgen's internals, I can write a *
>> > that
>> > backtracks if it catches an error from a later success continuation, but
>> > this doesn't compose with something like opt that doesn't signal an
>> > error:
>> >
>> > (define (bstar p)
>> >   (lambda (sk fk strm)
>> >     (let ((try
>> >            (lambda (s)
>> >              (let ((ss (sk s)))
>> >                (if (equal? ss '(error)) #f
>> >                    ss)))))
>> >       (p (lambda (strm1)
>> >            (or
>> >             ((bstar p) try sk strm1)
>> >             (sk strm)))
>> >          sk
>> >          strm))))
>> >
>> >
>> > (lex (seq (bstar (lit "a")) (lit "a")) err "aaaaab")) ; => ((a a a a a)
>> > (b))
>> >
>> > but
>> >
>> > (lex (opt (seq (bstar (lit a)) (lit a))) err "aaaaab") ; =>  (() (a a a
>> > a a
>> > b))
>> >
>> >
>> > Does this make more sense?
>> >
>> > Nathaniel
>> >
>> >
>> > On Thu, Sep 14, 2017 at 9:54 PM, Ivan Raikov <address@hidden>
>> > wrote:
>> >>
>> >> Hi Nathaniel,
>> >>
>> >>     I don't understand what you mean by "different levels" here. Can
>> >> you express your grammar in ABNF notation?
>> >>
>> >>    -Ivan
>> >>
>> >>
>> >> On Thu, Sep 14, 2017 at 4:06 AM, Nathaniel Rudavsky-Brody
>> >> <address@hidden> wrote:
>> >> > Hi,
>> >> >
>> >> > I was wondering if anyone might have some advice for a beginner's
>> >> > question
>> >> > with abnf/lexgen. Clearly it has something to do with backtracking,
>> >> > and
>> >> > I
>> >> > hope I'm just missing something simple -- 1500 lines into my parser,
>> >> > I'd
>> >> > hate to have to start from scratch with another approach!
>> >> >
>> >> >
>> >> > For matching a string of a's and b's that ends with an "a", clearly
>> >> >
>> >> >
>> >> > (lex (seq (star (bar (lit "a") (lit "b")))
>> >> >                (lit "a"))
>> >> >     error "ababa")  ; => error
>> >> >
>> >> >
>> >> > For patterns on the same level, I can do manual backtracking:
>> >> >
>> >> >
>> >> > (define-syntax vac
>> >> >   (syntax-rules ()
>> >> >     ((_ fn) (lambda args (apply fn args)))))
>> >> >
>> >> > (define (left-star-seq p1 p2)   ; p1*p2
>> >> >   (vac
>> >> >    (bar
>> >> >     (seq p1 (left-star-seq p1 p2))
>> >> >     p2)))
>> >> >
>> >> > (lex (left-star-seq
>> >> >           (bar (lit "a") (lit "b"))
>> >> >           (lit "a"))
>> >> >   error "ababa")   ; =>  ((#\a #\b #\a #\b #\a) ())
>> >> >
>> >> >
>> >> > but what if p1 and p2 are on completely different levels, something
>> >> > like:
>> >> >
>> >> >
>> >> > (define p1 (star (bar (lit "a") (lit "b"))))
>> >> > (define p2 (lit "a"))
>> >> > ...
>> >> > (define q2 (seq q1 p1))
>> >> > (define q4 (seq q3 q2))
>> >> > (define Word (seq q4 p2))
>> >> >
>> >> >
>> >> > Is there way to do "composable" backtracking with these tools? (and
>> >> > is
>> >> > the
>> >> > question even the right one?)
>> >> >
>> >> > Thanks for any ideas,
>> >> >
>> >> > Nathaniel
>> >> >
>> >> > _______________________________________________
>> >> > Chicken-users mailing list
>> >> > address@hidden
>> >> > https://lists.nongnu.org/mailman/listinfo/chicken-users
>> >> >
>> >
>> >
>
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]