[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-users] backtracking in abnf/lexgen
From: |
Ivan Raikov |
Subject: |
Re: [Chicken-users] backtracking in abnf/lexgen |
Date: |
Mon, 18 Sep 2017 14:29:18 -0700 |
Nevermind, I see now that the last character of PN_PREFIX can be
consumed as part of the repetition rule, and not as the PN_CHARS
In that case, I would still suggest the bind combinator:
(define (make-prefixed-name char-lst)
;; check that char-lst ends with a PN_CHAR and create a prefixed
name object with e.g. string->symbol
(define PN_PREFIX
(bind
make-prefixed-name
(concatenation
PN_CHARS_BASE
(optional-sequence
(repetition
(alternatives PN_CHARS (char-list/list ".")))
))
))
On Mon, Sep 18, 2017 at 2:06 PM, Ivan Raikov <address@hidden> wrote:
> Hi Nathaniel,
>
> It seems to me that the rules PN_PREFIX and PN_LOCAL already exclude
> strings ending with a '.'; e.g. in the case of PN_PREFIX, it must end
> with PN_CHARS_BASE or PN_CHARS. I think that the remark about LL1
> means that the various PN rules are just regular expressions. So in
> the case of PN_PREFIX I would think that this would work:
>
> (define PN_PREFIX
> (concatenation
> PN_CHARS_BASE
> (optional-sequence
> (concatenation
> (repetition
> (alternatives PN_CHARS (char-list/list ".")))
> PN_CHARS))
> ))
>
> In other words, a PN_PREFIX either consists of PN_CHARS_BASE, or
> PN_CHARSE_BASE followed by an arbitrary combination of PN_CHARS and
> '.', followed by PN_CHARS.
> Is there a counterexample to this?
>
> On Mon, Sep 18, 2017 at 4:32 AM, Nathaniel Rudavsky-Brody
> <address@hidden> wrote:
>> Hi Ivan,
>>
>> Thanks a lot for your explanation. It helped me find the key I'd been
>> missing: "The SPARQL grammar is LL(1) when the rules with uppercased names
>> are used as terminals." So the uppercased names *do* need some kind of
>> backtracking.
>>
>> My example with the parentheses was a bit silly (indeed I'd misread the
>> grammar slightly), but I was trying to illustrate the basic problem which
>> still holds: PN_PREFIX and PN_LOCAL can't end with ".":
>>
>> PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
>>
>> PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
>> PLX)* (PN_CHARS | ':' | PLX) )?
>>
>> (from https://www.w3.org/TR/sparql11-query/#sparqlGrammar)
>>
>> As I mentioned, I was able to write a simple backtracking star to parse
>> these, but it didn't compose well with bar, since bar doesn't send back the
>> error which is needed to trigger the backtracking. The best I've been able
>> to get working is a "sandbox" combinator (code below) that limits the scope
>> of the backtracking, so I don't have to modify anything else in abnf/lexgen
>> or my parser. Does it look correct to you?
>>
>> I hope that since these are the only two places in the grammar that require
>> backtracking, and I'm able to isolate it, that the performance hit won't be
>> to big...?
>>
>> Nathaniel
>>
>>
>>
>>
>>
>>
>> (define (bstar p)
>> (lambda (sk fk strm)
>> (let ((try
>> (lambda (s)
>> (let ((ss (sk s)))
>> (if (equal? ss '(error)) #f
>> ss)))))
>> (p (lambda (strm1)
>> (or
>> ((bstar p) try try strm1)
>> (sk strm)))
>> sk
>> strm))))
>>
>> (define (sandbox p)
>> (lambda (sk fk strm)
>> (let ((s (p values err strm)))
>> (if (equal? s '(error))
>> (fk strm)
>> (sk s)))))
>>
>> (define PN_PREFIX
>> (concatenation
>> PN_CHARS_BASE
>> (optional-sequence
>> (sandbox
>> (concatenation
>> (bstar
>> (alternatives
>> (char-list/lit ".")
>> PN_CHARS))
>> PN_CHARS)))))
>>
>>
>>
>>
>> On Fri, Sep 15, 2017 at 5:42 PM, Ivan Raikov <address@hidden>
>> wrote:
>>>
>>> Hi Nathaniel,
>>>
>>> No problem, thanks for trying to use lexgen and abnf. The excerpt
>>> from the grammar does not include the rules for PNAME, or PNAME_NS,
>>> but if I understand correctly, you are suggesting that PrefixedName
>>> could include parentheses which could cause the closing parenthesis of
>>> DataBlockValue to be consumed incorrectly. This would indeed require
>>> general backtracking, but it is a bit surprising to me, because
>>> usually ABNF grammars avoid backtracking as it can be very costly in
>>> terms of memory and computation time. For detailed explanation see
>>> e.g. here: http://www.artima.com/pins1ed/combinator-parsing.html#31.10
>>>
>>> Parser combinator libraries such as lexgen tend to favor LL
>>> grammars and avoid backtracking because the latter requires keeping
>>> multiple copies of the input stream, and traversing back one token at
>>> a time until a rule succeeds, which usually ends up being
>>> prohibitively slow. I suppose a naive backtracking combinator in
>>> lexgen could be implemented using something analogous to the
>>> bind/rebind combinators, but instead the argument combinator would be
>>> applied to the previous copy of the input stream in case of failure.
>>> But I suspect this would be too slow to be of any practical use.
>>>
>>> Are you absolutely certain that the parentheses in PrefixedName can
>>> occur as something other than in escaped characters? Again, requiring
>>> left recursion and general backtracking is very unusual for an ABNF
>>> grammar, so better to make sure you understand the various PN_ rules.
>>> I am happy to help with that, just let me know.
>>>
>>> -Ivan
>>>
>>>
>>> On Fri, Sep 15, 2017 at 1:45 AM, Nathaniel Rudavsky-Brody
>>> <address@hidden> wrote:
>>> > Hi Ivan,
>>> >
>>> > Thanks for replying, and sorry for being unclear. By different levels I
>>> > just
>>> > mean that I need a non-greedy * for a low-level rule that gets combined
>>> > in
>>> > different ways higher up. I'm able to write a correct parser by
>>> > "flattening
>>> > out" the rules, but since it's a big grammar (173 rules) that quickly
>>> > gets
>>> > messy.
>>> >
>>> > Here's an excerpt from the real rules. Practically speaking, the problem
>>> > is
>>> > that by itself, "ex:abc)" is a valid PrefixedName, but in the expression
>>> > "(?var) { (ex:abc) }" only "ex:abc" should be parsed as a PrefixedName,
>>> > and
>>> > the ")" as part of the InlineDataFull.
>>> >
>>> >
>>> > PN_LOCAL_ESC ::= '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" |
>>> > '('
>>> > | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
>>> >
>>> > PLX ::= PERCENT | PN_LOCAL_ESC
>>> >
>>> > PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
>>> > PLX)* (PN_CHARS | ':' | PLX) )?
>>> >
>>> > PNAME_LN ::= PNAME
>>> >
>>> > PrefixedName ::= PNAME_LN | PNAME_NS
>>> >
>>> > iri ::= IRIREF | PrefixedName
>>> >
>>> > DataBlockValue ::= iri | RDFLiteral | NumericLiteral | BooleanLiteral
>>> > |
>>> > 'UNDEF'
>>> >
>>> > InlineDataFull ::= ( NIL | '(' Var* ')' ) '{' ( '(' DataBlockValue*
>>> > ')' |
>>> > NIL )* '}'
>>> >
>>> >
>>> >
>>> > With my limited understanding of lexgen's internals, I can write a *
>>> > that
>>> > backtracks if it catches an error from a later success continuation, but
>>> > this doesn't compose with something like opt that doesn't signal an
>>> > error:
>>> >
>>> > (define (bstar p)
>>> > (lambda (sk fk strm)
>>> > (let ((try
>>> > (lambda (s)
>>> > (let ((ss (sk s)))
>>> > (if (equal? ss '(error)) #f
>>> > ss)))))
>>> > (p (lambda (strm1)
>>> > (or
>>> > ((bstar p) try sk strm1)
>>> > (sk strm)))
>>> > sk
>>> > strm))))
>>> >
>>> >
>>> > (lex (seq (bstar (lit "a")) (lit "a")) err "aaaaab")) ; => ((a a a a a)
>>> > (b))
>>> >
>>> > but
>>> >
>>> > (lex (opt (seq (bstar (lit a)) (lit a))) err "aaaaab") ; => (() (a a a
>>> > a a
>>> > b))
>>> >
>>> >
>>> > Does this make more sense?
>>> >
>>> > Nathaniel
>>> >
>>> >
>>> > On Thu, Sep 14, 2017 at 9:54 PM, Ivan Raikov <address@hidden>
>>> > wrote:
>>> >>
>>> >> Hi Nathaniel,
>>> >>
>>> >> I don't understand what you mean by "different levels" here. Can
>>> >> you express your grammar in ABNF notation?
>>> >>
>>> >> -Ivan
>>> >>
>>> >>
>>> >> On Thu, Sep 14, 2017 at 4:06 AM, Nathaniel Rudavsky-Brody
>>> >> <address@hidden> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I was wondering if anyone might have some advice for a beginner's
>>> >> > question
>>> >> > with abnf/lexgen. Clearly it has something to do with backtracking,
>>> >> > and
>>> >> > I
>>> >> > hope I'm just missing something simple -- 1500 lines into my parser,
>>> >> > I'd
>>> >> > hate to have to start from scratch with another approach!
>>> >> >
>>> >> >
>>> >> > For matching a string of a's and b's that ends with an "a", clearly
>>> >> >
>>> >> >
>>> >> > (lex (seq (star (bar (lit "a") (lit "b")))
>>> >> > (lit "a"))
>>> >> > error "ababa") ; => error
>>> >> >
>>> >> >
>>> >> > For patterns on the same level, I can do manual backtracking:
>>> >> >
>>> >> >
>>> >> > (define-syntax vac
>>> >> > (syntax-rules ()
>>> >> > ((_ fn) (lambda args (apply fn args)))))
>>> >> >
>>> >> > (define (left-star-seq p1 p2) ; p1*p2
>>> >> > (vac
>>> >> > (bar
>>> >> > (seq p1 (left-star-seq p1 p2))
>>> >> > p2)))
>>> >> >
>>> >> > (lex (left-star-seq
>>> >> > (bar (lit "a") (lit "b"))
>>> >> > (lit "a"))
>>> >> > error "ababa") ; => ((#\a #\b #\a #\b #\a) ())
>>> >> >
>>> >> >
>>> >> > but what if p1 and p2 are on completely different levels, something
>>> >> > like:
>>> >> >
>>> >> >
>>> >> > (define p1 (star (bar (lit "a") (lit "b"))))
>>> >> > (define p2 (lit "a"))
>>> >> > ...
>>> >> > (define q2 (seq q1 p1))
>>> >> > (define q4 (seq q3 q2))
>>> >> > (define Word (seq q4 p2))
>>> >> >
>>> >> >
>>> >> > Is there way to do "composable" backtracking with these tools? (and
>>> >> > is
>>> >> > the
>>> >> > question even the right one?)
>>> >> >
>>> >> > Thanks for any ideas,
>>> >> >
>>> >> > Nathaniel
>>> >> >
>>> >> > _______________________________________________
>>> >> > Chicken-users mailing list
>>> >> > address@hidden
>>> >> > https://lists.nongnu.org/mailman/listinfo/chicken-users
>>> >> >
>>> >
>>> >
>>
>>