|
From: | Nathaniel Rudavsky-Brody |
Subject: | Re: [Chicken-users] backtracking in abnf/lexgen |
Date: | Mon, 18 Sep 2017 13:32:21 +0200 |
Hi Nathaniel,
No problem, thanks for trying to use lexgen and abnf. The excerpt
from the grammar does not include the rules for PNAME, or PNAME_NS,
but if I understand correctly, you are suggesting that PrefixedName
could include parentheses which could cause the closing parenthesis of
DataBlockValue to be consumed incorrectly. This would indeed require
general backtracking, but it is a bit surprising to me, because
usually ABNF grammars avoid backtracking as it can be very costly in
terms of memory and computation time. For detailed explanation see
e.g. here: http://www.artima.com/pins1ed/combinator-parsing.html#31.10
Parser combinator libraries such as lexgen tend to favor LL
grammars and avoid backtracking because the latter requires keeping
multiple copies of the input stream, and traversing back one token at
a time until a rule succeeds, which usually ends up being
prohibitively slow. I suppose a naive backtracking combinator in
lexgen could be implemented using something analogous to the
bind/rebind combinators, but instead the argument combinator would be
applied to the previous copy of the input stream in case of failure.
But I suspect this would be too slow to be of any practical use.
Are you absolutely certain that the parentheses in PrefixedName can
occur as something other than in escaped characters? Again, requiring
left recursion and general backtracking is very unusual for an ABNF
grammar, so better to make sure you understand the various PN_ rules.
I am happy to help with that, just let me know.
-Ivan
On Fri, Sep 15, 2017 at 1:45 AM, Nathaniel Rudavsky-Brody
<address@hidden> wrote:
> Hi Ivan,
>
> Thanks for replying, and sorry for being unclear. By different levels I just
> mean that I need a non-greedy * for a low-level rule that gets combined in
> different ways higher up. I'm able to write a correct parser by "flattening
> out" the rules, but since it's a big grammar (173 rules) that quickly gets
> messy.
>
> Here's an excerpt from the real rules. Practically speaking, the problem is
> that by itself, "ex:abc)" is a valid PrefixedName, but in the _expression_
> "(?var) { (ex:abc) }" only "ex:abc" should be parsed as a PrefixedName, and
> the ")" as part of the InlineDataFull.
>
>
> PN_LOCAL_ESC ::= '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '('
> | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%' )
>
> PLX ::= PERCENT | PN_LOCAL_ESC
>
> PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX ) ((PN_CHARS | '.' | ':' |
> PLX)* (PN_CHARS | ':' | PLX) )?
>
> PNAME_LN ::= PNAME
>
> PrefixedName ::= PNAME_LN | PNAME_NS
>
> iri ::= IRIREF | PrefixedName
>
> DataBlockValue ::= iri | RDFLiteral | NumericLiteral | BooleanLiteral |
> 'UNDEF'
>
> InlineDataFull ::= ( NIL | '(' Var* ')' ) '{' ( '(' DataBlockValue* ')' |
> NIL )* '}'
>
>
>
> With my limited understanding of lexgen's internals, I can write a * that
> backtracks if it catches an error from a later success continuation, but
> this doesn't compose with something like opt that doesn't signal an error:
>
> (define (bstar p)
> (lambda (sk fk strm)
> (let ((try
> (lambda (s)
> (let ((ss (sk s)))
> (if (equal? ss '(error)) #f
> ss)))))
> (p (lambda (strm1)
> (or
> ((bstar p) try sk strm1)
> (sk strm)))
> sk
> strm))))
>
>
> (lex (seq (bstar (lit "a")) (lit "a")) err "aaaaab")) ; => ((a a a a a) (b))
>
> but
>
> (lex (opt (seq (bstar (lit a)) (lit a))) err "aaaaab") ; => (() (a a a a a
> b))
>
>
> Does this make more sense?
>
> Nathaniel
>
>
> On Thu, Sep 14, 2017 at 9:54 PM, Ivan Raikov <address@hidden>
> wrote:
>>
>> Hi Nathaniel,
>>
>> I don't understand what you mean by "different levels" here. Can
>> you express your grammar in ABNF notation?
>>
>> -Ivan
>>
>>
>> On Thu, Sep 14, 2017 at 4:06 AM, Nathaniel Rudavsky-Brody
>> <address@hidden> wrote:
>> > Hi,
>> >
>> > I was wondering if anyone might have some advice for a beginner's
>> > question
>> > with abnf/lexgen. Clearly it has something to do with backtracking, and
>> > I
>> > hope I'm just missing something simple -- 1500 lines into my parser, I'd
>> > hate to have to start from scratch with another approach!
>> >
>> >
>> > For matching a string of a's and b's that ends with an "a", clearly
>> >
>> >
>> > (lex (seq (star (bar (lit "a") (lit "b")))
>> > (lit "a"))
>> > error "ababa") ; => error
>> >
>> >
>> > For patterns on the same level, I can do manual backtracking:
>> >
>> >
>> > (define-syntax vac
>> > (syntax-rules ()
>> > ((_ fn) (lambda args (apply fn args)))))
>> >
>> > (define (left-star-seq p1 p2) ; p1*p2
>> > (vac
>> > (bar
>> > (seq p1 (left-star-seq p1 p2))
>> > p2)))
>> >
>> > (lex (left-star-seq
>> > (bar (lit "a") (lit "b"))
>> > (lit "a"))
>> > error "ababa") ; => ((#\a #\b #\a #\b #\a) ())
>> >
>> >
>> > but what if p1 and p2 are on completely different levels, something
>> > like:
>> >
>> >
>> > (define p1 (star (bar (lit "a") (lit "b"))))
>> > (define p2 (lit "a"))
>> > ...
>> > (define q2 (seq q1 p1))
>> > (define q4 (seq q3 q2))
>> > (define Word (seq q4 p2))
>> >
>> >
>> > Is there way to do "composable" backtracking with these tools? (and is
>> > the
>> > question even the right one?)
>> >
>> > Thanks for any ideas,
>> >
>> > Nathaniel
>> >
>> > _______________________________________________
>> > Chicken-users mailing list
>> > address@hidden
>> > https://lists.nongnu.org/mailman/listinfo/chicken-users
>> >
>
>
[Prev in Thread] | Current Thread | [Next in Thread] |