bug-guile
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#73188: PEG parser does not support full PEG grammar


From: Ludovic Courtès
Subject: bug#73188: PEG parser does not support full PEG grammar
Date: Sun, 20 Oct 2024 12:10:47 +0200
User-agent: Gnus/5.13 (Gnus v5.13)

Egun on Ekaitz,

Ekaitz Zarraga <ekaitz@elenq.tech> skribis:

> - What it is true is if people were using `peg-as-peg` for their
>   things, which isn't even exported by the module, their code would
>   have problems. But we can't predict that.

Since ‘peg-as-peg’ has always been private, that’s fine.

> From 81944c2037e0d6d8c972def2ceb1aa4c51a61431 Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz@elenq.tech>
> Date: Wed, 11 Sep 2024 21:19:26 +0200
> Subject: [PATCH v3 1/2] PEG: Add full support for PEG + some extensions
>
> This commit adds support for PEG as described in:
>
>     <https://bford.info/pub/lang/peg.pdf>
>
> It adds support for the missing features (comments, underscores in
> identifiers and escaping) while keeping the extensions (dashes in
> identifiers, < and <--).
>
> The naming system tries to be as close as possible to the one proposed
> in the paper.
>
> * module/ice-9/peg/string-peg.scm: Rewrite PEG parser.
> * test-suite/tests/peg.test: Fix import

[...]

> +(define (Primary->defn lst for-syntax)
> +  (let ((value (second lst)))
> +    (case (car value)
> +      ('DOT        #'peg-any)
> +      ('Identifier (Identifier->defn value for-syntax))
> +      ('Expression (Expression->defn value for-syntax))
> +      ('Literal    (Literal->defn value for-syntax))
> +      ('Class      (Class->defn value for-syntax)))))

I get these compiler warnings:

--8<---------------cut here---------------start------------->8---
ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause 
((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) 
for-syntax))))) of case expression (case (car suffix) ((quote AND) (quasisyntax 
(followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) ((quote NOT) 
(quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) 
for-syntax))))) (else (Suffix->defn suffix for-syntax)))
ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause 
((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar 
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) 
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax 
out)))))
ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause 
((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar 
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) 
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax 
out)))))
ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause 
((quote Identifier) (Identifier->defn value for-syntax)) of case expression 
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) 
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn 
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote 
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn 
value for-syntax)))
ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause 
((quote Expression) (Expression->defn value for-syntax)) of case expression 
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) 
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn 
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote 
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn 
value for-syntax)))
ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause 
((quote Literal) (Literal->defn value for-syntax)) of case expression (case 
(car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) 
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn 
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote 
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn 
value for-syntax)))
ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause 
((quote NotInClass) (NotInClass->defn value for-syntax)) of case expression 
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) 
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn 
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote 
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn 
value for-syntax)))
ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause 
((quote Class) (Class->defn value for-syntax)) of case expression (case (car 
value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn 
value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) 
((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) 
(NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
--8<---------------cut here---------------end--------------->8---

And indeed, the correct syntax is:

  (case value (DOT …) (Identifier …) …)

or:

  (match value ('DOT …) ('Identifier …) …)

The former returns *unspecified* when passed a value not matched by any
clause, whereas the latter throws an error.

So I would recommend ‘match’.

At any rate, that makes me wonder whether this code path is tested at
all.  As written, ‘Primary->defn’ would always return *unspecified*.
Is there a test we could add?

> +(define (Range->defn lst for-syntax)
>    (cond
> -   ((list? el)
> -    (cond
> -     ((eq? (car el) 'peg-literal)
> -      (peg-literal->defn el for-syntax))
> -     ((eq? (car el) 'peg-charclass)
> -      (peg-charclass->defn el for-syntax))
> -     ((eq? (car el) 'peg-nonterminal)
> -      (datum->syntax for-syntax (string->symbol (cadr el))))))
> -   ((string? el)
> +    ((= 2 (length lst))
> +     (second (second lst)))
> +    ((= 3 (length lst))
> +     #`(range
> +         #,(Char->defn (second lst) for-syntax)
> +         #,(Char->defn (third lst) for-syntax)))))

Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
frowned upon.  I would write it as:

  (match lst
    ((x y) y)
    ((x y z) #`(range …)))

(Ideally with identifiers more meaningful than x, y, and z. :-))

>  ;; Transforms the nonterminals defined in the PEG parser written as a PEG to 
> the nonterminals defined in the PEG parser written with S-expressions.
>  (define (grammar-transform x)
> @@ -69,7 +77,7 @@
>      (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) 
> peg-as-peg)))
>      (tree-map
>       grammar-transform
> -     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
> +     (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) 
> peg-as-peg)))))))

What happened to the ‘grammar’ binding?  I can’t see where it was coming
from.

> From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz@elenq.tech>
> Date: Fri, 11 Oct 2024 14:24:30 +0200
> Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]
>
> Modern PEG supports inversed class like `[^a-z]` that would get any
> character not in the `a-z` range. This commit adds support for that and
> also for a new `not-in-range` PEG pattern for scheme.
>
> * module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
> * module/ice-9/peg/string-peg.scm: Add support for `[^...]`
> * test-suite/tests/peg.test: Test it.
> * doc/ref/api-peg.texi: Document accordingly.

This one LGTM.

In addition to the issues mentioned above, could you add an entry in the
‘NEWS’ file, probably under a new “New interfaces and functionality”
heading?

Thanks,
Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]