[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#73188: PEG parser does not support full PEG grammar
From: |
Ludovic Courtès |
Subject: |
bug#73188: PEG parser does not support full PEG grammar |
Date: |
Sun, 20 Oct 2024 12:10:47 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
Egun on Ekaitz,
Ekaitz Zarraga <ekaitz@elenq.tech> skribis:
> - What it is true is if people were using `peg-as-peg` for their
> things, which isn't even exported by the module, their code would
> have problems. But we can't predict that.
Since ‘peg-as-peg’ has always been private, that’s fine.
> From 81944c2037e0d6d8c972def2ceb1aa4c51a61431 Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz@elenq.tech>
> Date: Wed, 11 Sep 2024 21:19:26 +0200
> Subject: [PATCH v3 1/2] PEG: Add full support for PEG + some extensions
>
> This commit adds support for PEG as described in:
>
> <https://bford.info/pub/lang/peg.pdf>
>
> It adds support for the missing features (comments, underscores in
> identifiers and escaping) while keeping the extensions (dashes in
> identifiers, < and <--).
>
> The naming system tries to be as close as possible to the one proposed
> in the paper.
>
> * module/ice-9/peg/string-peg.scm: Rewrite PEG parser.
> * test-suite/tests/peg.test: Fix import
[...]
> +(define (Primary->defn lst for-syntax)
> + (let ((value (second lst)))
> + (case (car value)
> + ('DOT #'peg-any)
> + ('Identifier (Identifier->defn value for-syntax))
> + ('Expression (Expression->defn value for-syntax))
> + ('Literal (Literal->defn value for-syntax))
> + ('Class (Class->defn value for-syntax)))))
I get these compiler warnings:
--8<---------------cut here---------------start------------->8---
ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause
((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst)
for-syntax))))) of case expression (case (car suffix) ((quote AND) (quasisyntax
(followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) ((quote NOT)
(quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst)
for-syntax))))) (else (Suffix->defn suffix for-syntax)))
ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause
((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR)
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax
out)))))
ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause
((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR)
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax
out)))))
ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause
((quote Identifier) (Identifier->defn value for-syntax)) of case expression
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier)
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn
value for-syntax)))
ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause
((quote Expression) (Expression->defn value for-syntax)) of case expression
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier)
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn
value for-syntax)))
ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause
((quote Literal) (Literal->defn value for-syntax)) of case expression (case
(car value) ((quote DOT) (syntax peg-any)) ((quote Identifier)
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn
value for-syntax)))
ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause
((quote NotInClass) (NotInClass->defn value for-syntax)) of case expression
(case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier)
(Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn
value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote
NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn
value for-syntax)))
ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause
((quote Class) (Class->defn value for-syntax)) of case expression (case (car
value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn
value for-syntax)) ((quote Expression) (Expression->defn value for-syntax))
((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass)
(NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
--8<---------------cut here---------------end--------------->8---
And indeed, the correct syntax is:
(case value (DOT …) (Identifier …) …)
or:
(match value ('DOT …) ('Identifier …) …)
The former returns *unspecified* when passed a value not matched by any
clause, whereas the latter throws an error.
So I would recommend ‘match’.
At any rate, that makes me wonder whether this code path is tested at
all. As written, ‘Primary->defn’ would always return *unspecified*.
Is there a test we could add?
> +(define (Range->defn lst for-syntax)
> (cond
> - ((list? el)
> - (cond
> - ((eq? (car el) 'peg-literal)
> - (peg-literal->defn el for-syntax))
> - ((eq? (car el) 'peg-charclass)
> - (peg-charclass->defn el for-syntax))
> - ((eq? (car el) 'peg-nonterminal)
> - (datum->syntax for-syntax (string->symbol (cadr el))))))
> - ((string? el)
> + ((= 2 (length lst))
> + (second (second lst)))
> + ((= 3 (length lst))
> + #`(range
> + #,(Char->defn (second lst) for-syntax)
> + #,(Char->defn (third lst) for-syntax)))))
Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
frowned upon. I would write it as:
(match lst
((x y) y)
((x y z) #`(range …)))
(Ideally with identifiers more meaningful than x, y, and z. :-))
> ;; Transforms the nonterminals defined in the PEG parser written as a PEG to
> the nonterminals defined in the PEG parser written with S-expressions.
> (define (grammar-transform x)
> @@ -69,7 +77,7 @@
> (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg)
> peg-as-peg)))
> (tree-map
> grammar-transform
> - (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
> + (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg)
> peg-as-peg)))))))
What happened to the ‘grammar’ binding? I can’t see where it was coming
from.
> From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz@elenq.tech>
> Date: Fri, 11 Oct 2024 14:24:30 +0200
> Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]
>
> Modern PEG supports inversed class like `[^a-z]` that would get any
> character not in the `a-z` range. This commit adds support for that and
> also for a new `not-in-range` PEG pattern for scheme.
>
> * module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
> * module/ice-9/peg/string-peg.scm: Add support for `[^...]`
> * test-suite/tests/peg.test: Test it.
> * doc/ref/api-peg.texi: Document accordingly.
This one LGTM.
In addition to the issues mentioned above, could you add an entry in the
‘NEWS’ file, probably under a new “New interfaces and functionality”
heading?
Thanks,
Ludo’.