[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#73188: PEG parser does not support full PEG grammar
From: |
Ekaitz Zarraga |
Subject: |
bug#73188: PEG parser does not support full PEG grammar |
Date: |
Sun, 20 Oct 2024 22:18:52 +0200 |
Hi Ludovic,
Thanks for the review
On 2024-10-20 12:10, Ludovic Courtès wrote:
+(define (Primary->defn lst for-syntax)
+ (let ((value (second lst)))
+ (case (car value)
+ ('DOT #'peg-any)
+ ('Identifier (Identifier->defn value for-syntax))
+ ('Expression (Expression->defn value for-syntax))
+ ('Literal (Literal->defn value for-syntax))
+ ('Class (Class->defn value for-syntax)))))
I get these compiler warnings:
--8<---------------cut here---------------start------------->8---
ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause ((quote NOT)
(quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) of
case expression (case (car suffix) ((quote AND) (quasisyntax (followed-by (unsyntax
(Suffix->defn (third lst) for-syntax))))) ((quote NOT) (quasisyntax (not-followed-by
(unsyntax (Suffix->defn (third lst) for-syntax))))) (else (Suffix->defn suffix
for-syntax)))
ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause
((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR)
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax
out)))))
ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause
((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR)
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax
out)))))
ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause ((quote Identifier)
(Identifier->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression)
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax))
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause ((quote Expression)
(Expression->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression)
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax))
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause ((quote Literal)
(Literal->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression)
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax))
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause ((quote NotInClass)
(NotInClass->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression)
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax))
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause ((quote Class)
(Class->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression)
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax))
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value
for-syntax)))
--8<---------------cut here---------------end--------------->8---
And indeed, the correct syntax is:
(case value (DOT …) (Identifier …) …)
or:
(match value ('DOT …) ('Identifier …) …)
The former returns *unspecified* when passed a value not matched by any
clause, whereas the latter throws an error.
So I would recommend ‘match’.
I'm always in doubt with this but the thing worked.
I always check in the internet how to do the case, and I always get the
warning... my bad!
At any rate, that makes me wonder whether this code path is tested at
all. As written, ‘Primary->defn’ would always return *unspecified*.
Is there a test we could add?
I made some tests and it works, maybe unexpectedly:
scheme@(guile-user)> (define a 'b)
scheme@(guile-user)> (case a ('c 3)('b 1)('d 2))
;;; <stdin>:12:15: warning: duplicate datum quote in clause ((quote b)
1) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2))
;;; <stdin>:12:21: warning: duplicate datum quote in clause ((quote d)
2) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2))
$8 = 1
While your proposal doesn't:
scheme@(guile-user)> (case a (c 3)(b 1)(d 2))
While compiling expression:
Syntax error:
unknown file:14:8: case: invalid clause in subform (c 3) of (case a (c
3) (b 1) (d 2))
I tested further and this is what it should be:
scheme@(guile-user)> (case a ((b) 1)((d) 2))
$1 = 1
I used match instead as you proposed and made it all work with no
warnings and better error reporting.
+(define (Range->defn lst for-syntax)
(cond
- ((list? el)
- (cond
- ((eq? (car el) 'peg-literal)
- (peg-literal->defn el for-syntax))
- ((eq? (car el) 'peg-charclass)
- (peg-charclass->defn el for-syntax))
- ((eq? (car el) 'peg-nonterminal)
- (datum->syntax for-syntax (string->symbol (cadr el))))))
- ((string? el)
+ ((= 2 (length lst))
+ (second (second lst)))
+ ((= 3 (length lst))
+ #`(range
+ #,(Char->defn (second lst) for-syntax)
+ #,(Char->defn (third lst) for-syntax)))))
Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
frowned upon. I would write it as:
(match lst
((x y) y)
((x y z) #`(range …)))
Yeah, I was very bad at `match` when I wrote this thing :(
But! Now I spent some hours with it and rewrote the thing for `match`!
(Ideally with identifiers more meaningful than x, y, and z. :-))
;; Transforms the nonterminals defined in the PEG parser written as a PEG to
the nonterminals defined in the PEG parser written with S-expressions.
(define (grammar-transform x)
@@ -69,7 +77,7 @@
(peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg)
peg-as-peg)))
(tree-map
grammar-transform
- (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
+ (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg)
peg-as-peg)))))))
What happened to the ‘grammar’ binding? I can’t see where it was coming
from.
This is a very good catch!
I "fixed" a missing identifier by this, but it happened to be skipping a
really important check: Is our PEG written as PEG understood like our
PEG definition in sexps?
The `Grammar` is coming from the processing of `peg-as-peg` as a
peg-string. This is what I was missing.
This was hiding some errors that I fixed, and now I'm pretty confident
of the thing. Wow! This was a very good one! Thanks for pointing it out.
From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
From: Ekaitz Zarraga <ekaitz@elenq.tech>
Date: Fri, 11 Oct 2024 14:24:30 +0200
Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]
Modern PEG supports inversed class like `[^a-z]` that would get any
character not in the `a-z` range. This commit adds support for that and
also for a new `not-in-range` PEG pattern for scheme.
* module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
* module/ice-9/peg/string-peg.scm: Add support for `[^...]`
* test-suite/tests/peg.test: Test it.
* doc/ref/api-peg.texi: Document accordingly.
This one LGTM.
In addition to the issues mentioned above, could you add an entry in the
‘NEWS’ file, probably under a new “New interfaces and functionality”
heading?
Yes!
Added it in both commits: In the first commit I added that PEG was
improved, and in the second that `not-in-range` was added.
Thanks,
Ludo’.
Attached new version of both patches.
Thanks for the help, the kind of things you point out are the ones that
I wanted help with! Thanks! Now I'm a better matcher.
This helps me improve,
Ekaitz
v4-0002-PEG-Add-support-for-not-in-range-and.patch
Description: Text Data
v4-0001-PEG-Add-full-support-for-PEG-some-extensions.patch
Description: Text Data