bug-guile
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#73188: PEG parser does not support full PEG grammar


From: Ekaitz Zarraga
Subject: bug#73188: PEG parser does not support full PEG grammar
Date: Sun, 20 Oct 2024 22:18:52 +0200

Hi Ludovic,

Thanks for the review

On 2024-10-20 12:10, Ludovic Courtès wrote:

+(define (Primary->defn lst for-syntax)
+  (let ((value (second lst)))
+    (case (car value)
+      ('DOT        #'peg-any)
+      ('Identifier (Identifier->defn value for-syntax))
+      ('Expression (Expression->defn value for-syntax))
+      ('Literal    (Literal->defn value for-syntax))
+      ('Class      (Class->defn value for-syntax)))))

I get these compiler warnings:

--8<---------------cut here---------------start------------->8---
ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause ((quote NOT) 
(quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) of 
case expression (case (car suffix) ((quote AND) (quasisyntax (followed-by (unsyntax 
(Suffix->defn (third lst) for-syntax))))) ((quote NOT) (quasisyntax (not-followed-by 
(unsyntax (Suffix->defn (third lst) for-syntax))))) (else (Suffix->defn suffix 
for-syntax)))
ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause 
((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar 
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) 
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax 
out)))))
ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause 
((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar 
extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) 
(quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax 
out)))))
ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause ((quote Identifier) 
(Identifier->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax 
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) 
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) 
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause ((quote Expression) 
(Expression->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax 
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) 
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) 
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause ((quote Literal) 
(Literal->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax 
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) 
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) 
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause ((quote NotInClass) 
(NotInClass->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax 
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) 
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) 
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause ((quote Class) 
(Class->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax 
peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) 
(Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) 
((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value 
for-syntax)))
--8<---------------cut here---------------end--------------->8---

And indeed, the correct syntax is:

   (case value (DOT …) (Identifier …) …)

or:

   (match value ('DOT …) ('Identifier …) …)

The former returns *unspecified* when passed a value not matched by any
clause, whereas the latter throws an error.

So I would recommend ‘match’.

I'm always in doubt with this but the thing worked.
I always check in the internet how to do the case, and I always get the warning... my bad!

At any rate, that makes me wonder whether this code path is tested at
all.  As written, ‘Primary->defn’ would always return *unspecified*.
Is there a test we could add?

I made some tests and it works, maybe unexpectedly:

        scheme@(guile-user)> (define a 'b)
        scheme@(guile-user)> (case a ('c 3)('b 1)('d 2))
;;; <stdin>:12:15: warning: duplicate datum quote in clause ((quote b) 1) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2)) ;;; <stdin>:12:21: warning: duplicate datum quote in clause ((quote d) 2) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2))
        $8 = 1

While your proposal doesn't:
        
        scheme@(guile-user)> (case a (c 3)(b 1)(d 2))
        While compiling expression:
        Syntax error:
unknown file:14:8: case: invalid clause in subform (c 3) of (case a (c 3) (b 1) (d 2))

I tested further and this is what it should be:

        scheme@(guile-user)> (case a ((b) 1)((d) 2))
        $1 = 1

I used match instead as you proposed and made it all work with no warnings and better error reporting.

+(define (Range->defn lst for-syntax)
    (cond
-   ((list? el)
-    (cond
-     ((eq? (car el) 'peg-literal)
-      (peg-literal->defn el for-syntax))
-     ((eq? (car el) 'peg-charclass)
-      (peg-charclass->defn el for-syntax))
-     ((eq? (car el) 'peg-nonterminal)
-      (datum->syntax for-syntax (string->symbol (cadr el))))))
-   ((string? el)
+    ((= 2 (length lst))
+     (second (second lst)))
+    ((= 3 (length lst))
+     #`(range
+         #,(Char->defn (second lst) for-syntax)
+         #,(Char->defn (third lst) for-syntax)))))

Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
frowned upon.  I would write it as:

   (match lst
     ((x y) y)
     ((x y z) #`(range …)))

Yeah, I was very bad at `match` when I wrote this thing :(
But! Now I spent some hours with it and rewrote the thing for `match`!

(Ideally with identifiers more meaningful than x, y, and z. :-))

  ;; Transforms the nonterminals defined in the PEG parser written as a PEG to 
the nonterminals defined in the PEG parser written with S-expressions.
  (define (grammar-transform x)
@@ -69,7 +77,7 @@
      (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) 
peg-as-peg)))
      (tree-map
       grammar-transform
-     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
+     (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) 
peg-as-peg)))))))

What happened to the ‘grammar’ binding?  I can’t see where it was coming
from.

This is a very good catch!
I "fixed" a missing identifier by this, but it happened to be skipping a really important check: Is our PEG written as PEG understood like our PEG definition in sexps?

The `Grammar` is coming from the processing of `peg-as-peg` as a peg-string. This is what I was missing.

This was hiding some errors that I fixed, and now I'm pretty confident of the thing. Wow! This was a very good one! Thanks for pointing it out.

 From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
From: Ekaitz Zarraga <ekaitz@elenq.tech>
Date: Fri, 11 Oct 2024 14:24:30 +0200
Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]

Modern PEG supports inversed class like `[^a-z]` that would get any
character not in the `a-z` range. This commit adds support for that and
also for a new `not-in-range` PEG pattern for scheme.

* module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
* module/ice-9/peg/string-peg.scm: Add support for `[^...]`
* test-suite/tests/peg.test: Test it.
* doc/ref/api-peg.texi: Document accordingly.

This one LGTM.

In addition to the issues mentioned above, could you add an entry in the
‘NEWS’ file, probably under a new “New interfaces and functionality”
heading?

Yes!
Added it in both commits: In the first commit I added that PEG was improved, and in the second that `not-in-range` was added.

Thanks,
Ludo’.

Attached new version of both patches.

Thanks for the help, the kind of things you point out are the ones that I wanted help with! Thanks! Now I'm a better matcher.

This helps me improve,
Ekaitz

Attachment: v4-0002-PEG-Add-support-for-not-in-range-and.patch
Description: Text Data

Attachment: v4-0001-PEG-Add-full-support-for-PEG-some-extensions.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]