emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fixing ill-conditioned regular expressions. Proof of concept.


From: Stefan Monnier
Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept.
Date: Fri, 13 Mar 2015 18:53:38 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> ;;
> ;; f i x - r e . e l

This is not using the standard file format.  I strongly suggest you use
something like auto-insert-mode which will get that right for you.

> ;; The AST is an internal representation of the regular expression string.
> ;; Elements are represented as follows:

Why invent a new format, instead of using rx?  I'm pretty sure someone
has already provided a parser from string to rx format (and of course
we also already have a dumper from rx to string).

> ;; PTR and AD
> ;; ----------
> ;; Whilst building and transforming trees, what we really need is pointers to
> ;; either the car or cdr of a cons cell, to enable us to
> ;; overwrite them.

I think your code will gain a lot in readability if you made it
functional: take an rx expression and return a *new* rx expression
without modifying the original one.

As it stands, I find it rather impenetrable.

> Return 'shortened, t or nil.

Our conventions say this should be

  Return `shortened', t or nil.

>   (if (eq ad 'cdr) (error "fix-re--R+R*->R+ got 'cdr"))
>   (let ((e0 (if (eq ad 'car) (car ptr) (cadr ptr)))
>       (e1 (if (eq ad 'car) (cadr ptr) (caddr ptr))))
>     (when (and (consp e0) (consp e1))
>       (let ((op0 (car e0))
>           (op1 (car e1))
>           ;; (link (if (eq ad 'car) (cddr ptr) (cdddr ptr)))
>           op
>           )
>       (when
>           (and (memq op0 '(+ * \?))
>                (memq op1 '(+ * \?))
>                (equal (cdr e0) (cdr e1))) ; Both Rs are the same.
>         (cond
>          ((and (eq op0 '\?) (eq op1 '\?)) ; Cant combine R?R?
>           nil)
>          ((and (eq op0 '+) (eq op1 '+)) ; R+R+ -> RR+
>           (fix-re--chop-+* ptr 'car)
>           t)
>          (t
>           (setq op
>                   (cond
>                    ((or (eq op0 '+) (eq op1 '+)) '+)
>                    ((or (eq op0 '*) (eq op1 '*)) '*)))
>           (setcar e0 op)
>           (fix-re--chop (if (eq ad 'car) ptr (cdr ptr)) 'cadr)
>           'shortened))
>         )))))

For example, I think that if you make it functional rather than
imperative, the above could turn into something along the lines of

  (pcase ptr                ;We don't have `ad' any more.
    (`((,op0 . ,r0)
       (,op1 . ,r1)
       . (and ,rest
              (guard (and (equal r0 r1)
                          (memq op0 '(+ * \?))
                          (memq op1 '(+ * \?)))))))
     (cond
      ((and (eq op0 '\?) (eq op1 '\?)) ; Cant combine R?R?
       ptr)
      ((and (eq op0 '+) (eq op1 '+)) ; R+R+ -> RR+
       `(,(car r0) ,(car ptr) . ,rest))
      (t
       `((,(cond
               ((or (eq op0 '+) (eq op1 '+)) '+)
               ((or (eq op0 '*) (eq op1 '*)) '*)) . r0)
         . ,rest)))))


-- Stefan>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]