Re: why emacs lisp's regex has 2-steps escapes?


From: Alan Mackenzie
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 09:39:33 +0000
User-agent: Mutt/1.5.9i

On Wed, Jul 09, 2008 at 03:30:27AM -0700, Xah wrote:
> Emacs regex has an odd peculiarity in that it needs a lot of backslashes.
> More specifically, a string first needs to be properly escaped, and then
> it is passed to the regex engine.

Yes.  The greatest number of consecutive backslashes I've seen (in a
non-joke context) is 10.

> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.

Ironically, Xah, you are doing the same sort of thing in your post,
using crazy quote characters (if that is indeed what they are), 0x5397c
and 0x5397d (according to C-u C-x =).  Over my SSH link to my SSP, your
quotes look something like "â~@~]", and are most difficult to read
without a pair of sunspecs which filters out the UTF.

Could you, perhaps, use the standard ASCII quotes 0x22 and 0x27 here,
please?

> In Emacs I need to use
> “\\(\\[[a-z]\\]\\)”
> for the actual regex
> “\(\[[a-z]\]\)”.
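
The two steps described here can be watched separately. The following is only a minimal sketch: the function name demo-two-step-escape is made up for illustration and is not part of Emacs or of the original post. The Lisp reader first collapses each doubled backslash in the source, and only the resulting shorter string reaches the regexp engine.

```elisp
;; Hypothetical demo function, not part of Emacs or of the original post.
(defun demo-two-step-escape ()
  "Return (LENGTH MATCH-START CAPTURE) for the bracket regexp in this thread."
  (let ((re "\\(\\[[a-z]\\]\\)")   ; after reading, the string is \(\[[a-z]\]\)
        (text "Sin[x] + Sin[y]"))
    (list (length re)              ; characters the regexp engine receives
          (string-match re text)   ; index of the first match
          (match-string 1 text)))) ; text captured by group 1

(demo-two-step-escape)   ; => (13 3 "[x]")
```

The 22 characters typed between the quotes become a 13-character regexp. Note that group 1 captures the brackets as well as the letter.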

> Here's a somewhat typical but long regex for matching an HTML image tag:

> (search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?\" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)

> The toothpick syndrome gets crazy, making the already difficult regex
> syntax impossible to read and hard to code.

> My question is: why does elisp's regex have this 2-step process? Is this
> some design decision, or did it just happen that way historically?

> Second question: can't elisp provide something like a “regex-string”
> wrapper function that automatically takes care of the quoting? I can't
> see how this might be difficult.
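
On the wrapper question, one existing option worth a look is the rx macro that ships with Emacs, which builds the regexp string from a Lisp form so that no backslashes are typed at all. This is a rough sketch rather than a full answer; the variable name demo-bracket-re is hypothetical.

```elisp
(require 'rx)

;; Hypothetical variable name; `rx' itself ships with Emacs.
(defvar demo-bracket-re
  (rx (group "[" (any "a-z") "]"))
  "Regexp string built by `rx' for a bracketed lower-case letter.")

(string-match demo-bracket-re "Sin[x] + Sin[y]")   ; => 3
(match-string 1 "Sin[x] + Sin[y]")                 ; => "[x]"
```

The result of rx is still an ordinary regexp string, so the two-step process is unchanged underneath; the macro only hides the escaping from the source code.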

Well, I've hacked up a function to display regexps in *scratch*,
concentrating in particular on deeply nested \( .... \| .... \)
constructs.  It doesn't work so well when the regexp's length exceeds
the window width, but it could be enhanced:

#########################################################################
(defun translate-rnt (regexp)
  "REGEXP is a string.  Translate any \\t \\n \\r and \\f characters
to weird non-ASCII printable characters: \\t to Î (206, \\xCE), \\n
to ñ (241, \\xF1), \\r to ® (174, \\xAE) and \\f to £ (163, \\xA3).
The original string is modified."
  (let (pos ch)                         ; ch: the character found at pos.
    (while (setq pos (string-match "[\t\n\r\f]" regexp))
      (setq ch (aref regexp pos))
      (aset regexp pos
            (cond ((eq ch ?\t) ?Î)
                  ((eq ch ?\n) ?ñ)
                  ((eq ch ?\r) ?®)
                  (t           ?£))))
    regexp))

(defun pp-regexp (regexp)
  "Pretty print a regexp.  This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp) (error "parameter is not a string."))
  (let ((depth 0)
        (re (copy-sequence regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line, varies between min-depth and
                      ; max-depth.
        ch            ; character following the backslash in the current match.
        )
    (translate-rnt re)
    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match "\\\\\\((\\(\\?:\\)?\\||\\|)\\)"
                                  re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))

         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))

         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))

    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max)) (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re
                                                 (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\ ))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
#########################################################################      
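
As a rough illustration of what the function produces, here is a small call on a regexp with one group and one alternative. It assumes that both definitions above have already been evaluated; the sample regexp is made up for this example.

```elisp
;; Assumes `translate-rnt' and `pp-regexp' from above have been evaluated.
(get-buffer-create "*scratch*")        ; make sure the target buffer exists
(pp-regexp "\\(foo\\|bar\\)baz")

;; Text appended to *scratch*: depth 0 on the first line, depth 1 below it
;; (the second line is padded with trailing spaces to the regexp's length):
;;
;; \(   \|   \)baz
;;   foo  bar
```

Each level of \( ... \) nesting gets its own output line, with spaces standing in for the characters that belong to other levels, so the operators and their contents line up column by column.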

> Thanks.

>   Xah

-- 
Alan Mackenzie (Nuremberg, Germany).
