Re: why emacs lisp's regex has 2-steps escapes?
From: Alan Mackenzie
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 09:39:33 +0000
User-agent: Mutt/1.5.9i
On Wed, Jul 09, 2008 at 03:30:27AM -0700, Xah wrote:
> emacs regex has an odd peculiarity in that it needs a lot of backslashes.
> More specifically, a string first needs to be properly escaped, then
> this is passed to the regex engine.
Yes. The greatest number of consecutive backslashes I've seen (in a
non-joke context) is 10.
> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.
Ironically, Xah, you are doing the same sort of thing in your post,
using crazy quote characters (if that is indeed what they are), 0x5397c
and 0x5397d (according to C-u C-x =). Over my SSH link to my SSP, your
quotes look something like "â~@~]", and are most difficult to read
without a pair of sunspecs which filters out the UTF.
Could you, perhaps, use the standard ASCII quotes 0x22 and 0x27 here,
please?
> In emacs i need to use
> “\\(\\[[a-z]\\]\\)”
> for the actual regex
> “\(\[[a-z]\]\)”.
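To make the two layers concrete: the Lisp reader consumes one level of backslashes when it reads the string literal, and the regexp engine then interprets what is left. The following is only a minimal sketch; the function name my-first-bracketed is hypothetical and exists purely for illustration.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-first-bracketed (text)
  "Return the first bracketed lowercase letter in TEXT, brackets included, or nil."
  ;; As typed in the source file:   "\\(\\[[a-z]\\]\\)"
  ;; As seen by the regexp engine:  \(\[[a-z]\]\)
  (when (string-match "\\(\\[[a-z]\\]\\)" text)
    (match-string 1 text)))

;; The reader has already removed one backslash from each pair, so the
;; string handed to the regexp engine has 13 characters, although the
;; literal takes 17 characters between its quotes.
(length "\\(\\[[a-z]\\]\\)")            ; => 13

(my-first-bracketed "Sin[x] + Sin[y]")  ; => "[x]"
```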
> Here's somewhat typical but long regex for matching a html image tag
> (search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?
> \" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)
> The toothpick syndrome gets crazy, making already difficult regex syntax
> impossible to read and hard to code.
> My question is, why does elisp's regex have this 2-step process? Is this
> some design decision, or did it just happen that way historically?
> Second question: can't elisp create something like a “regex-string” wrapper
> function that automatically takes care of the quoting? I can't see how
> this might be difficult?
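On the wrapper question, Emacs already ships two facilities that go some way in this direction: regexp-quote for literal text, and the rx macro for building a regexp from S-expressions. Neither changes how the reader handles string literals; they only reduce how many backslashes have to be typed by hand. A sketch, in which every name prefixed with my- is hypothetical:

```elisp
(require 'rx)

;; `regexp-quote' escapes whatever is special to the regexp engine,
;; so the literal text can be written without thinking about it.
(defvar my-literal-re (regexp-quote "Sin[x]")
  "Regexp matching the literal text Sin[x].  Hypothetical example variable.")

;; `rx' builds the regexp string from S-expressions: a literal \"[\",
;; a capture group holding one character in the range a-z, a literal \"]\".
(defvar my-bracket-re (rx "[" (group (any "a-z")) "]")
  "Regexp capturing one bracketed lowercase letter.  Hypothetical example variable.")

(defun my-bracket-letter (text)
  "Return the first bracketed lowercase letter in TEXT, without brackets, or nil."
  (when (string-match my-bracket-re text)
    (match-string 1 text)))

(my-bracket-letter "Sin[x] + Sin[y]")   ; => "x"
```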
Well, I've hacked up a function to display regexps in *scratch*,
concentrating in particular on deeply nested \( .... \| .... \)
constructs. It doesn't work so well when the regexp's length exceeds
the window width, but it could be enhanced:
#########################################################################
(defun translate-rnt (regexp)
  "REGEXP is a string.  Translate any \\t, \\n, \\r and \\f characters
to weird non-ASCII printable characters: \\t to Î (206, \\xCE), \\n
to ñ (241, \\xF1), \\r to ® (174, \\xAE) and \\f to £ (163, \\xA3).
The original string is modified."
  (let (pos ch)                         ; ch is bound locally, not left global
    (while (setq pos (string-match "[\t\n\r\f]" regexp))
      (setq ch (aref regexp pos))
      (aset regexp pos
            (cond ((eq ch ?\t) ?Î)
                  ((eq ch ?\n) ?ñ)
                  ((eq ch ?\r) ?®)
                  (t ?£))))
    regexp))

(defun pp-regexp (regexp)
  "Pretty print a regexp.  This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp) (error "parameter is not a string."))
  (let ((depth 0)
        (re (copy-sequence regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line, varies between min-depth
                      ; and max-depth.
        ch)           ; character following the backslash: ( | or )
    (translate-rnt re)
    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match "\\\\\\((\\(\\?:\\)?\\||\\|)\\)"
                              re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))
         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))
         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))
    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max)) (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re
                                                 (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\s))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
#########################################################################
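As a usage sketch, assuming both defuns above have been evaluated: the hypothetical wrapper below (my-pp-regexp-demo is not part of the code above) feeds a small alternation to pp-regexp. The expected output in the comments was worked out by hand from the code and not taken from a run, so treat it as indicative.

```elisp
;; Assumes `translate-rnt' and `pp-regexp' from this message are loaded.
(defun my-pp-regexp-demo ()
  "Pretty-print a small alternation into the *scratch* buffer."
  ;; `pp-regexp' writes to *scratch*, so make sure the buffer exists.
  (get-buffer-create "*scratch*")
  ;; The regexp the engine sees is  \(a\|b\)c
  (pp-regexp "\\(a\\|b\\)c"))

;; M-x eval-expression (my-pp-regexp-demo) should append two lines to
;; *scratch*: depth 0 first, then the group contents one line lower
;; (the second line is padded with trailing spaces):
;;
;;   \( \| \)c
;;     a  b
```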
> Thanks.
> Xah
--
Alan Mackenzie (Nuremberg, Germany).