emacs-devel
Re: Raw string literals in Emacs lisp.


From: Stephen J. Turnbull
Subject: Re: Raw string literals in Emacs lisp.
Date: Mon, 28 Jul 2014 11:16:17 +0900

David Caldwell writes:

 > Why not, then, skip rawstrings completely and go directly to a regular
 > expression reader: #r// (or even just #//) instead of #r""?

It's unlispy.  Regular expressions *are* strings and can be
manipulated as strings; (almost) any string can be used as a regular
expression.  Therefore (in Lisp) we normally define separate functions
to deal with "string" use cases and "regexp" use cases for the same
object.  And they mix and match well:

(defvar xft-xlfd-font-regexp
  (concat
   ;; XLFD specifies ISO 8859-1 encoding, but we can't handle non-ASCII
   ;; in Mule when this function is called.  So use HPC.
   ;; (xe_xlfd_prefix "\\(\\+[\040-\176\240-\377]*\\)?-")
   ;; (xe_xlfd_opt_text "\\([\040-\044\046-\176\240-\377]*\\)")
   ;; (xe_xlfd_text "\\([\040-\044\046-\176\240-\377]+\\)")
   "\\`"
   "\\(\\+[\040-\176]*\\)?-"            ; prefix
   "\\([^-]+\\)"                        ; foundry
   "-"
   "\\([^-]+\\)"                        ; family
   "-"
   "\\([^-]+\\)"                        ; weight
   "-"
   "\\([0-9ior?*][iot]?\\)"             ; slant
   "-"
   "\\([^-]+\\)"                        ; swidth
   "-"
   "\\([^-]*\\)"                        ; adstyle
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pixelsize
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pointsize
   "-"
   "\\([0-9?*]+\\)"                     ; resx
   "-"
   "\\([0-9?*]+\\)"                     ; resy
   "-"
   "\\([cmp?*]\\)"                      ; spacing
   "-"
   "~?"                                 ; avgwidth
   "\\([0-9?*]+\\)"
   "-"
   "\\([^-]+\\)"                        ; registry
   "-"
   "\\([^-]+\\)"                        ; encoding
   "\\'")
  "The regular expression used to match XLFD font names.")

Of course that would be more readable with rawstrings (not used
because this code is shared with XEmacs 21.4), and even more readable
with PCRE, but it shows we don't really need /x to build regexps
readably.  If #r"..." generated something other than strings, you'd
have to write code to deal with issues like building regexps using
concat.  I think format would be a huge can of worms.
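To make the "regexps are just strings" point concrete, here is a minimal sketch of building a regexp with ordinary string functions. The names `my-field` and `my-xlfd-like-regexp` are hypothetical and only illustrate the idiom; they are not part of the font code quoted above.

```elisp
;; Sketch only: `my-field' and `my-xlfd-like-regexp' are made-up names.
(defconst my-field "\\([^-]+\\)"
  "Regexp fragment matching one non-empty field containing no hyphen.")

(defun my-xlfd-like-regexp (nfields)
  "Return an anchored regexp matching NFIELDS hyphen-separated fields."
  (concat "\\`"
          ;; Plain string joining; nothing here knows it is a regexp.
          (mapconcat #'identity (make-list nfields my-field) "-")
          "\\'"))

(let ((re (my-xlfd-like-regexp 3)))
  (list (stringp re)                    ; the result is an ordinary string
        (and (string-match re "adobe-courier-bold")
             (match-string 2 "adobe-courier-bold"))))
;; => (t "courier")
```

If the reader produced some non-string regexp object instead, each of concat, mapconcat, and format would need a regexp-aware counterpart.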

 > This will be just as easy to implement as raw strings.

No, it won't.  Raw strings are just a different read syntax for
strings, and have exactly the same internal representation.  At
present we don't have a regular expression type (although we do have a
compiled regular expression type internally).  If you're not proposing
to define a regular expression type (good luck getting that past
RMS!), then you're just proposing a rawstring syntax tuned for regexp
use.

But there's no reason that couldn't be used for other purposes.  For
example, some people (Python programmers) would probably appreciate a
#r"..."/x rawstring syntax that automatically dedents -- for use in
docstrings.
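Dedenting can of course also be done today as a plain function on strings. A rough sketch, assuming that "dedent" means stripping the common leading whitespace of the non-blank lines; `my-dedent` is a hypothetical name, and a tab counts as one column here:

```elisp
;; Sketch only: `my-dedent' is a made-up helper, not an existing function.
(defun my-dedent (s)
  "Remove the common leading whitespace from every non-blank line of S."
  (let* ((lines (split-string s "\n"))
         ;; Indentation width of each non-blank line; blank lines give nil.
         (indents (delq nil
                        (mapcar (lambda (line)
                                  (and (string-match "[^ \t]" line)
                                       (match-beginning 0)))
                                lines)))
         (margin (if indents (apply #'min indents) 0)))
    (mapconcat (lambda (line)
                 ;; Whitespace-only lines shorter than the margin become empty.
                 (if (>= (length line) margin)
                     (substring line margin)
                   ""))
               lines "\n")))

(my-dedent "  foo\n    bar\n\n  baz")
;; => "foo\n  bar\n\nbaz"
```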

 > Languages like Javascript, Perl, Ruby, Bash, and Groovy have shown that
 > having a special support for regexps at a language level is a very
 > effective way of dealing with them.

Lisp is not those languages, and in fact it is very unlike those
languages.

 > Plus it opens the door to extensions: #r//p for PCRE/Perl syntax[1]
 > or #r//x for more readable regexps[2], etc.

(defun emacsify-pcre (s)
  "Convert a PCRE to Emacs notation, properly ;-) ignoring unknown backslash."
  ;; exercise for the reader
  )

or

(require 'pcre)                         ; SXEmacs may have implemented this.
(let ((cre (pcre-compile "...")))
  (while (pcre-search-forward cre)
    (do-something)))

and as shown above /x isn't really necessary.  Like it or not, that's
the way these things are done in the Emacs Lisp world.  If you don't
like it, there are languages like Javascript, Perl, Ruby, Bash, and
Groovy.  (Python is too much like Lisp for you, I suspect. ;-)
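For what it's worth, a partial take on the emacsify-pcre exercise might look like the sketch below. It is given a different, hypothetical name (`my-emacsify-pcre`) because it only swaps the escaping of grouping, alternation, and interval operators. It leaves every other backslash sequence untouched, copies bracket expressions verbatim, and makes no attempt to translate PCRE-only constructs, so treat it as an illustration rather than a real converter.

```elisp
;; Sketch only: handles ( ) | { } and nothing else.
(defun my-emacsify-pcre (s)
  "Convert the PCRE string S to Emacs regexp notation (partial sketch)."
  (let ((i 0)
        (n (length s))
        (in-class nil)
        (out '()))
    (while (< i n)
      (let ((c (aref s i)))
        (cond
         ;; Inside [...], copy verbatim up to and including the closing ].
         (in-class
          (when (eq c ?\]) (setq in-class nil))
          (push (string c) out))
         ;; Backslash escape: decide based on the following character.
         ((and (eq c ?\\) (< (1+ i) n))
          (let ((next (aref s (1+ i))))
            (push (if (memq next '(?\( ?\) ?| ?{ ?}))
                      (string next)     ; literal in PCRE, bare char in Emacs
                    (string c next))    ; other escapes pass through unchanged
                  out)
            (setq i (1+ i))))
         ;; Unescaped operators need a backslash in Emacs notation.
         ((memq c '(?\( ?\) ?| ?{ ?}))
          (push (string ?\\ c) out))
         (t
          (when (eq c ?\[) (setq in-class t))
          (push (string c) out))))
      (setq i (1+ i)))
    (apply #'concat (nreverse out))))

(my-emacsify-pcre "(foo|bar)+")
;; => "\\(foo\\|bar\\)+"
```

Because the result is again a string, it can be passed directly to string-match, re-search-forward, or concat.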

 > I think using rawstrings is too generic an answer to the problem.

I think using rawstrings is the only sane answer to the problem.  You
can call them "regular expressions" as suggested by the #r notation
and their most prominent application, but in Emacs Lisp representing
them internally as a type other than string would be way too much work
given the idioms we have for constructing regexps that would need to
be reimplemented.  Given that internally they are (Just String), why
specialize to regular expressions?  Would you error on #r/*.*/, which
is invalid syntax for a regular expression?
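With plain strings the question does not come up at read time: the reader accepts any string, and the syntax is only checked when the string is handed to a matching function, which signals `invalid-regexp`. A minimal sketch of checking that by hand, where `my-regexp-valid-p` is a hypothetical helper:

```elisp
;; Sketch only: `my-regexp-valid-p' is a made-up helper.
(defun my-regexp-valid-p (re)
  "Return non-nil if the string RE is accepted as an Emacs regexp."
  (condition-case nil
      ;; `string-match-p' compiles RE without touching the match data.
      (progn (string-match-p re "") t)
    (invalid-regexp nil)))

(my-regexp-valid-p "\\(a\\|b\\)")       ; balanced group
(my-regexp-valid-p "\\(a")              ; unmatched \( is rejected
```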

 > [1] And practically every other language on the planet. Really, it seems
 > like only Emacs is left in the dark ages of basic POSIX regexps where
 > '(' means literal paren and not matching.

Sure, but that's a different problem easily solved if anyone wants to
do it.  GNU grep shows how: use egrep.  (POSIX grep with its default
to basic REs and an argument -E to indicate modern syntax is a bad
example for Lisp, I think.)  The analog for Emacs is a suite of
"pcre-" functions.
