emacs-devel

Re: Raw string literals in Emacs lisp.


From: Thorsten Jolitz
Subject: Re: Raw string literals in Emacs lisp.
Date: Sat, 26 Jul 2014 23:37:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Matthew Plant <address@hidden> writes:

> I think that raw string literals would be a really nice thing to add
> to Emacs
> lisp. The most immediate benefit is that writing regexps would be much
> easier.
> And since most of the work that goes into major modes is writing
> regexp, writing
> major modes would become a lot faster.

BTW, I recently wrote a little library called

,----
| drx.el --- declarative dynamic regular expressions
`----

available on github (https://github.com/tj64/drx). 

Its main purpose was to enable one more level of abstraction when writing
(org-mode) regexps, i.e. to replace the hardcoded 

,----
| "^" (BOL)
| "$" (EOL)
| "\*" (Org STAR)
`----

in regexp strings like

,----
| "^\\* foo$"
`----

with variables

,----
|  (defvar drx-BOL "^")
|  (defvar drx-EOL "$")
|  (defvar drx-STAR (regexp-quote "*"))
`----

and to build regexps with function calls like
 
,----
| (drx " foo" t t t)
`----
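
A minimal sketch of the mechanism, not the actual drx.el code: the names
`my-bol', `my-eol', `my-star' and `my-drx-lite' are made up for
illustration, and the function only handles the three on/off switches,
none of the repeater or grouping logic. Because the parts are defvar'd
(special) variables, a `let' around the call rebinds them dynamically.

#+begin_src emacs-lisp
  ;; Hypothetical stand-ins for drx-BOL, drx-EOL and drx-STAR.
  (defvar my-bol "^")
  (defvar my-eol "$")
  (defvar my-star (regexp-quote "*"))

  (defun my-drx-lite (rgxp &optional bolp starp eolp)
    "Concatenate RGXP with the parts selected by BOLP, STARP and EOLP."
    (concat (if bolp my-bol "")
            (if starp my-star "")
            rgxp
            (if eolp my-eol "")))

  ;; Default bindings: the plain Org headline regexp.
  (my-drx-lite " foo" t t t)        ; => "^\\* foo$"

  ;; Rebinding the parts changes the result without touching the call.
  (let ((my-bol "^;;")
        (my-star ";"))
    (my-drx-lite " foo" t t t))     ; => "^;;; foo$"
#+end_src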

The idea was based on an analysis of what would be needed for a true Org
Minor Mode, i.e. the application of Org's core functionality outside of
the Org major-mode. At the lowest level, the core obstacle is in the
hard-coded regexp snippets spread all over the Org sources that don't
match anymore when the org elements are in comment sections of
programming major-modes.

E.g. this would match 'old-school' headers in emacs-lisp-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;;; foo$"

and this would match 'outshine' (commented-out org-mode) headers:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;; "))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;; \\* foo$"

and this would match 'outshine' headers in css-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^/\\* ")
        (drx-EOL "\\*/$"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^/\\* \\* foo\\*/$"
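
As a quick sanity check, the three regexps from the results above can be
tried against sample header lines with the built-in `string-match-p'.
The sample lines are my own, chosen to fit each regexp; the last form
shows the default Org regexp failing on a commented header.

#+begin_src emacs-lisp
  ;; Regexps copied verbatim from the #+results lines above.
  (string-match-p "^;;; foo$" ";;; foo")              ; => 0
  (string-match-p "^;; \\* foo$" ";; * foo")          ; => 0
  (string-match-p "^/\\* \\* foo\\*/$" "/* * foo*/")  ; => 0

  ;; The hardcoded Org regexp does not match inside a comment.
  (string-match-p "^\\* foo$" ";; * foo")             ; => nil
#+end_src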

The idea was rejected by the Org maintainers, but the library does
exist now, and the reason I mention it here is that it makes writing
regexps much faster and easier. The approach differs from rx.el: the
regexps themselves are still written as strings, only the plumbing is
done declaratively.
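
For comparison, here is the plain Org headline regexp "^\\* foo$"
written with rx.el, which ships with Emacs. This only illustrates the
contrast (symbolic forms throughout in rx, versus ordinary regexp
strings plus declarative plumbing in drx); it is not taken from drx.el.

#+begin_src emacs-lisp
  ;; rx quotes string arguments itself, so "*" needs no backslashes.
  (rx bol "* foo" eol)   ; => "^\\* foo$"
#+end_src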

Here are a few more complex examples from the drx.el test section:

#+begin_src emacs-lisp
(format "%S"
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (drx " foo" t '(2 2) nil)))
#+end_src

#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" t t t t))
#+end_src

#+results:
: "^\\*\\(foo\\)$"

#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src

#+results:
: "\\(foo\\|bar\\)"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src

#+results:
: "\\(?:foo\\)\\(?:bar\\)"


#+begin_src emacs-lisp
 (format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src

#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"

#+begin_src emacs-lisp
(format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src

#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"

So even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting `drx' calls one can create really
complex regexps that contain only a few simple string literals. 

I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the `drx'
calls appear in low-level functions that are called all the time. 
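
One way to get a first impression is `benchmark-run' from benchmark.el.
The sketch below is an assumption-laden stand-in, not a measurement of
drx itself: `my-build-regexp' is a hypothetical function that merely
concatenates strings, and `my-cached-regexp' holds the same regexp built
once. To measure the real thing, replace the body with an actual `drx'
call.

#+begin_src emacs-lisp
  (require 'benchmark)

  (defun my-build-regexp ()
    "Hypothetical stand-in for a `drx' call: build the regexp at run time."
    (concat "^" (regexp-quote "*") "\\(" "foo" "\\)" "$"))

  (defconst my-cached-regexp (my-build-regexp)
    "The same regexp, built once when this form is evaluated.")

  ;; Each call returns a list (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
  (benchmark-run 100000
    (string-match-p (my-build-regexp) "*foo"))

  (benchmark-run 100000
    (string-match-p my-cached-regexp "*foo"))
#+end_src

If the difference turns out to matter, building the regexp once into a
constant, as with `my-cached-regexp' here, keeps the construction out of
the hot path.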

PS
For the sake of completeness, here is the docstring of `drx':

,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
| 
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
| 
| Make regexp combining RGXP and optional RGXPS.
| 
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
| 
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
| 
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
| 
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
| 
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
| 
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
| 
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
| 
|   - nil :: do nothing (no repeater, no enclosing)
| 
|   - t :: (and any other symbol w/o special meaning) repeat once
| 
|   - n :: (number) repeat n times {n}
| 
|   - "n" :: (number-as-string) repeat n times {n}
| 
|   - "n," :: (string) repeat >= n times {n,}
| 
|   - ",m" :: (string) repeat <= m times {,m}
| 
|   - "n,m" :: (string) repeat n to m times {n,m}
|        
|   - "?" :: (string) repeat with ?
| 
|   - "*" :: (string) repeat with *
| 
|   - "+" :: (string) repeat with +
| 
|   - "??" :: (string) repeat with ??
| 
|   - "*?" :: (string) repeat with *?
| 
|   - "+?" :: (string) repeat with +?
| 
|   - "xyz" :: (any other string) repeat once
| 
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
| 
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case of
| argument STARS, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
| 
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(nil t (2)) t '(t nil (2))
|   |      "bar" "loo")
|   | "^\(\*\)\(\*\)
| Uses keymap `2\', which is not currently defined.
| \(foobar\(loo\)
| Uses keymap `2\', which is not currently defined.
| \)$"
|   `------------------------------------------------------------
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(t nil (2)) t '(nil t (2))
|   |       "bar" "loo")
|   | "^\(\*\(\*\)
| Uses keymap `2\', which is not currently defined.
| \)foo\(bar\)\(loo\)
| Uses keymap `2\', which is not currently defined.
| $"
|   `------------------------------------------------------------

Oops, bug in boxquote.el?
It should look like this:

  ,------------------------------------------------------------
  | (drx \"foo\" t '(nil t (2)) t '(t nil (2))
  |      \"bar\" \"loo\")
  | \"^\\(\\*\\)\\(\\*\\)\\{2\\}\\(foobar\\(loo\\)\\{2\\}\\)$\"
  `------------------------------------------------------------

  ,------------------------------------------------------------
  | (drx \"foo\" t '(t nil (2)) t '(nil t (2))
  |       \"bar\" \"loo\")
  | \"^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$\"
  `------------------------------------------------------------

| 
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted when
| in doubt.
| 
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|  
|   - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
|            Optionally add drx-BOL/STARS and drx-EOL before
|            first/after last alternative.
| 
|   - grp :: Concat and enclose RGXP and RGXPS. Optionally add
|              drx-BOL, STARS and drx-EOL as first/second/last group.
| 
|   - shy :: Concat and enclose RGXP and RGXPS as shy regexp
|            groups. Optionally add drx-BOL, STARS and drx-EOL as
|            first/second/last group.
| 
|   - app :: like 'grp', but rather append RGXP and RGXPS instead
|               of enclosing them if they are already regexp groups
|               themselves.
| 
| They create regexp groups but don't apply repeaters to them.
| 
`----
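
To make the list of atomic values in the docstring more concrete, here
is a rough sketch of how such a value could be turned into a repeater
suffix. `my-repeater-suffix' is a hypothetical helper written only from
the docstring's description; it ignores enclosing and list arguments and
is not how drx.el is actually implemented.

#+begin_src emacs-lisp
  (defun my-repeater-suffix (spec)
    "Return a regexp repeater suffix for atomic SPEC (illustration only)."
    (cond
     ;; n :: (number) repeat n times
     ((integerp spec) (format "\\{%d\\}" spec))
     ;; "n", "n,", ",m" and "n,m" :: interval given as a string
     ((and (stringp spec)
           (string-match-p
            "\\`\\(?:[0-9]+\\|[0-9]+,[0-9]*\\|,[0-9]+\\)\\'" spec))
      (format "\\{%s\\}" spec))
     ;; "?", "*", "+" and their non-greedy variants are used as-is
     ((member spec '("?" "*" "+" "??" "*?" "+?")) spec)
     ;; nil, t, other symbols and other strings: no repeater
     (t "")))

  (concat (regexp-quote "*") (my-repeater-suffix 2))  ; => "\\*\\{2\\}"
  (my-repeater-suffix "1,3")                          ; => "\\{1,3\\}"
  (my-repeater-suffix "+?")                           ; => "+?"
  (my-repeater-suffix t)                              ; => ""
#+end_src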

-- 
cheers,
Thorsten



