Re: Raw string literals in Emacs lisp.
From: Thorsten Jolitz
Subject: Re: Raw string literals in Emacs lisp.
Date: Sat, 26 Jul 2014 23:37:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Matthew Plant <address@hidden> writes:
> I think that raw string literals would be a really nice thing to add
> to Emacs lisp. The most immediate benefit is that writing regexps
> would be much easier. And since most of the work that goes into major
> modes is writing regexp, writing major modes would become a lot
> faster.
BTW, I recently wrote a little library called
,----
| drx.el --- declarative dynamic regular expressions
`----
available on GitHub (https://github.com/tj64/drx).
Its main purpose was to enable one more level of abstraction when writing
(org-mode) regexps, i.e. to replace the hard-coded
,----
| "^" (BOL)
| "$" (EOL)
| "\*" (Org STAR)
`----
in regexp strings like
,----
| "^\\* foo$"
`----
with variables
,----
| (defvar drx-BOL "^")
| (defvar drx-EOL "$")
| (defvar drx-STAR (regexp-quote "*"))
`----
and to build regexps with function calls like
,----
| (drx " foo" t t t)
`----
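To make the mechanism concrete, here is a minimal sketch of the idea, not the drx.el implementation. The names `sketch-BOL`, `sketch-EOL`, `sketch-STAR` and `sketch-drx` are hypothetical, and the three optional arguments are treated as plain switches, which is far less than the real `drx` accepts.

```emacs-lisp
;; Hypothetical, simplified stand-in for illustration only; the real
;; `drx' in drx.el handles many more argument types.
(defvar sketch-BOL "^"
  "Regexp snippet inserted at the beginning of a line.")
(defvar sketch-EOL "$"
  "Regexp snippet inserted at the end of a line.")
(defvar sketch-STAR (regexp-quote "*")
  "Regexp snippet matching one headline star.")

(defun sketch-drx (rgxp &optional bolp starp eolp)
  "Concatenate RGXP with the optional BOL, STAR and EOL snippets.
Each of BOLP, STARP and EOLP is a simple on/off switch here."
  (concat (if bolp sketch-BOL "")
          (if starp sketch-STAR "")
          rgxp
          (if eolp sketch-EOL "")))

;; The plain Org headline regexp.
(sketch-drx " foo" t t t)

;; Rebinding the variables retargets the very same call.
(let ((sketch-BOL "^;;")
      (sketch-STAR ";"))
  (sketch-drx " foo" t t t))
```

The first call evaluates to "^\\* foo$". Because the variables are declared with `defvar`, a surrounding `let` rebinds them dynamically even in a file that uses lexical binding, which is what lets one function body serve several comment syntaxes.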
The idea was based on an analysis of what would be needed for a true Org
Minor Mode, i.e. the application of Org's core functionality outside of
the Org major-mode. At the lowest level, the core obstacle is the
hard-coded regexp snippets spread all over the Org sources, which no
longer match when the Org elements sit in comment sections of
programming major-modes.
E.g. this would match 'old-school' headers in emacs-lisp-mode:
#+begin_src emacs-lisp
(let ((drx-BOL "^;;")
      (drx-STAR ";"))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^;;; foo$"
and this would match 'outshine' (outcommented org-mode) headers:
#+begin_src emacs-lisp
(let ((drx-BOL "^;; "))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^;; \\* foo$"
and this would match 'outshine' headers in css-mode:
#+begin_src emacs-lisp
(let ((drx-BOL "^/\\* ")
      (drx-EOL "\\*/$"))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^/\\* \\* foo\\*/$"
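As a quick sanity check, the three result strings quoted so far can be tried against sample header lines with the built-in `string-match-p`. The sample lines are made up for illustration; the regexps are copied from the results in this message.

```emacs-lisp
;; Regexps copied from the #+results lines; the sample header lines
;; are invented for this check.
(string-match-p "^;;; foo$" ";;; foo")               ; emacs-lisp header
(string-match-p "^;; \\* foo$" ";; * foo")           ; outshine header
(string-match-p "^/\\* \\* foo\\*/$" "/* * foo*/")   ; css-mode header

;; The plain Org regexp does not match the commented variant, which is
;; the obstacle described earlier.
(string-match-p "^\\* foo$" ";; * foo")
```

The first three forms return 0 (a match at the start of the string), the last one returns nil.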
The idea was rejected by the Org maintainers, but the library exists
now, and the reason I mention it here is that it makes writing regexps
much faster and easier. The approach differs from rx.el: the regexps
themselves are still written as strings, only the plumbing is done
declaratively.
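For contrast, this is how the same plain headline regexp looks with the built-in `rx` macro, where every part, including the literal text, is an S-expression element. It is only meant to illustrate the difference in style.

```emacs-lisp
(require 'rx)

;; `bol' and `eol' are rx anchors; string arguments are matched
;; literally, so the star needs no manual quoting.
(rx bol "*" " foo" eol)
```

This evaluates to "^\\* foo$", the same string as the hard-coded example near the top of this message.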
Here are a few more complex examples from the drx.el test section:
#+begin_src emacs-lisp
(format "%S"
        (let ((drx-BOL "^;;")
              (drx-STAR ";"))
          (drx " foo" t '(2 2) nil)))
#+end_src
#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"
#+begin_src emacs-lisp
(format "%S" (drx "foo" t t t t))
#+end_src
#+results:
: "^\\*\\(foo\\)$"
#+begin_src emacs-lisp
(format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src
#+results:
: "\\(foo\\|bar\\)"
#+begin_src emacs-lisp
(format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src
#+results:
: "\\(?:foo\\)\\(?:bar\\)"
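The 'alt and 'shy results can be reproduced by hand with `concat` and `mapconcat`. The two helpers below are hypothetical and only mimic these two outputs; they are not how drx.el is written.

```emacs-lisp
;; Hypothetical helpers that reproduce the two result strings.
(defun sketch-alt (&rest rgxps)
  "Join RGXPS as alternatives inside one capturing group."
  (concat "\\(" (mapconcat #'identity rgxps "\\|") "\\)"))

(defun sketch-shy (&rest rgxps)
  "Wrap each of RGXPS in its own shy (non-capturing) group."
  (mapconcat (lambda (r) (concat "\\(?:" r "\\)")) rgxps ""))

(sketch-alt "foo" "bar")
(sketch-shy "foo" "bar")
```

The two calls return "\\(foo\\|bar\\)" and "\\(?:foo\\)\\(?:bar\\)" respectively, matching the results shown for 'alt and 'shy.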
#+begin_src emacs-lisp
(format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src
#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"
#+begin_src emacs-lisp
(format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src
#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"
So even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting 'drx calls one can create really
complex regexps that contain only a few simple string literals.
I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the 'drx
calls appear in low-level functions that are called all the time.
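One way to approach the speed question is to measure it with `benchmark-run` from the built-in benchmark.el. The sketch below uses a trivial `concat` as a stand-in for a `drx` call (the real function does more work per call), and the names `sketch-build` and `sketch-prebuilt` are hypothetical.

```emacs-lisp
(require 'benchmark)

(defun sketch-build ()
  "Build a headline regexp at call time (stand-in for a `drx' call)."
  (concat "^" (regexp-quote "*") " foo" "$"))

(defconst sketch-prebuilt (sketch-build)
  "The same regexp, built once at load time.")

;; Each form returns a list (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(benchmark-run 100000 (string-match-p (sketch-build) "* foo"))
(benchmark-run 100000 (string-match-p sketch-prebuilt "* foo"))
```

If the first measurement turns out noticeably slower, storing the generated regexp in a constant, as in `sketch-prebuilt`, keeps the declarative style while paying the construction cost only once.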
PS
For the sake of completeness, here is the docstring of `drx':
,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
|
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
|
| Make regexp combining RGXP and optional RGXPS.
|
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
|
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
|
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
|
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
|
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
|
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
|
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
|
| - nil :: do nothing (no repeater, no enclosing)
|
| - t :: (and any other symbol w/o special meaning) repeat once
|
| - n :: (number) repeat n times {n}
|
| - "n" :: (number-as-string) repeat n times {n}
|
| - "n," :: (string) repeat >= n times {n,}
|
| - ",m" :: (string) repeat <= m times {,m}
|
| - "n,m" :: (string) repeat n to m times {n,m}
|
| - "?" :: (string) repeat with ?
|
| - "*" :: (string) repeat with *
|
| - "+" :: (string) repeat with +
|
| - "??" :: (string) repeat with ??
|
| - "*?" :: (string) repeat with *?
|
| - "+?" :: (string) repeat with +?
|
| - "xyz" :: (any other string) repeat once
|
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
|
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case
| of argument STARS, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
|
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
|
| ,------------------------------------------------------------
| | (drx "foo" t '(nil t (2)) t '(t nil (2))
| | "bar" "loo")
| | "^\(\*\)\(\*\)
| Uses keymap `2\', which is not currently defined.
| \(foobar\(loo\)
| Uses keymap `2\', which is not currently defined.
| \)$"
| `------------------------------------------------------------
|
| ,------------------------------------------------------------
| | (drx "foo" t '(t nil (2)) t '(nil t (2))
| | "bar" "loo")
| | "^\(\*\(\*\)
| Uses keymap `2\', which is not currently defined.
| \)foo\(bar\)\(loo\)
| Uses keymap `2\', which is not currently defined.
| $"
| `------------------------------------------------------------
Oops, a bug in boxquote.el? The two examples above came out garbled;
they should look like this:
,------------------------------------------------------------
| (drx \"foo\" t '(nil t (2)) t '(t nil (2))
| \"bar\" \"loo\")
| \"^\\(\\*\\)\\(\\*\\)\\{2\\}\\(foobar\\(loo\\)\\{2\\}\\)$\"
`------------------------------------------------------------
,------------------------------------------------------------
| (drx \"foo\" t '(t nil (2)) t '(nil t (2))
| \"bar\" \"loo\")
| \"^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$\"
`------------------------------------------------------------
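A possible explanation for the garbling, offered as an assumption rather than a confirmed diagnosis: the help system passes docstrings through `substitute-command-keys`, which treats a backslash followed by an opening brace as the start of a keymap reference, and the regexp repeater `\{2\}` has exactly that shape. The sketch below shows the effect on the bare fragment.

```emacs-lisp
;; The regexp fragment \{2\} as it would sit inside a docstring.
;; It is read as a reference to a keymap named "2\".
(substitute-command-keys "\\{2\\}")

;; Prefixing the backslash with \= quotes it, so the fragment
;; survives unchanged.
(substitute-command-keys "\\=\\{2\\}")
```

The first form returns a "Uses keymap ..." message like the one in the garbled box; the second returns the fragment untouched. If this is indeed the cause, writing the repeaters with the `\=` prefix in the docstring source would avoid the problem.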
|
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted when
| in doubt.
|
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|
| - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
| If requested, add drx-BOL/STARS and drx-EOL before
| first/after last alternative.
|
| - grp :: Concat and enclose RGXP and RGXPS. If requested, add
| drx-BOL, STARS and drx-EOL as first/second/last group.
|
| - shy :: Concat and enclose RGXP and RGXPS as shy regexp
| groups. If requested, add drx-BOL, STARS and drx-EOL as
| first/second/last group.
|
| - app :: like 'grp', but rather append RGXP and RGXPS instead
| of enclosing them if they are already regexp groups
| themselves.
|
| They create regexp groups but don't apply repeaters to them.
|
`----
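The table of atomic values in the docstring can be read as a mapping from a specifier to a regexp repeater suffix. The following is a sketch of that reading only; `sketch-repeater` is a hypothetical helper, it ignores enclosing and list arguments entirely, and the real `drx` may treat some values differently (for instance 1 and "1", which the docstring singles out).

```emacs-lisp
;; Hypothetical helper: one possible reading of the atomic-value table.
(defun sketch-repeater (spec)
  "Return the regexp repeater suffix for the atomic SPEC.
Return the empty string when SPEC means `repeat once' or is nil."
  (cond
   ;; n :: (number) repeat n times
   ((integerp spec) (format "\\{%d\\}" spec))
   ;; "n", "n,", ",m", "n,m" :: interval given as a string
   ((and (stringp spec)
         (string-match-p "\\`[0-9]*,?[0-9]*\\'" spec)
         (string-match-p "[0-9]" spec))
    (format "\\{%s\\}" spec))
   ;; "?", "*", "+" and their non-greedy variants are used as-is
   ((member spec '("?" "*" "+" "??" "*?" "+?")) spec)
   ;; nil, t, other symbols and other strings: no repeater
   (t "")))

(concat (regexp-quote "*") (sketch-repeater 2))
(concat (regexp-quote "*") (sketch-repeater "1,3"))
(concat (regexp-quote "*") (sketch-repeater "+?"))
```

The first form yields "\\*\\{2\\}", which is the star part of the earlier `(drx "foo" t 2 t ...)` result.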
--
cheers,
Thorsten