emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Electric punctuation mode


From: Michal Nazarewicz
Subject: Electric punctuation mode
Date: Tue, 02 Feb 2016 17:37:44 +0100
User-agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.1 (x86_64-unknown-linux-gnu)

Hi guys,

For quite a while now I’ve been testing ‘electric-punc-mode’ attached
below.

The idea behind it is to write Unicode characters using nothing but
ASCII.  For example, as one types ‘...’, it gets replaced with an
ellipsis, ‘<<’ is replaced by ‘«’, ‘<=’ becomes ‘≤’ and so forth.

It’s actually a superset of electric-quote-mode so perhaps it would make
sense to have it merged and implement electric-quote-mode as special
case of electric-punct-mode?

-- 
Best regards
Liege of Serenely Enlightened Majesty of Computer Science,
ミハウ “mina86” ナザレヴイツ  <address@hidden> <xmpp:address@hidden>

---- >8 ----------------------------------------------------------------
;;; electric-punct.el --- Makes entering Unicode punctuation characters easier. 
-*- lexical-binding: t; coding: utf-8 -*-

;; Copyright 2014 Google Inc.

;; Author: Michal Nazarewicz <address@hidden>
;; Maintainer: Michal Nazarewicz <address@hidden>
;; Version: 0.3
;; Keywords: text, unicode, punctuation, convenience

;; This file is not part of GNU Emacs.

;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; INTRODUCTION
;;
;;     Helps insert Unicode punctuation and space characters by just typing
;;     ASCII characters.  This is done by inspecting context of inserted
;;     character and replacing ordinary sequences of ASCII characters by
;;     Unicode characters.
;;
;;     For example – if one types minus sign twice, it is converted to an
;;     en-dash (just like I've done after “For example” at the beginning of
;;     the sentence), and pressing minus sign again converts the character to
;;     an em-dash, and one more press on minus further converts the dash to
;;     a horizontal bar.  Space characters around the dash is also affected to
;;     be typographically more correct.

;; DEFAULT RULES
;;
;;     The set of expansion rules defined in the default set is as follows:
;;     * Space typed after a Unicode space converts it to a regular space.
;;     * Space typed after minus sign, en-dash, em-dash or horizontal bar
;;       inserts space character preceding the dash character or an
;;       appropriate for given dash (customisable).
;;     * A minus sign typed after a minus sign, en-dash or em-dash convert the
;;       dash before point into en-dash, em-dash or horizontal bar
;;       respectively and if the dash is proceeded by a space character
;;       converts it to space appropriate for given dash.
;;     * A “<digit><ascii-space><digit>” sequence is converted to
;;       “<digit><thin-space><digit>” which works better for thousands
;;       separator <thin-space> is customisable).
;;     * A thin space is inserted between two consecutive quotes characters,
;;       for example in “never say ‘never’ ” a thin space is added between
;;       closing quotes; This works even if the second closing quote is
;;       entered using two apostrophes (see below).
;;     * The following sequences are converted into Unicode characters:
;;        Sequence  Becames
;;          ...      …    ellipsis
;;          ,,       „    double low-9 quotation mark
;;          ''       ”    right double quotation mark
;;          ``       “    left double quotation mark
;;          <<       «    left-pointing double angle quotation mark
;;          <<<      ≪    much less-than
;;          >>       »    right-pointing double angle quotation mark
;;          >>>      ≫    much greater-than
;;          >=       ≥    greater-than or equal to
;;          <=       ≤    less-than or equal to
;;          <-       ←    leftwards arrow
;;          ->       →    rightwards arrow
;;          !=       ≠    not equal to
;;          /=       ≠    not equal to
;;          !~       ≁    not tilde
;;          /~       ≁    not tilde
;;          !~=      ≄    not asymptotically equal to
;;          /~=      ≄    not asymptotically equal to
;;          !==      ≇    neither approximately nor actually equal to
;;          /==      ≇    neither approximately nor actually equal to
;;          ~=       ≈    almost equal to
;;          <==      ⇐    leftwards double arrow
;;          =>       ⇒    rightwards double arrow
;;          <=>      ⇔    left right double arrow
;;     * Furthermore in major modes for editing HTML, XML and SGML files, if
;;       space is typed after an entity referencing a Unicode space
;;       (e.g. “&nbsp;” or “&#x2009;” converts the entity into a simple ASCII
;;       space.
;;     * And also in such modes, the following rules are applied (some of them
;;       overwrite the more general rules above):
;;        Sequence  Becames
;;          <<      &lt;
;;          >>      &gt;
;;          &&      &amp;
;;          ""      &quot;

;; USAGE
;;
;;     The simplest way to enable `electric-punct-mode' is to call:
;;
;;         M-x electric-punct-auto-mode RET
;;
;;     It will determine whether current major mode is programming mode and if
;;     so enable `electric-punct-mode' only for comments and strings.  If
;;     current major mode is not a programming mode, `electric-punct-mode'
;;     will be enabled throughout the buffer.
;;
;;     The mode can also be enabled globally by:
;;
;;         M-x global-electric-punct-mode RET
;;
;;     To keep the mode enabled across Emacs restarts, add the following to
;;     the init file:
;;
;;        (global-electric-punct-mode 1)

;; CUSTOMISATION
;;
;;     The mode is customised via `electric-punct' customise group which
;;     contains the following variables:
;;     * electric-punct-treat-dash-spacing – enables spacing manipulation
;;       around dashes; if non-nil, the following variables take effect:
;;       - electric-punct-en-dash-space – space to use around an en-dash
;;       - electric-punct-em-dash-space – space to use around an em-dash
;;       - electric-punct-horizontal-bar-space – space to use around
;;         a horizontal bar.
;;     * electric-punct-thousands-separator – space to use as thousands
;;       separator; setting to nil disables the feature.
;;     * electric-punct-quotes-space – space to use between consequtive quote
;;       characters; setting to nil disables the feature.
;;     * electric-punct-handle-backspace – enables special handler for
;;       backspace key.
;;     * electric-punct-auto-list – allows automatic setting of a predicate
;;       which can disable electric behaviour in parts of the buffer.
;;
;;     In addition the `electric-punct-rules' variable lists all the rules an
;;     can be modified as one sees fit.
;;
;;     Some other variables that may be useful to customise:
;;     * electric-punct-html-modes – a list of modes in which HTML features
;;       are enabled.
;;     * electric-punct-blacklisted-modes – a list of modes for which
;;       `electric-punct-auto-mode' function (and thus
;;       `global-electric-punct-mode') will not enable `electric-punct-mode'.
;;     * electric-punct-prog-modes – a list of modes in addition to any modes
;;       derived from `prog-mode' for which `electric-punct-auto-mode' will
;;       enable `electric-punct-prog-mode'.
;;     * electric-punct-predicate – a function that must return non-nil for
;;       expansions to happen..

;;; Code:

;;;;;;;;;;;;;;;;;;;; Customise ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defgroup electric-punct ()
  "Electric behaviour helping enter Unicode punctuation characters."
  :prefix "electric-punct-"
  :group 'electricity)

(defcustom electric-punct-treat-dash-spacing t
  "Whether to convert spaces into special Unicode spaces in various situations.
At the moment those situations include spaces around dashes and between
sequences of digits being part of the same number (i.e. space used as
a thousands separator)."
  :type 'boolean
  :group 'electric-punct)

(let* ((space-choices '((const :tag "Space" ?\s)
                        (const :tag "No-break space" ?\u00A0)
                        (const :tag "Narrow no-break space" ?\u202F)
                        (const :tag "En quad" ?\u2000)
                        (const :tag "Em quad" ?\u2001)
                        (const :tag "En space" ?\u2002)
                        (const :tag "Em space" ?\u2003)
                        (const :tag "Three-per-em space" ?\u2004)
                        (const :tag "Four-per-em space" ?\u2005)
                        (const :tag "Six-per-em space" ?\u2006)
                        (const :tag "Figure space" ?\u2007)
                        (const :tag "Punctuation space" ?\u2008)
                        (const :tag "Thin space" ?\u2009)
                        (const :tag "Hair space" ?\u200A)
                        (const :tag "Medium mathematical space" ?\u205F)
                        (const :tag "Ideographic space" ?\u3000)
                        (character :tag "Other")))
       (space-type (cons 'choice space-choices))
       (space-or-nil-type
        `(choice (const :tag "Disabled" ()) . ,space-choices)))

  (defcustom electric-punct-en-dash-space ?\u00A0
    "A space character used around an en dash.  A no-break space by default.

With default rules, if space is pressed while point is just after an en-dash,
character specified by this variable will be inserted instead of a normal
ASCII space.  Also, when a minus sign is converted into an en-dash (when “-”
is pressed) and is proceeded by a single ASCII space character, that space
character is converted into character specified by this variable.

Ignored if `electric-punct-treat-dash-spacing' is nil."
    :type space-type
    :group 'electric-punct)

  (defcustom electric-punct-em-dash-space ?\u2009
    "A space character used around an em dash.  A thin space by default.

With default rules, if space is pressed while point is just after an em-dash,
character specified by this variable will be inserted instead of a normal
ASCII space.  Also, when an en-dash is converted into an em-dash (when “-”
is pressed) and is proceeded by a single space character, that space character
is converted into character specified by this variable.

Note that `electric-punct-mode' will not add a space before an em-dash if it
was not already proceeded by a space character.  This gives a choice of
whether em-dash should or should not be surrounded by spaces.

Ignored if `electric-punct-treat-dash-spacing' is nil."
    :type space-type
    :group 'electric-punct)

  (defcustom electric-punct-horizontal-bar-space ?\u2009
    "A space character used around horizontal bar (a.k.a. quotation dash).
A thin space by default.

With default rules, if space is pressed while point is just after a horizontal
bar, character specified by this variable will be inserted instead of a normal
ASCII space.  Also, when an em-dash is converted into a horizontal bar (when
“-” is pressed) and is proceeded by a single space character, that space
character is converted into character specified by this variable.

Note that `electric-punct-mode' will not add a space before a horizontal bar
if it was not already proceeded by a space character.  This gives a choice of
whether horizontal bar should or should not be surrounded by spaces.

Ignored if `electric-punct-treat-dash-spacing' is nil."
    :type space-type
    :group 'electric-punct)

  (defcustom electric-punct-thousands-separator ?\u2009
    "A space used as thousands separator, or nil to disable the feature.
A thin space by default.

With default rules, if this is non-nil, if a digit is typed and point is just
after a “<digit><ascii-space>” sequence, the ASCII space is converted into
a character specified by this variable."
    :type space-or-nil-type
    :group 'electric-punct)

  (defcustom electric-punct-quotes-space ?\u2009
    "A space used between consecutive quote characters, or nil.
A thin space by default.

With default rules, if this is non-nil, if a quote character in inserted after
another quote characters (e.g. as in “never say ‘never’”) character specified
by this variable is inserted between the quote characters."
    :type space-or-nil-type
    :group 'electric-punct))

(defcustom electric-punct-handle-backspace t
  "Whether to install backspace character handlers.

If non-nil, electric-punct will provide a custom binding for backspace key
which attempts to undo the effects of the expansions.  For instance, if
backspace is pressed after an en-dash, instead of deleting that en-dash, it
will be converted to a minus sign."
  :type 'boolean
  :group 'electric-punct)


;;;;;;;;;;;;;;;;;;;; Helper functions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defun electric-punct--is-Zs (character)
  "Check whether CHARACTER is in Zs Unicode general category."
  (and character
       (eq (get-char-code-property character 'general-category) 'Zs)))

(defun electric-punct--set-match-data (start)
  "Set `match-data' to region from START to `point' in `current-buffer'."
  (set-match-data (list start (point) (current-buffer))) t)

(defconst electric-punct--quote-chars
  '(?\u00AB  ;; left-pointing double angle quotation mark
    ?\u00BB  ;; right-pointing double angle quotation mark
    ?\u2018  ;; left single quotation mark
    ?\u2019  ;; right single quotation mark
    ?\u201A  ;; single low-9 quotation mark
    ?\u201B  ;; single high-reversed-9 quotation mark
    ?\u201C  ;; left double quotation mark
    ?\u201D  ;; right double quotation mark
    ?\u201E  ;; double low-9 quotation mark
    ?\u201F  ;; double high-reversed-9 quotation mark
    ?\u2039  ;; single left-pointing angle quotation mark
    ?\u203A) ;; single right-pointing angle quotation mark
  "A list of characters which act as a opening or closing quotation marks.
Does not contain quotation mark or apostrophe.")

;;;;;;;;;;;;;;;;;;;; Matchers ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defvar electric-punct-html-modes
  '(html-mode xml-mode nxml-mode sgml-mode)
  "A list of major modes for HTML, XML or SGML formats.")

(defun electric-punct-is-html ()
  "Check whether major mode is HTML, XML or SGML mode.
The check is done by checking whether `major-mode' is any of the
modes in `electric-punct-html-modes' list."
  (member major-mode electric-punct-html-modes))
(put 'electric-punct-is-html 'electric-punct-bs-safe t)

(defun electric-punct-looking-back-at-html-space ()
  "Check whether there's a HTML space entity before point.

Returns nil if `electric-punct-is-html' returns nil, otherwise checks whether
point is just after an HTML entity referencing a space character such as
“&nbsp;” or numeric reference of a character with Unicode Zs general category.

Function modifies `match-data'."
  (and (electric-punct-is-html)
       (let (case-fold-search)
         (looking-back
          "&\\(?:nbsp\\|#\\(?:[xX]\\([0-9a-fA-F]+\\)\\|\\([0-9]+\\)\\)\\);"
          (max (point-min) (- (point) 10))))
       (let ((hex (match-string 1)) (dec (match-string 2)))
         (or (not (or hex dec))
             (electric-punct--is-Zs
              (string-to-number (or hex dec) (if hex 16 10)))))))

(defconst electric-punct--space-for-dash-plist
  '(?-      ?\s
    ?\u2013 electric-punct-en-dash-space
    ?\u2014 electric-punct-em-dash-space
    ?\u2015 electric-punct-horizontal-bar-space)
  "A plist of spacing appropriate for given dash.

Maps a dash character (minus sign, en-dash, em-dash and horizontal bar) to
a space character or a variable containing the appropriate space character to
use around that dash character.

Used by `electric-punct--next-dash' and `electric-punct-treat-dash-space' to
figure out spacing that should be used around a dash.")

(defun electric-punct--next-dash (progression skip-one)
  "Convert dash before point into one specified by PROGRESSION list.

PROGRESSION is a list of characters specifying characters that come one after
the other.  For expanding dashes it would be for example (?- en-dash em-dash)
whereas for contracting dashes it would be for example (em-dash en-dash ?-).

If character before point is not an PROGRESSION list or it's its last element,
return nil and do nothing.

Otherwise replace that dash with the next one from PROGRESSION list, and if it
is proceeded by exactly one space character replace that by space specified by
`electric-punct-en-dash-space', `electric-punct-em-dash-space' or
`electric-punct-horizontal-bar-space' or an ASCII space for a minus sign.
Space is not affected if `electric-punct-treat-dash-spacing' is nil.

If SKIP-ONE is non-nil, function operates on character two-characters before
point, i.e. character just before point is skipped, and if substitution takes
place that skipped character is removed."
  (let ((chr (cadr (member (char-before (- (point) (if skip-one 1 0)))
                           progression))))
    (when chr
      (delete-char (if skip-one -2 -1))
      (when (and electric-punct-treat-dash-spacing
                 (electric-punct--is-Zs (preceding-char))
                 (not (electric-punct--is-Zs (char-before (1- (point))))))
        (delete-char -1)
        (let ((space (plist-get electric-punct--space-for-dash-plist chr)))
          (insert (if (symbolp space) (eval space) space))))
      (insert chr)
      t)))

(defun electric-punct-treat-dash ()
  "Expand dash character two characters before point.
See `electric-punct--next-dash' for more detailed description."
  (electric-punct--next-dash '(?- ?\u2013 ?\u2014 ?\u2015) t))

(defun electric-punct-untreat-dash ()
  "Contract dash before point.
See `electric-punct--next-dash' for more detailed description."
  (electric-punct--next-dash '(?\u2015 ?\u2014 ?\u2013 ?-) nil))

(defun electric-punct-treat-dash-space ()
  "Convert space character into one correct for a dash proceeding the space.

Do nothing if character two characters before point is not an en-dash, em-dash
or horizontal bar.  If the dash character is proceeded by a space character
\(i.e. one in Zs Unicode general category), the character after the dash is
converted into that space character.  Otherwise the character after the dash
is converted to space specified by by `electric-punct-en-dash-space',
`electric-punct-em-dash-space' or `electric-punct-horizontal-bar-space'
variable depending on the dash character before point."
  (let ((space (plist-get electric-punct--space-for-dash-plist
                          (char-before (1- (point))))))
    (when space
      (delete-char -1)
      (insert
       (if (electric-punct--is-Zs (preceding-char))
           (buffer-substring (- (point) 2) (1- (point)))
         (eval space)))
      t))) ;; --

(defun electric-punct-treat-thousands-separator ()
  "Convert space between digits to proper thousands separator.
If point is before a “<digit><ascii-space><digit>” sequence, and
`electric-punct-thousands-separator' is non-nil, convert the ASCII space
into character specified by `electric-punct-thousands-separator'
variable.  Return t if that happens, nil otherwise.

\(Actually, the function does not verify whether the ASCII space is really an
ASCII space and whether the character before point is a digit)."
  (let ((space (with-no-warnings electric-punct-thousands-separator)))
    (when (and space (<= ?0 (char-before (- (point) 2)) ?9))
      (goto-char (1- (point)))
      (delete-char -1)
      (insert space)
      (goto-char (1+ (point))))))

(defun electric-punct-untreat-thousands-separator ()
  "Convert \"<digit><thin-space><?>\" sequence to \"<digit><ascii-space>\".
The <thin-space> must actually be `electric-punct-thousands-separator', and
<?> denotes any character.  Return whether function did the conversion.  Do
nothing if `electric-punct-quotes-space' is nil."
  (let ((space (with-no-warnings electric-punct-thousands-separator)))
    (and space
         (>= (- (point) 3) (point-min))
         (eq space (char-before (1- (point))))
         (<= ?0 (char-before (- (point) 2)) ?9)
         (progn
           (delete-char -2)
           (insert " ")
           t))))

(defun electric-punct-treat-quotes-space ()
  "Convert space between two quote characters to proper narrow space.
If `electric-punct-quotes-space' is non-nil, insert it before the character
before point.  Return t if that happens, nil otherwise."
  (let ((space (with-no-warnings electric-punct-quotes-space)))
    (when space
      (goto-char (1- (point)))
      (insert space)
      (goto-char (1+ (point))))))

(defun electric-punct-is-quote-space-quote ()
  "Check whether point is before \"<quote><space><?>\" sequence.
<quote> must be value in `electric-punct--quote-chars' and <space> must be
value of `electric-punct-quotes-space'.  <?> is any character.  Do nothing if
`electric-punct-quotes-space' is nil."
  (let ((space (with-no-warnings electric-punct-quotes-space)))
    (and space
         (>= (- (point) 2) (point-min))
         (eq space (char-before (1- (point))))
         (member (char-before (- (point) 2)) electric-punct--quote-chars))))


;;;;;;;;;;;;;;;;;;;; Rules ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defvar electric-punct-rules
  (eval-when-compile
    `((sp-nop  "  ")
      (sp-html "; "      electric-punct-looking-back-at-html-space " ")
      (dash-sp "- "      electric-punct-treat-dash-space)
      (dash-sp "\u2013 " electric-punct-treat-dash-space)
      (dash-sp "\u2014 " electric-punct-treat-dash-space)
      (dash-sp "\u2015 " electric-punct-treat-dash-space)

      (tilde   "~~" "\u00A0" :no-bs)

      (dash-nop "---")
      (dash-nop "<!--"    electric-punct-is-html)

      (dash     "--"      electric-punct-treat-dash)
      (dash     "\u2013-" electric-punct-treat-dash)
      (dash     "\u2014-" electric-punct-treat-dash)

      (dash :bs "\u2013"  electric-punct-untreat-dash)
      (dash :bs "\u2014"  electric-punct-untreat-dash)
      (dash :bs "\u2015"  electric-punct-untreat-dash)

      (ellipsis "..." "…")

      (quote ",," "„" :recursive)
      (quote "''" "”" :recursive)
      (quote "``" "“" :recursive)

      (html "<<"   electric-punct-is-html "&lt;")
      (html ">>"   electric-punct-is-html "&gt;")
      (html "&&"   electric-punct-is-html "&amp;")
      (html "\"\"" electric-punct-is-html "&quot;")

      (lt-quote "<<" "«")
      (lt-math  "«<" "≪")
      (gt-quote ">>" "»")
      (gt-math  "»>" "≫")

      (math ">=" "≥")
      (math "<=" "≤")
      (math "<-" "←")
      (math "←>" "↔")
      (math "->" "→")
      (math "!=" "≠")
      (math "/=" "≠")
      (math "!~" "≁")
      (math "/~" "≁")
      (math "≁=" "≄")
      (math "≄=" "≇")
      (math "~=" "≈")
      (math "≤=" "⇐")  ; ⇐ requires three strokes: <==
      (math "=>" "⇒")  ; even though ⇒ uses two: =>
      (math "≤>" "⇔")
      .
      ,(let (rules)
         ;; Add “space between two quotations mark” rules:
         ;;     ('quote-quote "<pre-quote><post-quote>"
         ;;                   'electric-punct-treat-quotes-space)
         ;; and a backspace rules eating that space:
         ;;     ('sp-quote-bs :bs "<quote>"
         ;;                       'electric-punct-is-quote-space-quote 2)
         (dolist (pre-q electric-punct--quote-chars)
           (dolist (post-q electric-punct--quote-chars)
             (push (list 'quote-quote (string pre-q post-q)
                         'electric-punct-treat-quotes-space)
                   rules))
           (push (list 'sp-quote-bs (string pre-q)
                       'electric-punct-is-quote-space-quote 2)
                 rules))
         ;; Add “thousands separator” rules:
         ;;     ('thousand " <digit>" 'electric-punct-treat-thousands-separator)
         ;;     ('thousand :bs "<digit>"
         ;;                'electric-punct-untreat-thousands-separator)
         (let ((digit ?0))
           (while (<= digit ?9)
             (push (list 'thousand (string ?\s digit)
                         'electric-punct-treat-thousands-separator) rules)
             (push (list 'thousand :bs (string digit)
                         'electric-punct-untreat-thousands-separator) rules)
             (setq digit (1+ digit))))
         ;; Add “fancy-space space” → space rules: ("<space> " " " :no-bs)
         (map-char-table
          (lambda (key value)
            (when (and (eq value 'Zs) key (not (eq key ?\s)))
              (let ((ch  (if (characterp key) key (car key)))
                    (end (min (if (characterp key) key (cdr key)) #xF0000)))
                (while (<= ch end)
                  (push (list (string ch ?\s) " " :no-bs) rules)
                  (setq ch (1+ ch))))))
          unicode-category-table)
         rules)))
  "A list of punctuation marks expansion rules.

Updating this variable does not immediately take effect.  For the changes to
take effect `electric-punct-reload-rules' must be called, or use
`electric-punct-set-rules' function to set this variable.

Each element of a list is a singe rule with the following format:
    ([:id ID] [:bs] MATCH [BS-SAFETY] [:recursive] . ACTIONS)

ID is optional identifier of the rule which is completely ignored at the
moment.

:bs, if present, specifies that the rule is a backspace rule.  Backspaces
rules are invoked by `electric-punct-delete-backward-char' instead of the
normal electric expansion after character is inserted.

MATCH is a string that must much before point for the rule to work.  This is
not a regexp but a literal string.  For example \"...\" means that the rule is
invoked if point is just after three dots and \">=\" means that the rule is
invoked when point is just after a “>=” sequence.  When rule is evaluated,
`match-group' is modified to point to that prefix in the buffer.

\(Optimisation hint: It's best if MATCH is at least two-character long.  This
 is because internally rules are store in a hash table with the last two
 characters of MATCH being the key.  This helps jump directly to rules that
 have a chance of matching and skip many other rules.

 For example, if there is a rule for \"-1\" and \"+1\", when point is after
 “+1” sequence, the first rule is not even considered.  On the other hand rule
 with MATCH being \" \" will have to be considered each time space character
 is inserted.)

ACTIONS are actions to take when evaluating the rule.  Each action is:
* a string which causes the matched group (see `match-group') to be replaced
  with specified string,
* an integer which causes given number of characters to be deleted backward, or
* a function which is called and must return non-nil for the rule to continue
  being evaluated and be said to “match”.

BS-SAFETY is an optional element specifying whether to generate backspace rule
from given rule (see further down for description).  If BS-SAFETY is
:no-bs backspace rule will not be generated, and if it is :auto-bs
bs-safety of MATCHERS will not be verified and backspace rule will be created.

:recursive keyword if specified causes the rule evaluator to be called again.

\(By the way, :no-bs, :auto-bs and :recursive keywords can be intermixed
 withing ACTIONS.)

Some backspace rules are generated automatically from other rules but several
requirements must be met for that to happen:
* MATCH musts be at least two character long,
* the last action must be a string,
* BS-SAFETY must not be :no-bs,
* BS-SAFETY must be :auto-bs or all ACTIONS (except for last one) must be
  bs-safe.  Action is bs-safe if it is a symbol (as opposed to lambdas) with
  an 'electric-punct-bs-safe property set to non-nil value.

If all those conditions are met, an additional backspace rule is
generated in the following form:
    ((concat LAST-ACTION \"\\b\") ACTIONS-BUT-LAST (substring MATCH 0 -1))

For example,
    (\"<<\" electric-punct-is-html \"&lt;\")
results in the following automatically generated backspace rule:
    (:bs \"&lt;\" electric-punct-is-html \"<\")
\(`electric-punct-is-html' has non-nil 'electric-punct-bs-safe property).")


(defun electric-punct--make-match-action (string)
  "Return a function that matches STRING before point.
Or nil if length of the STRING is no more then two.

Returned function updates `match-data' to point to the STRING.  It may give
false positives if either of the last two characters do not match."
  (cond
   ((= (length string) 3)
    (let ((string (elt string 0)))
      (lambda ()
        (when (eq (char-before (- (point) 2)) string)
          (electric-punct--set-match-data (- (point) 3))))))
   ((> (length string) 3)
    (lambda ()
      (let* ((len (length string)) (start (- (point) len)))
        (when (string-equal string
                            (buffer-substring-no-properties start (point)))
          (electric-punct--set-match-data start)))))))

(defun electric-punct--make-string-action (string)
  "Return function that replaces match data with STRING and returns t."
  (lambda () (replace-match string t t) t))

(defun electric-punct--make-integer-action (n)
  "Return function that deletes N characters backward and returns t."
  (lambda () (delete-char (- n)) t))

(defvar electric-punct--rules-table (make-hash-table :test 'equal)
  "An internal representation of expansion rules.

Keys of the hash table are (BS CHAR-2 . CHAR-1) objects where BS is 0 if this
is a regular rule or 1 if it is a backspace rule,, CHAR-2 and CHAR-1 are two
characters preceding point.  CHAR-2 may be 0 if it is not known in advance
from the rule.  BS is 0/1 as opposed to nil/t because integers hash faster
than nil.

Each value of the hash table is a list of rules of the following form:
  (ID LEN RECURSIVE . ACTIONS)
where ID is an identifier of the rule (currently ignored), LEN is number of
characters needed before (point-min), RECURSIVE specifies whether another
electric expansion should be invoked if this rule match, and ACTIONS is a list
of functions to execute, all must return non-nil.

This variable is updated by `electric-punct-reload-rules' function.")

(defun electric-punct--put-rule (bs id recursive match actions)
  "Add a rule to rules hash table.

BS specifies whether it is backspace rule or not (it must be t or nil).  ID
specifies rule's identifier.  RECURSIVE specifies whether the rule allows
recursive rules to be invoked.  MATCH specifies the string that must match
before point for the rule to match.  ACTIONS is a list of functions to
execute when evaluating the rule.

The rule is added to `electric-punct--rules-table' table."
  (let* ((len (length match))
         (key (cons (if bs 1 0)
                    (cons (if (>= len 2) (elt match (- len 2)) 0)
                          (elt match (- len 1))))))
    (when (> len 2)
      (push (electric-punct--make-match-action match) actions))
    (puthash key
             (cons (cl-list* id len recursive actions)
                   (gethash key electric-punct--rules-table))
             electric-punct--rules-table)))

(defun electric-punct--compile-rule (rule)
  "Add a RULE to `electric-punct--rules-table' variable.
RULE must be in format found in `electric-punct-rules'."
  (let (bs id actions last-string (bs-safe t) recursive)
    ;; Consume ID and :bs
    (when (symbolp (car rule))
      (setq id (car rule) rule (cdr rule)))
    (when (eq (car rule) :bs)
      (setq bs t bs-safe nil rule (cdr rule)))
    ;; Check all the actions
    (dolist (action (cdr rule))
      (cond ((eq action :auto-bs)   (setq bs-safe action))
            ((eq action :no-bs)     (setq bs-safe nil))
            ((eq action :recursive) (setq recursive t))
            ((stringp action)
             (push (electric-punct--make-string-action action) actions)
             (setq last-string action))
            ((integerp action)
             (push (electric-punct--make-integer-action action) actions)
             (setq bs-safe nil last-string nil))
            (t
             (setq bs-safe (and bs-safe
                                (not last-string)
                                (symbolp action)
                                (get action 'electric-punct-bs-safe))
                   last-string nil)
             (push action actions))))
    (setq actions (nreverse actions))
    ;; Add the rule
    (electric-punct--put-rule bs id recursive (car rule) actions)
    ;; Create backspace rule
    (when (and bs-safe last-string)
      (setq actions (copy-sequence actions))
      ;; Replace the last action (which we know is one constructed from
      ;; a string) with an action constructed from match string with the last
      ;; character stripped.
      (setcar (last actions)
              (electric-punct--make-string-action
               (substring (car rule) 0 -1)))
      (electric-punct--put-rule t id nil last-string actions))))

(defun electric-punct-reload-rules ()
  "Reloads rules from `electric-punct-rules' updating internal rules storage."
  ;; Make sure `electric-punct--rules-table' is local if and only if
  ;; `electric-punct-rules' is local
  (if (local-variable-p 'electric-punct-rules)
      (if (local-variable-p 'electric-punct--rules-table)
          (clrhash electric-punct--rules-table)
        (setq-local electric-punct--rules-table (make-hash-table :test 'equal)))
    (when (local-variable-p 'electric-punct--rules-table)
      (kill-local-variable 'electric-punct--rules-table))
    (clrhash electric-punct--rules-table))
  ;; Process rules
  (mapc 'electric-punct--compile-rule electric-punct-rules)
  ;; All rules are now in reverse, fix that
  (maphash (lambda (key value)
             (puthash key (nreverse value) electric-punct--rules-table))
           electric-punct--rules-table)
  ;; Join (bs char-2 . char-1) lists with (bs 0 . char-1) lists.  This way
  ;; once rules for (bs char-2 . char-1) are found, all rules that may apply
  ;; for given context can be easily iterated over.  No need to further look
  ;; for (bs 0 . char-1) rules.
  (maphash
   (lambda (key value)
     (unless (eq (cadr key) 0)
       (puthash key
                (nconc value (gethash (cons (car key) (cons 0 (cddr key)))
                                      electric-punct--rules-table))
                electric-punct--rules-table)))
   electric-punct--rules-table))

(defun electric-punct-set-ruless (rules &optional local)
  "Set `electric-punct-rules' to RULES and reload them.
If LOCAL is non-nil, the `electric-punct-rules' variable is made local for
current buffer."
  (if local
      (make-local-variable 'electric-punct-rules))
  (setq electric-punct-rules rules)
  (electric-punct-reload-rules))

(electric-punct-reload-rules)


;;;;;;;;;;;;;;;;;;;; Rules handling functions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defvar-local electric-punct-predicate nil
  "A function which must return non-nil for electric expansions to take effect.
Can be used to limit expansions to only some parts of the buffer.")

(defun electric-punct-expand ()
  "Perform electric expansion.
Do nothing if `prefix-arg' is non-nil or `electric-punct-predicate' returns
nil.  This is meant to be used as `post-self-insert-hook'."
  (interactive "*")
  (when (and (not prefix-arg)
             (or (not electric-punct-predicate)
                 (funcall electric-punct-predicate)))
    (electric-punct--do nil)))

(defun electric-punct-delete-backward-char (n &optional killflag)
  "Delete N characters backward possibly applying backspace rules.

Apply backspace rules provided that:
* `electric-punct-handle-backspace' is non-nil,
* N is one and KILLFLAG is nil (i.e. `prefix-arg' is nil),
* `delete-active-region' and `overwrite-mode' do not take effect,
* `electric-punct-predicate' is nil or returns non-nil.
If the above conditions are not met or no backspace rules matched current
context, invoke `delete-backward-char' passing N and KILLFLAG as arguments."
  (interactive "pP")
  (unless (and electric-punct-handle-backspace
               (= n 1)
               (not killflag)
               (or (not delete-active-region) (not (use-region-p)))
               (not overwrite-mode)
               (or (not electric-punct-predicate)
                   (funcall electric-punct-predicate))
               (electric-punct--do t))
    (with-no-warnings
      (delete-backward-char n killflag))))

(defun electric-punct--do (bs)
  "Process expansion rules.  Backspace rules if BS is non-nil.
Return whether there was a matching rule.  Rules are taken from
`electric-punct--rules-table' table."
  (let ((ret 'continue)
        (get (lambda (char-2 char-1)
               (gethash (cons (if bs 1 0) (cons char-2 char-1))
                        electric-punct--rules-table))))
    (while (eq ret 'continue)
      (let ((left (- (point) (point-min))))
        (setq ret (when (> left 0)
                    (let ((char-1 (preceding-char))
                          (char-2 (if (> left 1) (char-before (1- (point))))))
                      (cl-some
                       (lambda (rule)
                         ;; rule == (id  len  recursive . actions)
                         ;;          car cadr caddr       cdddr)
                         (and (>= left (cadr rule))
                              (electric-punct--set-match-data
                               (- (point) (cadr rule)))
                              (cl-every 'funcall (cdddr rule))
                              (if (caddr rule) 'continue t)))
                       (or (and char-2 (funcall get char-2 char-1))
                           (funcall get 0 char-1))))))))
    ret))


;;;;;;;;;;;;;;;;;;;; Modes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;###autoload
(define-minor-mode electric-punct-mode
  "An minor mode helping to insert various Unicode punctuation characters.

Helps insert Unicode punctuation and space characters by just typing ASCII
characters.  This is done by inspecting context of inserted character and
replacing ordinary sequences of ASCII characters by Unicode characters.

For example – if one types minus sign twice, it is converted to an en-dash
\(just like I've done after “For example” at the beginning of the sentence),
and pressing minus sign again converts the character to an em-dash, and one
more press on minus further converts the dash to a horizontal bar.  Space
characters around the dash is also affected to be typographically more
correct.

The set of expansion rules defined in the default set is as follows:
* Space typed after a Unicode space converts it to a regular space.
* Space typed after minus sign, en-dash, em-dash or horizontal bar inserts
  space character preceding the dash character or an appropriate for given
  dash (customisable).
* A minus sign typed after a minus sign, en-dash or em-dash convert the dash
  before point into en-dash, em-dash or horizontal bar respectively and if the
  dash is proceeded by a space character converts it to space appropriate for
  given dash.
* A “<digit><ascii-space><digit>” sequence is converted to
  “<digit><thin-space><digit>” which works better for thousands separator
  <thin-space> is customisable).
* A thin space is inserted between two consecutive quotes characters, for
  example in “never say ‘never’ ” a thin space is added between closing
  quotes; This works even if the second closing quote is entered using two
  apostrophes (see below).
* The following sequences are converted into Unicode characters:
   Sequence  Becames
     ...      …    ellipsis
     ,,       „    double low-9 quotation mark
     ''       ”    right double quotation mark
     ``       “    left double quotation mark
     <<       «    left-pointing double angle quotation mark
     <<<      ≪    much less-than
     >>       »    right-pointing double angle quotation mark
     >>>      ≫    much greater-than
     >=       ≥    greater-than or equal to
     <=       ≤    less-than or equal to
     <-       ←    leftwards arrow
     ->       →    rightwards arrow
     !=       ≠    not equal to
     /=       ≠    not equal to
     !~       ≁    not tilde
     /~       ≁    not tilde
     !~=      ≄    not asymptotically equal to
     /~=      ≄    not asymptotically equal to
     !==      ≇    neither approximately nor actually equal to
     /==      ≇    neither approximately nor actually equal to
     ~=       ≈    almost equal to
     <==      ⇐    leftwards double arrow
     =>       ⇒    rightwards double arrow
     <=>      ⇔    left right double arrow
* Furthermore in major modes for editing HTML, XML and SGML files, if space is
  typed after an entity referencing a Unicode space (e.g. “&nbsp;” or
  “&#x2009;” converts the entity into a simple ASCII space.
* And also in such modes, the following rules are applied (some of them
  overwrite the more general rules above):
   Sequence  Becames
     <<      &lt;
     >>      &gt;
     &&      &amp;
     \"\"      &quot;"
  nil " ⚡"
  '(([backspace] . electric-punct-delete-backward-char))
  :group electric-punct
  (if electric-punct-mode
      (add-hook 'post-self-insert-hook 'electric-punct-expand nil t)
    (remove-hook 'post-self-insert-hook 'electric-punct-expand t)))

(defvar electric-punct-text-faces
  '(font-lock-string-face font-lock-comment-face font-lock-doc-face)
  "Faces corresponding to text in `prog-mode' buffers.")

(defun electric-punct-prog-predicate ()
  "Check whether point is inside a string or a comment.
Used for `electric-punct-predicate' in programming modes to disable electric
expansions outside of strings and comments."
  (memq (get-text-property (1- (point)) 'face)
        electric-punct-text-faces))

(defun electric-punct-lisp-predicate ()
  "Check whether point is inside a string or a comment and not after a minus.
Used for `electric-punct-predicate' in lisp modes to disable electric
expansions outside of strings and comments, as well as expansions of “--”
sequences into an en-dash."
  (let ((face (get-text-property (1- (point)) 'face)))
    (and (memq face electric-punct-text-faces)
         (or (eq font-lock-doc-face 'font-lock-string-face)
             (not (eq (preceding-char) ?-))))))

(defcustom electric-punct-auto-list
  '(;; TeX has its own syntax for various characters so no need for Unicode
    ;; magic and Python is extremely purist about source files, so it's safer
    ;; to stick to plain ASCII.
    ((tex-mode tex-shell python-mode) . nil)
    ((emacs-lisp-mode lisp-mode) . electric-punct-lisp-predicate)
    ((prog-mode js-mode js2-mode) . electric-punct-prog-predicate))
  "A list of rules for setting `electric-punct-predicate' variable.

Each element is a (MODES . PRED) cons cell.  If `major-mode' is derived from
any of the MODES (or is any of the MODES) the element takes effect.

PRED is either t meaning that there's no predicate for given mode, nil meaning
`electric-punct-mode' should not be enabled for given mode, or a predicate
function that will be assigned to `electric-punct-predicate' variable."
  :group 'electric-punct
  :type '(repeat (cons :tag "Match rule"
                       (repeat (symbol :tag "Major mode"))
                       (choice :tag "Predicate"
                               (function :tag "Function")
                               (const :tag "No predicate" t)
                               (const :tag "Disable mode" nil)))))

;;;###autoload
(defun electric-punct-auto-mode (&optional arg)
  "Turn `electric-punct-mode' with predicate depending on major mode.

If ARG is 'toggle toggle `electric-punct-mode', otherwise if ARG is omitted or
a non-negative integer enable `electric-punct-mode', otherwise if ARG is
a negative integer, disable `electric-punct-mode'.  When called interactively
with no prefix argument, toggles `electric-punct-mode'.

The `electric-punct-predicate' is set according to rules defined in
`electric-punct-auto-list' variable.  However, if according to those rules
`electric-punct-mode' should not be enabled, `electric-punct-mode' is
disabled.

`electric-punct-mode' is disabled also if current buffer is a mini buffer or
its name starts with \" *\"."
  (interactive (list (or current-prefix-arg 'toggle)))
  (let ((pred
         (and (not (string-prefix-p " *" (buffer-name)))
              (not (minibufferp))
              (if (eq arg 'toggle)
                  (not electric-punct-mode)
                (>= (prefix-numeric-value arg) 0))
              (cdr (cl-some (lambda (el)
                              (when (let ((m (car el)))
                                      (or (eq m t) (apply 'derived-mode-p m)))
                                el))
                            electric-punct-auto-list)))))
    (if pred
        (progn
          (setq electric-punct-predicate (if (eq pred t) nil pred))
          (electric-punct-mode 1))
      (electric-punct-mode -1))))

;;;###autoload
(define-globalized-minor-mode global-electric-punct-mode
  electric-punct-mode electric-punct-auto-mode)


(provide 'electric-punct)

;;; electric-punct.el ends here



reply via email to

[Prev in Thread] Current Thread [Next in Thread]