emacs-devel

Proposed enhancement for `split-string'


From: Drew Adams
Subject: Proposed enhancement for `split-string'
Date: Mon, 14 Jul 2014 15:51:24 -0700 (PDT)

Function `split-string' currently has this signature, where SEPARATORS
is a regexp that defines (by matching) the separators used to split
the STRING:

(split-string STRING &optional SEPARATORS OMIT-NULLS TRIM)

The STRING parts returned are the non-matches for regexp SEPARATORS.
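
For reference, here is a short illustration of the current behavior,
using only the existing `split-string' (the TRIM argument needs an
Emacs version that has it).  The sample strings are made up for the
illustration:

```elisp
;; SEPARATORS is a regexp; its matches are dropped from the result.
(split-string "a,b,,c" ",")              ; => ("a" "b" "" "c")

;; Non-nil OMIT-NULLS drops the empty string between the two commas.
(split-string "a,b,,c" "," t)            ; => ("a" "b" "c")

;; TRIM is a regexp removed from both ends of each returned part.
(split-string " a , b " "," t "[ \t]+")  ; => ("a" "b")
```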


I have an enhancement of `split-string' to propose, which lets you
alternatively split the string based on a character predicate or a
text property, instead of based on matching a regexp.

Code:        http://www.emacswiki.org/emacs-en/download/subr%2b.el
Description: http://www.emacswiki.org/emacs/SplittingStrings

I can submit the enhancement as a patch to subr.el, if there is
interest.

---

This would be the new (compatible) signature of `split-string':

(split-string STRING &optional HOW OMIT-NULLS TRIM FLIP TEST)
                               ^^^                 ^^^^ ^^^^

The second arg, HOW, can be a regexp, giving the same behavior as now.
Alternatively, HOW can be (1) a character predicate or (2) a doubleton
plist (PROPERTY VALUE), where PROPERTY is a text property and VALUE is
one of its possible values.

1. If HOW is a predicate then it must accept a character argument.
   Substrings whose chars satisfy the predicate are used as
   separators, so the return value is a list of substrings whose chars
   do *not* satisfy predicate HOW.

2. If HOW is (PROPERTY VALUE) then STRING is split into substrings
   whose chars do *not* have text property PROPERTY with value VALUE.

If VALUE is nil then any non-nil value of PROPERTY matches; that is,
only the presence of PROPERTY is tested.  Characters that have
PROPERTY belong to the separators, which are excluded.

If VALUE is non-nil then a match occurs when the actual value of
PROPERTY is `eq' to VALUE; that is, characters that have a PROPERTY of
VALUE are those that are excluded.
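
The two VALUE cases above can be sketched in plain Emacs Lisp.  This
is only an illustration of the described semantics, not the code in
subr+.el: the helper name `my-split-string-by-property' is
hypothetical, it always omits empty parts, and it does not handle
TEST, FLIP or TRIM.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or subr+.el.
(defun my-split-string-by-property (string prop &optional value)
  "Return the parts of STRING whose chars do not match PROP and VALUE.
With nil VALUE, a char matches if its PROP value is non-nil.
Otherwise it matches if its PROP value is `eq' to VALUE."
  (let ((parts ())
        (start nil)                 ; start index of the open kept run
        (i 0)
        (len (length string)))
    (while (< i len)
      (let* ((actual (get-text-property i prop string))
             (sep-p  (if value (eq actual value) actual)))
        (if sep-p
            ;; Separator char: close the open kept run, if any.
            (when start
              (push (substring string start i) parts)
              (setq start nil))
          ;; Kept char: open a run if none is open.
          (unless start (setq start i))))
      (setq i (1+ i)))
    (when start (push (substring string start) parts))
    (nreverse parts)))

;; "DROP" carries the `invisible' property, so it acts as a separator.
(let ((s (concat "keep" (propertize "DROP" 'invisible t) "this")))
  (my-split-string-by-property s 'invisible))
;; => ("keep" "this")
```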

Non-nil optional arg TEST is a binary predicate that is applied to
each char in STRING and to VALUE.  If it returns non-nil for a given
character occurrence then that occurrence is part of a substring that
is excluded from the result (i.e., the char is part of a separator).

IOW, there are 3 ways to define the separator strings for splitting:
regexp matching, char-predicate satisfying, and text-property
matching.
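
As a rough sketch of the char-predicate way, the following helper
splits at maximal runs of chars that satisfy a predicate.  The name
`my-split-string-by-predicate' is hypothetical and the code is not
taken from subr+.el; it always omits empty parts and ignores TRIM and
FLIP.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or subr+.el.
(defun my-split-string-by-predicate (string pred)
  "Split STRING at maximal runs of chars that satisfy PRED.
Return the list of runs whose chars do not satisfy PRED."
  (let ((parts ())
        (start nil)                 ; start index of the open kept run
        (i 0)
        (len (length string)))
    (while (< i len)
      (if (funcall pred (aref string i))
          ;; Separator char: close the open kept run, if any.
          (when start
            (push (substring string start i) parts)
            (setq start nil))
        ;; Kept char: open a run if none is open.
        (unless start (setq start i)))
      (setq i (1+ i)))
    (when start (push (substring string start) parts))
    (nreverse parts)))

;; Use digits as the separators.
(my-split-string-by-predicate
 "ab12cd3ef"
 (lambda (c) (and (>= c ?0) (<= c ?9))))
;; => ("ab" "cd" "ef")
```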

By providing non-nil TEST you can test, for example:

* Whether the actual value of text property `invisible' belongs to the
  current `buffer-invisibility-spec'.

* Whether a particular face is among the faces that are the value of
  property `face'.

Non-nil optional arg FLIP simply swaps the separators and the kept
substrings - regardless of HOW the separating is defined.  The
substrings that would be returned if FLIP were nil are treated as the
separators, and the substrings that would be treated as separators if
FLIP were nil are returned as the result of splitting.
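
For the regexp case, the effect of FLIP can be pictured with existing
functions only: instead of the non-matches, you get the matches.  The
helper `my-separator-matches' below is hypothetical and only
approximates the idea; it is not how the proposed code implements
FLIP.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or subr+.el.
(defun my-separator-matches (string regexp)
  "Return the list of non-empty substrings of STRING that match REGEXP."
  (let ((start 0)
        (matches ()))
    (while (and (< start (length string))
                (string-match regexp string start))
      (let ((m (match-string 0 string)))
        (if (string= m "")
            ;; Empty match: step past it to avoid looping forever.
            (setq start (1+ (match-end 0)))
          (push m matches)
          (setq start (match-end 0)))))
    (nreverse matches)))

;; What `split-string' returns today: the non-matches.
(split-string "ab12cd3ef" "[0-9]+" t)         ; => ("ab" "cd" "ef")

;; The flipped view: the separators themselves.
(my-separator-matches "ab12cd3ef" "[0-9]+")   ; => ("12" "3")
```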

The code I have also defines the following functions (in addition to a
few helper functions).

First, 3 specializations of `split-string', corresponding to the 3
kinds of HOW:

* `split-string-by-regexp' - `split-string' specialized for a regexp
  HOW.  That is, split by separator regexp matching.  This is the
  behavior of today's `split-string'.

* `split-string-by-property' - `split-string' specialized for a
  property-value HOW.  That is, split by separator property-value
  matching.

* `split-string-by-predicate' - `split-string' specialized for a
  char-predicate HOW.  That is, split by separator predicate
  satisfying.

Second, functions similar to `buffer-substring', which return the
region as a string, but which exclude or include only certain string
parts:

* `buffer-substring-of-propertied' - Return the parts that have a
  given PROPERTY.

* `buffer-substring-of-unpropertied' - Return the parts that do not
  have a given PROPERTY.

* `buffer-substring-of-visible' - Return the visible parts.

* `buffer-substring-of-invisible' - Return the invisible parts.

* `buffer-substring-of-faced' - Return the parts that have property
  `face'.

* `buffer-substring-of-unfaced' - Return the parts that do not have
  property `face'.
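
To give an idea of what such a function does, here is a minimal
sketch of a visible-text extractor built only from existing Emacs
primitives.  The name `my-buffer-substring-visible' is hypothetical,
and the code in subr+.el may well work differently (for example, by
going through `split-string').  The sketch treats text as invisible
when `invisible-p' says so for the current buffer.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or subr+.el.
(defun my-buffer-substring-visible (beg end)
  "Return the text between BEG and END, leaving out invisible parts."
  (let ((pos beg)
        (parts ()))
    (while (< pos end)
      ;; NEXT is where the `invisible' property next changes, or END.
      (let ((next (next-single-char-property-change pos 'invisible nil end)))
        (unless (invisible-p pos)
          (push (buffer-substring pos next) parts))
        (setq pos next)))
    (apply #'concat (nreverse parts))))

(with-temp-buffer
  (insert "keep" (propertize "DROP" 'invisible t) "this")
  (my-buffer-substring-visible (point-min) (point-max)))
;; => "keepthis"
```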

Example use case:

I use `buffer-substring-of-visible' in a function that I bind to
`filter-buffer-substring-function', to remove invisible text from the
region string (which I use as part of an indirect buffer name):

(lambda (beg end _delete)  ; Remove invisible text.
  (let ((strg  (buffer-substring-of-visible beg end)))
    (set-text-properties 0 (length strg) nil strg)  ; Strip all properties.
    strg))


