emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Bug in some calls to split-string.


From: Stephen J. Turnbull
Subject: Bug in some calls to split-string.
Date: Sat, 20 Jul 2013 16:34:59 +0900

Richard Stallman writes:

 > I discovered a call to split-string that looked like this:
 > 
 >   (split-string recipients "[ \t\n]*,[ \t\n]*")
 > 
 > This has a bug: it fails to discard whitespace from the
 > start of the first substring or the end of the last.

Bugginess depends on whether you can rely on the caller to deliver a
trimmed string.

 > I would guess that there are many such bugs.

Using `split-string' to "parse" mail headers is already a quick hack.
If a user wrote the header, even the presence of the required commas
is questionable (people often expect line breaks to separate
addresses), but `(split-string recipients ",")' followed by further
processing as necessary is probably OK most of the time.

 > To provide a clean and easy way to fix this, I have implemented a new
 > argument TRIM in split-string.  Please take a look.

Trimming leading and trailing whitespace is a generally useful
function.  Why not[1]

    (defun trim-string (s &optional extract)
      (setq extract (or extract "^\\s-*\\<\\(.*\\)\\>\\s-*$"))
      (save-match-data
        (string-match extract s)
          ;; This may be an attractive nuisance, but is needed to keep
          ;; expressions that match on boundaries acceptably simple AFAICS:
          ;; eg, \< doesn't match anywhere in a blank or empty string.
          (or (match-string 1 s) "")))

and the intent of the split-string expr above can be implemented by
either

   (split-string (trim-string recipients) "[ \t\n]*,[ \t\n]*")

(probably more efficient) or

   (mapcar #'trim-string (split-string recipients ","))

(the semantics currently implemented).

If you decide to add TRIM, is this part of the docstring:

    If you want to trim whitespace from the substrings, the reliably
    correct way is using TRIM.  Making SEPARATORS match that
    whitespace gives incorrect results when there is whitespace at the
    start or end of STRING.  If you see such calls to `split-string',
    please fix them.

appropriate?  Shouldn't it be moved to the Lispref?


Footnotes: 
[1]  Specialization to the case where leading and trailing strings
match the same regexp may be equivalent.  Here is that API using the
same algorithm:

    (defun trim-string (s &optional trim)
      (setq trim (or trim "\\s-*\\<\\|\\>\\s-*")
      (let ((extract (concat "^" trim "\\(.*\\)" trim "$"))
        (save-match-data
          (string-match extract s)
            (or (match-string 1 s) "")))

which works for the default case, but may not be quite as general for
arbitrary EXTRACT regexps.

Why the generality?  Eg (using the EXTRACT-style API):

(trim-string s (concat "^"                         ; beginning of target
                       "\\s-*/?\\* "               ; C block comment leader
                       "\\(\\<.*\\>\\)?"           ; extract content
                       "\\s-*\\(?:\\*/?\\s-*\\)?"  ; C block comment trailer
                       "$"))                       ; end of target

I think this can also be expressed using TRIM (just alternate the C
block comment leader and trailer).



reply via email to

[Prev in Thread] Current Thread [Next in Thread]