Re: Rationale for split-string?


From: Stephen J. Turnbull
Subject: Re: Rationale for split-string?
Date: Fri, 18 Apr 2003 20:50:42 +0900
User-agent: Gnus/5.090016 (Oort Gnus v0.16) XEmacs/21.5 (cabbage)

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/address@hidden> writes:

    >> What is the rationale for the specification of `split-string'?

    Stefan> I think the reason is for the default case.  In XEmacs we
    Stefan> get:

ELISP> (split-string "  a  b  ")
("" "a" "b" "")

    Stefan> What is usually desired here is to eliminate all empty
    Stefan> parts.

I tend to agree, but remember Larry Wall does not.  That concerns me;
Larry is nothing if not remarkably good at intuiting what works.  And
the (delete "" (split-string ...)) idiom is hardly an exercise in
perversion or a brainteaser.
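
For concreteness, a minimal sketch of that idiom.  It assumes a
`split-string' that keeps empty strings when SEPARATORS is passed
explicitly (the XEmacs behavior shown above; current GNU Emacs also
behaves this way for an explicit regexp).  The whitespace regexp is
only an illustration.

```elisp
;; Assumption: with an explicit SEPARATORS regexp, `split-string'
;; returns ("" "a" "b" "") for this input.
(let ((parts (split-string "  a  b  " "[ \t\n]+")))
  ;; `delete' removes every element `equal' to the empty string.
  (delete "" parts))
;; => ("a" "b")
```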

    Stefan> A gross hack is to test if the last char of the regexp is
    Stefan> ?+ and if so get rid of empty strings at start and end.
    Stefan> It should take care of 99% of the cases.

That's an implementation, not a specification.  Using that means we'll
be having this discussion again, sooner or later.  Think about someone
who writes a smart SEPARATORS to get rid of whitespace or leaders
around the elements.  I really don't like the idea of iterating a spec
every time somebody finds a plausible use for the function that some
"less gross than the last time hack" rules out.  If you want a
specific common case optimized, test for that.
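
To make the objection concrete, here is a rough sketch of the quoted
heuristic and one input it mishandles.  The function name
`my-split-string-plus-hack' is hypothetical, and the sketch assumes a
GNU Emacs whose `split-string' keeps empty strings for an explicit
SEPARATORS and which provides `string-suffix-p'.  It is one possible
reading of the hack, not anyone's actual implementation.

```elisp
(defun my-split-string-plus-hack (string separators)
  "Split STRING at SEPARATORS, guessing whether to trim empty ends.
Hypothetical sketch: if the regexp SEPARATORS ends in `+', drop an
empty string at the start and at the end of the result."
  (let ((parts (split-string string separators)))
    (when (string-suffix-p "+" separators)
      (when (equal (car parts) "")
        (setq parts (cdr parts)))
      (when (equal (car (last parts)) "")
        (setq parts (butlast parts))))
    parts))

;; The common case comes out as hoped:
(my-split-string-plus-hack "  a  b  " " +")   ; => ("a" "b")

;; Here the empty first field is data, and it is silently lost:
(split-string ", a, b" ", +")                 ; => ("" "a" "b")
(my-split-string-plus-hack ", a, b" ", +")    ; => ("a" "b")
```

The caller of the second example has no way to ask for the empty
field back, short of rewriting the regexp so that it no longer ends
in `+'.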

E.g., how about one of

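;; `gnu-emacs-split-string' and `xemacs-split-string' stand for the two
;; existing behaviors.  Passing t as SEPARATORS selects the GNU Emacs
;; one; anything else gets the XEmacs one.
(defun split-string-sanely (string &optional separators)
  (cond ((eq separators t) (gnu-emacs-split-string string))
        (t (xemacs-split-string string separators))))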

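(defun split-string-sanely-too (string &optional separators)
  ;; Only a string is usable as a regexp; for 'omit-nulls, split on the
  ;; default separators rather than passing the symbol through.
  (let ((result (xemacs-split-string string
                                     (and (stringp separators) separators))))
    (cond ((stringp separators)        result)
          ((eq separators 'omit-nulls) (delete "" result))
          (t (error 'invalid-argument
                    "SEPARATORS must be a string or 'omit-nulls"
                    separators)))))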

(defun split-string-flexibly (string &optional separators thunk)
  ;; THUNK, if a function, is called on each substring; substrings for
  ;; which it returns non-nil are deleted from the result.
  (let ((result (xemacs-split-string string separators)))
    (cond ((functionp thunk)      (delete-if thunk result))
          ((eq thunk 'omit-nulls) (delete "" result))
          ((null thunk)           result)
          (t (error 'invalid-argument
                    "THUNK must be nil, 'omit-nulls, or a function"
                    thunk)))))

These can be easily generalized to further useful special cases
(deleting blank strings or non-numbers, anyone?) without ever screwing
up old code or ruling out uses of a given SEPARATORS regexp.
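
As an illustration of those special cases, here is a sketch along the
lines of the last function above, written against GNU Emacs.  The name
`my-split-string-flexibly' and the parameter name FILTER are
hypothetical.  It assumes that `split-string' keeps empty strings when
SEPARATORS is given explicitly, and it uses `cl-delete-if' from cl-lib
in place of `delete-if'.

```elisp
(require 'cl-lib)

(defun my-split-string-flexibly (string separators &optional filter)
  "Split STRING at SEPARATORS, then drop the parts selected by FILTER.
FILTER is nil (keep everything), the symbol `omit-nulls', or a
predicate called on each part.  Hypothetical sketch."
  ;; Assumption: with an explicit SEPARATORS regexp, `split-string'
  ;; keeps empty strings, so FILTER alone decides what is dropped.
  (let ((result (split-string string separators)))
    (cond ((null filter) result)
          ((eq filter 'omit-nulls) (delete "" result))
          ((functionp filter) (cl-delete-if filter result))
          (t (error "FILTER must be nil, `omit-nulls', or a function: %S"
                    filter)))))

;; Drop blank (empty or whitespace-only) parts.
(my-split-string-flexibly "a, ,b" ","
                          (lambda (s) (string-match-p "\\`[ \t]*\\'" s)))
;; => ("a" "b")

;; Drop parts that are not unsigned decimal integers.
(my-split-string-flexibly "1,x,22," ","
                          (lambda (s) (not (string-match-p "\\`[0-9]+\\'" s))))
;; => ("1" "22")
```

In both calls the SEPARATORS regexp keeps its plain meaning; only the
third argument changes what is thrown away.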

In fact, my preference would be to implement and name more or less as
above, in which case I would default differently (e.g., if SEPARATORS
is nil, use the omit-nulls behavior).  Then the internal function
could be named `split-string' and have the simple, consistent
behavior.  Both APIs would be considered public.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.



