Re: What to do for faster `remove-duplicates'?

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: What to do for faster `remove-duplicates'?

From:	Oleh Krehel
Subject:	Re: What to do for faster `remove-duplicates'?
Date:	Wed, 06 May 2015 15:30:45 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>> So should `cl-remove-duplicates' be a 2-in-1 (lists for small input,
>
> Don't know/don't care about cl-remove-duplicates, but for delete-dups, I'd
> welcome a patch which switches to a hash-table after the Nth element.

I attach the patch. I did a bunch of `benchmark-run' and it seems that
100 elements is the breaking point.

> PS: BTW, helm-fast-remove-dups's docstring says:
>
>     This is same as `remove-duplicates' but with memoisation.
>
>   but it doesn't actually use memoization, AFAICT.

It does in a way, for an optimized-out "is-element-in-collection"
function.

Oleh

>From c845e6384283c17b61c3a0401213e4a1837807c5 Mon Sep 17 00:00:00 2001
From: Oleh Krehel <address@hidden>
Date: Wed, 6 May 2015 15:21:23 +0200
Subject: [PATCH] lisp/subr.el (delete-dups): Use a hash table

* lisp/subr.el (delete-dups): When there are more than 100 candidates,
  use a hash table. This can result in ~500 times speed-up for typical
  collections of size 5000, like that of `load-library'.
---
 lisp/subr.el | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 0fec29c..591980d 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -417,11 +417,19 @@ If N is omitted or nil, remove the last element."
 Store the result in LIST and return it.  LIST must be a proper list.
 Of several `equal' occurrences of an element in LIST, the first
 one is kept."
-  (let ((tail list))
-    (while tail
-      (setcdr tail (delete (car tail) (cdr tail)))
-      (setq tail (cdr tail))))
-  list)
+  (if (> (length list) 100)
+      (let ((hash (make-hash-table :test #'equal))
+            res)
+        (dolist (elt list)
+          (unless (gethash elt hash)
+            (puthash elt elt hash)
+            (push elt res)))
+        (nreverse res))
+    (let ((tail list))
+      (while tail
+        (setcdr tail (delete (car tail) (cdr tail)))
+        (setq tail (cdr tail))))
+    list))
 
 ;; See http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00204.html
 (defun delete-consecutive-dups (list &optional circular)
-- 
1.8.4

[Prev in Thread]

Current Thread

[Next in Thread]

What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
- Re: What to do for faster `remove-duplicates'?, Stefan Monnier, 2015/05/06
  - Re: What to do for faster `remove-duplicates'?, Oleh Krehel <=
    - Re: What to do for faster `remove-duplicates'?, Stefan Monnier, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Thierry Volpiatto, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Oleh Krehel, 2015/05/06
    - Re: What to do for faster `remove-duplicates'?, Artur Malabarba, 2015/05/06

Prev by Date: Re: xref-find-matches and stuff
Next by Date: Re: What to do for faster `remove-duplicates'?
Previous by thread: Re: What to do for faster `remove-duplicates'?
Next by thread: Re: What to do for faster `remove-duplicates'?
Index(es):
- Date
- Thread