From: Evgeny Roubinchtein
Subject: Re: How to implement line sorting, uniquifying and counting function in emacs?
Date: Mon, 30 Sep 2002 05:15:16 GMT
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.5 (brussels sprouts)

,----
| In shell you can do this:
|    cat file | sort | uniq -d | wc 
| 
| to count the repeated lines. You can also do
| 
|    cat file | sort | uniq -u | wc
| 
| to count the unique lines.
| 
| Sometimes I have to do this on windows platform where I do have emacs.
| This means that I cannot escape to shell and that route is not available.
| 
| Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
| know the equivalent to wc.
`----

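Assuming the text you are interested in is in a buffer, one approach is
to use the `sort-lines' function.  Once the lines are sorted, duplicates
are adjacent, so it's pretty easy to count unique and non-unique lines
by comparing each line with the previous one.  The function below
returns a cons (DISTINCT . REPEATS): the number of distinct lines and
the number of lines that repeat an earlier one.  Note that this is not
quite what `uniq -u' and `uniq -d' report.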

(defun count-repeated-lines (&optional beg end)
  "Return (DISTINCT . REPEATS) for the lines between BEG and END.
DISTINCT is the number of distinct lines; REPEATS is the number of
lines that duplicate an earlier one.  BEG is moved back to the start
of its line.  BEG and END default to the accessible portion of the
buffer."
  (let ((buf (current-buffer))
        (repeated-count 0)
        (unique-count 0)
        (cur-line nil)
        (prev-line nil))
    (with-temp-buffer
      ;; work on a copy so the original buffer is not reordered
      (insert-buffer-substring buf
                               (and beg
                                    (with-current-buffer buf
                                      (save-excursion
                                        (goto-char beg)
                                        (line-beginning-position))))
                               end)
      (sort-lines nil (point-min) (point-max))
      ;; put a dummy line before the text to make the loop simpler
      (goto-char (point-min))
      (insert "\n")
      (goto-char (point-min))
      (while (and (zerop (forward-line 1)) (/= (point) (point-max)))
        (setq cur-line (buffer-substring-no-properties
                        (point) (line-end-position)))
        ;; after sorting, equal lines are adjacent
        (if (and prev-line (string= prev-line cur-line))
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count)))
        (setq prev-line cur-line))
      (cons unique-count repeated-count))))

Instead of sorting lines, you could use Emacs's built-in hash tables
(built in as of GNU Emacs 21; I'm not sure which version of XEmacs
first introduced them) to keep track of the lines you've encountered
so far.  You also don't need a temporary buffer in that case.

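(defun count-repeated-lines (&optional beg end)
  "Return (DISTINCT . REPEATS) for the lines between BEG and END.
DISTINCT is the number of distinct lines; REPEATS is the number of
lines that duplicate an earlier one.  BEG is moved back to the start
of its line.  BEG and END default to the accessible portion of the
buffer."
  (let ((beg (or (and beg (save-excursion (goto-char beg)
                                          (line-beginning-position)))
                 (point-min)))
        (end (or end (point-max)))
        (lines-hash (make-hash-table :test #'equal))
        (unique-count 0)
        (repeated-count 0)
        (cur-line nil))
    (save-excursion
      (goto-char beg)
      ;; stop at END, or at end of buffer if END lies beyond it
      (while (and (< (point) end) (not (eobp)))
        (setq cur-line (buffer-substring-no-properties
                        (point) (line-end-position)))
        ;; a line already in the table has been seen before
        (if (gethash cur-line lines-hash)
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count))
          (puthash cur-line t lines-hash))
        (forward-line 1))
      (cons unique-count repeated-count))))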
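
If you want numbers that match `sort | uniq -u' and `sort | uniq -d'
more closely, one possibility is to store a tally per line in the hash
table instead of just `t'.  The following is only a sketch along those
lines: the name `my-count-uniq-lines' is made up for illustration, and
it counts lines only (it does not reproduce the word and character
columns that `wc' prints).

```elisp
;; Hypothetical helper, not part of the functions above.
(defun my-count-uniq-lines (&optional beg end)
  "Return (ONCE . DUPLICATED) for the lines between BEG and END.
ONCE is the number of lines that occur exactly once, roughly what
`sort | uniq -u | wc -l' prints.  DUPLICATED is the number of
distinct lines that occur more than once, roughly what
`sort | uniq -d | wc -l' prints.  BEG and END default to the
accessible portion of the buffer."
  (let ((end (or end (point-max)))
        (counts (make-hash-table :test #'equal))
        (once 0)
        (duplicated 0))
    (save-excursion
      (goto-char (or beg (point-min)))
      (beginning-of-line)
      ;; First pass: tally how many times each line occurs.
      (while (and (< (point) end) (not (eobp)))
        (let ((line (buffer-substring-no-properties
                     (point) (line-end-position))))
          (puthash line (1+ (gethash line counts 0)) counts))
        (forward-line 1)))
    ;; Second pass: classify each distinct line by its tally.
    (maphash (lambda (_line n)
               (if (= n 1)
                   (setq once (1+ once))
                 (setq duplicated (1+ duplicated))))
             counts)
    (cons once duplicated)))
```

For a buffer containing the six lines b, a, b, c, a, b this returns
(1 . 2): only "c" occurs once, while "a" and "b" are each duplicated.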

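As for an equivalent to `wc': `count-lines' gives the number of lines
in a region, and the character count is just the difference between
the two positions.  A rough sketch that returns all three numbers
might look like this.  The name `my-wc-region' is hypothetical, words
are taken to be runs of non-whitespace, and the last number counts
characters rather than bytes, so it can differ from `wc -c' on
multibyte text.

```elisp
;; Hypothetical helper, not a standard Emacs function.
(defun my-wc-region (&optional beg end)
  "Return a list (LINES WORDS CHARS) for the text between BEG and END.
BEG and END default to the accessible portion of the buffer."
  (let ((beg (or beg (point-min)))
        (end (or end (point-max)))
        (words 0))
    (save-excursion
      (goto-char beg)
      ;; Count maximal runs of non-whitespace characters.
      (while (re-search-forward "[^ \t\n\r\f]+" end t)
        (setq words (1+ words))))
    (list (count-lines beg end) words (- end beg))))
```

Unlike `wc -l', `count-lines' also counts a final line that has no
trailing newline.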
