guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: string vs list processing


From: Sascha Ziemann
Subject: Re: string vs list processing
Date: 16 Apr 2001 15:40:48 +0200
User-agent: Gnus/5.0803 (Gnus v5.8.3) Emacs/20.6

Michael Livshin <address@hidden> writes:

| Sascha Ziemann <address@hidden> writes:
| 
| > Is it right, that in Guile list processing is much faster than string
| > processing?
| 
| "processing" is not precise enough.
| 
| it is certainly true that consing new pairs is *much* faster than
| consing new strings, since the latter involves calling `malloc'.
| 
| that seems to explain your results.

I have more examples.  Task: remove the escape chars from a HTTP
request.

This is the first version I found on the net:

----------------------------------------------------------------------
(define-public (url-decode str)
  ;; for efficiency reasons, we scan the string backwards, consing
  ;; a list of replacement characters as we go; any plus is replaced
  ;; by a space, and if a `%' is seen then the last two conses are
  ;; replaced by their hex-decoded equivalent.  A regex replace would
  ;; be conceptually cleaner, but less efficient, and it would require
  ;; invoking a closure for each hex decode.  Maybe scsh's funky
  ;; regex support would be helpful here!
  (let decode ((i      (1- (string-length str)))
               (chars '()))
    (if (< i 0)
        (apply string chars)
        (case (string-ref str i)
          ;; convert plusses
          ((#\+)  (decode (1- i) (cons #\space chars)))
          ;; hex-decode bytes
          ((#\%)  (let ((hi (car chars))
                        (lo (cadr chars)))
                    (decode (1- i) (cons (integer->char (hex->number hi lo))
                                         (cddr chars)))))
          ;; default: copy char verbatim
          (else   (decode (1- i) (cons (string-ref str i) chars)))))))

(define (hex->number hi lo)
  (+ (* 16 (hex-char->number hi))
     (hex-char->number lo)))

(define (hex-char->number ch)
  (if (char-alphabetic? ch)
      (+ 10
         (- (char->integer (char-upcase ch))
            (char->integer #\A)))
      (- (char->integer ch)
         (char->integer #\0))))
----------------------------------------------------------------------

And this the second:

----------------------------------------------------------------------
(define (decode-uri str)
  (regexp-substitute/global
   #f "\\+|%([0-9A-Fa-f][0-9A-Fa-f])" str
   'pre
   (lambda (m)
     (cond ((string=? "+" (match:substring m 0)) " ")
           (else (integer->char
                  (string->number
                   (match:substring m 1)
                   16)))))
   'post))
----------------------------------------------------------------------

And this is what I wrote:

----------------------------------------------------------------------
(define (hex-char->integer ch)
  (case ch
    ((#\0) 0) ((#\1) 1) ((#\2) 2) ((#\3) 3) ((#\4) 4)
    ((#\5) 5) ((#\6) 6) ((#\7) 7) ((#\8) 8) ((#\9) 9)
    ((#\a #\A) 10) ((#\b #\B) 11) ((#\c #\C) 12)
    ((#\d #\D) 13) ((#\e #\E) 14) ((#\f #\F) 15)
    (else #f)))

(define (integer->hex-char int)
  (case int
    ((0) #\0) ((1) #\1) ((2) #\2) ((3) #\3) ((4) #\4)
    ((5) #\5) ((6) #\6) ((7) #\7) ((8) #\8) ((9) #\9)
    ((10) #\A) ((11) #\B) ((12) #\C)
    ((13) #\D) ((14) #\E) ((15) #\F)
    (else #f)))

(define-public (uri-unescape str)
  (let loop ((done '())
             (todo (string->list str)))
    (if (null? todo)
        (list->string done)
        (case (car todo)
          ((#\%)
           (loop (append done
                         (cons (integer->char
                                (+ (* (hex-char->integer (cadr todo)) 16)
                                   (hex-char->integer (caddr todo))))
                               '()))
                 (cdddr todo)))
          ((#\+)
           (loop (append done (cons #\space '()))
                 (cdr todo)))
          (else
           (loop (append done (cons (car todo) '()))
                  (cdr todo)))))))
----------------------------------------------------------------------

This first version uses the string function string-ref, the second
uses regular expressions and the last version converts the string into
a list and works on the list and converts the resulting list into a
string.  And this is the funny result:

(compare-run-time 10000
  (url-decode "It%27s%20me%21")
  (decode-uri "It%27s%20me%21")
  (uri-unescape "It%27s%20me%21")
  )
;;   3.24 seconds for 10000 times (url-decode "It%27s%20me%21")
;;  10.42 seconds for 10000 times (decode-uri "It%27s%20me%21")
;;   1.72 seconds for 10000 times (uri-unescape "It%27s%20me%21")

I think I can live with slow regular expressions, but why is a simple
string-ref so slow?

The following example compares string-ref directly with list-ref:

(let* ((str (string-append (list->string (make-list 1000 #\A))))
       (lst (string->list str))
       (len (1- (length lst)))
       (len/2 (quotient len 2)))
  (compare-run-time 100000
    (string-ref str 0)
    (list-ref lst 0)
    (string-ref str len/2)
    (list-ref lst len/2)
    (string-ref str len)
    (list-ref lst len)))
;;   1.21 seconds for 100000 times (string-ref str 0)
;;   1.24 seconds for 100000 times (list-ref lst 0)
;;   1.24 seconds for 100000 times (string-ref str len/2)
;;   1.88 seconds for 100000 times (list-ref lst len/2)
;;   1.22 seconds for 100000 times (string-ref str len)
;;   2.58 seconds for 100000 times (list-ref lst len)

Direct indexing of a string is only twice as fast as walking through a
list of 1000 elements?  I think there must be something wrong.

bis später...
Sascha

-- 
Ein Hoch auf den 7. Juni 2000



reply via email to

[Prev in Thread] Current Thread [Next in Thread]