[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-users] question on C types
From: |
Joerg F. Wittenberger |
Subject: |
Re: [Chicken-users] question on C types |
Date: |
24 Nov 2003 19:39:02 +0100 |
User-agent: |
Gnus/5.09 (Gnus v5.9.0) Emacs/21.2 |
> > somehow I can't understand the fine manual. I want to code some parts
> > in C and can't interface.
>
> (subtle point about the manual taken ;-)
Maybe it was just a subtle point about silly me. ;-)
> (in this case a pointer to C_words holding fixnums). Your example will
> break, though, since you can not convert the numbers into a format
> Scheme recognizes.
Hm, why not? I actually need an opaque array, which is going to be
extensively used (I've seen 45min till I ^C'ed it) from C until there
is _one_ integer result. Hence I don't want to bitshift a tag to each
int and remove it in such a long loop.
> Another approach (if you want to have a number vector) is to use
Speed is the primary issue.
> SRFI-4 vectors and pass them as of the proper type:
>
> (declare (uses srfi-4)) ; or (require 'srfi-r), if your registry is set up.
>
> (let ([result (make-s32vector size)])
> ((foreign-lambda* void ([s32vector line] [int size])
> "int *p = line; int i; for(i = 0; i < size; ++i) p[ i ] = i;")
> result size)
> result)
This (and the chicken code) looks as if that's what I need, thanks.
At first the result appears to work, but as I said, it's a long
running computation. I need to make sure it doesn't block threading.
Looks as if I need some help, so here are the missing details:
The "Levenshtein"-Egg of Lars Rustemeier is obviously derived from
Lorenzo Seidenari's C-Implementation (though it fails to mention). I
don't have a reference, but I remember that Lorenzo called that code
"not for production" and that's for a reason. No, not for one, for
several:
a) The memory consumtion is O(n*m), while at most O(min(n,m)) is
actually required. (with n,m the length of the input strings)
See http://www.merriampark.com/ldc.htm look for
d=malloc((sizeof(int))*(m+1)*(n+1));
b) The continues like only proof of concept code is allowed to: it
does *not* check "d" the result of the allocation! Memory corruption
ahead. Especally in presence of the excessive allocation. Don't use
that code for real, please!
c) For long strings with a small Levenshtein distance (the most useful
case), much can be safed, by first skiping common pre- and suffixes.
d) It blocks the chicken scheduler.
I hope a fixed most of the bugs. But it still blocks the scheduler
for me, though I don't know why. Additionally the code below provides
a short circuited (levenshtein< s t limit), which doesn't compute the
whole distance, in case you just want to check whether it exceeds a
certain limit.
best regards
/Jörg
--
The worst of harm may often result from the best of intentions.
;;** Levenshtein
;; (C) 2003 Jörg F. Wittenberger
;; You may use this code under either the GPL or these conditions:
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions
;; are met:
;; Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;; Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the
;; distribution.
;; Neither the name of the author nor the names of its contributors
;; may be used to endorse or promote products derived from this
;; software without specific prior written permission.
;; See http://www.merriampark.com/ld.htm
(define-macro (add1 n)`(+ 1 ,n))
(define (lev-init mx)
(##sys#check-exact mx 'levenshtein-matrix-allocation)
(cond-expand
[unsafe]
[else (when (fx< mx 0)
(##sys#error 'levenshtein-matrix-allocation
"size is negative" mx))])
(let ((result (make-u32vector (add1 mx))))
((foreign-lambda*
void
((u32vector line) (integer m))
"int *p=line; int i; for(i=0; i<=m; ++i) p[i]=i;")
result mx)
result))
(define (lev-dist line m) (u32vector-ref line m))
(define lev-step!
(foreign-lambda*
integer
((u32vector matrix) (integer m) (integer i)
(pointer a) (pointer b) (integer o))
#<<EOF
#define min(a, b) (((a) < (b)) ? (a) : (b))
int *d_i = (int*) matrix;
unsigned char *s = (unsigned char *)(a) + o;
unsigned char *t = (unsigned char *)(b) + o;
int distance=d_i[0], j, left, cost;
d_i[0]=i;
for(j=1; j<=m; ++j) { /* row loop */
left = d_i[j];
/* Step 5 */
cost = s[i-1]==t[j-1] ? 0 : 1;
/* Step 6 */
d_i[j] = min(min(d_i[j-1]+1, left+1), distance+cost);
distance = left;
}
return(distance);
EOF
))
(define (lev-0 a b)
(do ((m (sub1 (string-length a)) (sub1 m))
(n (sub1 (string-length b)) (sub1 n)))
((or (fx< m 0) (fx< n 0) (not (eqv? (string-ref a m) (string-ref b n))))
(do ((m (add1 m) (sub1 m))
(n (add1 n) (sub1 n))
(i 0 (add1 i)))
((or (fx< m 1) (fx< n 1)
(not (eqv? (string-ref a i) (string-ref b i))))
(values i m n))))))
(define (levenshtein-distance s t)
(receive
(off sl tl) (lev-0 s t)
(cond
((eqv? sl 0) tl)
((eqv? tl 0) sl)
(else
(if (< tl sl)
(lev-exec (lev-init sl) s t off sl tl)
(lev-exec (lev-init tl) t s off tl sl))))))
(define (lev-exec matrix s t o m n)
(do ((i 1 (add1 i)))
((> i n) (lev-dist matrix m))
(lev-step! matrix m i s t o)))
(define (lev-exec< matrix s t o m n limit)
(let loop ((i 1) (distance 0))
(cond
((>= distance limit) #f)
((> i n) (>= (lev-dist matrix m) limit))
(else (loop (add1 i) (lev-step! matrix m i s t o))))))
(define (levenshtein< s t limit)
(receive
(off sl tl) (lev-0 s t)
(cond
((eqv? sl 0) (< tl limit))
((eqv? tl 0) (< sl limit))
(else
(if (< tl sl)
(lev-exec< (lev-init sl) s t off sl tl limit)
(lev-exec< (lev-init tl) t s off tl sl limit))))))