[Chicken-users] Issue w/ string-trim functions in utf8-srfi-13


From: Matt Gushee
Subject: [Chicken-users] Issue w/ string-trim functions in utf8-srfi-13
Date: Thu, 19 Sep 2013 14:55:12 -0600

Hello--

I've noticed the following unexpected behavior with the string
trimming functions in utf8-srfi-13:

[BTW: this affects the civet egg, so if anyone is using civet, please
see the note at the bottom]

$ csi

CHICKEN
(c) 2008-2013, The Chicken Team
(c) 2000-2007, Felix L. Winkelmann
Version 4.8.0.4 (stability/4.8.0) (rev 578619b)
linux-unix-gnu-x86 [ manyargs dload ptables ]
compiled 2013-07-15 on aeryn.xorinia.dim (Darwin)

; loading /home/matt/.csirc ...
; << etc. >>

csi> (use srfi-13)
csi> (define strings '("abc" "\t   abc" "\r   abc" "\n   abc"))
csi> (map string-trim-both strings)
("abc" "abc" "abc" "abc")
csi> (use utf8-srfi-13)
; loading /usr/lib/chicken/6/utf8-srfi-13.import.so ...
; << etc. >>

csi> (map string-trim-both strings)
("abc" "\t   abc" "\r   abc" "\n   abc")

And since SRFI-13 states:

> Char/char-set/pred defaults to the character set char-set:whitespace defined 
> in SRFI 14.

... it seems pretty clear that this is an error in the utf8 egg (or at
least a point of non-conformance that should be documented). Unless,
of course, there is something important that I don't understand
(always a possibility ;-)

In any case, the explanation for the unexpected behavior is not hard
to find: inspecting utf8-srfi-13.scm, I find:

(define (string-trim-both s . opt)
  (let-optionals* opt ((trimmer #\space))
    (string-trim (apply string-trim-right s opt) trimmer)))

... and similarly for string-trim and string-trim-right. Evidently
all three functions default to removing only #\space characters.
Shouldn't the default be char-set:whitespace?
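
For illustration only, here is one possible shape for such a change to
string-trim-both. It is a sketch, not the egg's actual code: it assumes
that char-set:whitespace (e.g. from utf8-srfi-14) is in scope inside
utf8-srfi-13.scm, and that the egg's string-trim and string-trim-right
accept a char-set as the trimmer argument. Neither assumption has been
checked against the egg.

```scheme
;; Sketch only: the default trimmer becomes char-set:whitespace.
;; Assumes char-set:whitespace is in scope and that string-trim and
;; string-trim-right accept a char-set argument.
(define (string-trim-both s . opt)
  (let-optionals* opt ((trimmer char-set:whitespace))
    ;; Pass the resolved trimmer explicitly so the right-hand trim does
    ;; not fall back to its own default when opt is empty. Any remaining
    ;; optional arguments are forwarded unchanged, as in the original.
    (let ((rest (if (pair? opt) (cdr opt) '())))
      (string-trim (apply string-trim-right s trimmer rest) trimmer))))
```

string-trim and string-trim-right would need the same change to their
own defaults for direct callers to see the SRFI-13 behavior.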

NOTE TO civet USERS:

Since civet uses utf8-srfi-13 for string processing, this issue can
produce incorrect output for dynamic attribute insertion (i.e., using
the <cvt:attr> element).

I am assuming this will be fixed in the utf8 egg; in the meantime, I
have implemented a workaround in a Git branch, but I'm not going to
merge it into master unless I find out that the utf8 egg's behavior is
intentional. So if you would like the modified version of civet, do
the following:

 > git clone --branch string-trim-workaround https://github.com/mgushee/civet.git

Best regards,
Matt Gushee
