chicken-users

German Umlauts / UTF8 with comparse


From: Christoph Lange
Subject: German Umlauts / UTF8 with comparse
Date: Mon, 17 Feb 2020 14:31:09 +0100

I read older threads about parsing Japanese with comparse and took some ideas from there, but I am still stuck:

(import comparse utf8 utf8-srfi-14)

(define s "Gänsesäger 2,1")
(define s1 "Rotkehlchen 1,0")

(define (utf8-in cs)
  (satisfies (lambda (c) (char-set-contains? cs c))))

(define letter
  (utf8-in char-set:letter))

(define letters
  (as-string (repeated letter 1 20)))


This is what I have so far. The word at the beginning of s1 is parsed completely and correctly by the 'letters' parser:

#;1> (parse letters (string->list s1))
"Rotkehlchen"
#<parser-input lazy-seq #\space #\1 #\, #\0>
; 2 values

For 's', though, I get this:

#;2> (parse letters (string->list s))
"G"
#<parser-input lazy-seq #\ #\n #\s #\e #\s #\ #\g #\e #\r #\space ...>
; 2 values


This means that the ä isn't recognized as a letter in 'char-set:letter'. (The UTF-8 handling of character width does work, on the other hand: in the remaining input, the ä is represented by only one #\. If I didn't import 'utf8' to get the UTF-8-aware string procedures, there would be two.)
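
One way to narrow this down, independent of comparse, might be to ask the character set directly. This is only a sketch and not part of the session above: the name 'a-umlaut' is made up for illustration, and the check assumes that 'char-set:letter' and 'char-set-contains?' are the bindings exported by 'utf8-srfi-14'.

```scheme
(import utf8 utf8-srfi-14)

;; U+00E4 is LATIN SMALL LETTER A WITH DIAERESIS (ä).
;; Building it from its code point avoids any dependence on
;; how the reader decodes the source file.
(define a-umlaut (integer->char #xE4))

;; Query the character set directly, with no parser involved.
(print (char-set-contains? char-set:letter a-umlaut))

;; Plain ASCII letter, for comparison.
(print (char-set-contains? char-set:letter #\a))
```

If the first call prints #f while the second prints #t, the character set itself is what rejects the ä, rather than 'satisfies' or 'repeated'.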

Any hint for me?

/Christoph

--
Christoph Lange
Lotsarnas Väg 8
430 83 Vrångö
