I read older threads about parsing Japanese with comparse and took some ideas from there, but am still stuck:
(import comparse utf8 utf8-srfi-14)
(define s "Gänsesäger 2,1")
(define s1 "Rotkehlchen 1,0")
(define (utf8-in cs)
  (satisfies (lambda (c) (char-set-contains? cs c))))
(define letter
  (utf8-in char-set:letter))
(define letters
  (as-string (repeated letter 1 20)))
This is what I have. The word at the beginning of s1 is parsed completely and correctly by the 'letters' parser:
#;1> (parse letters (string->list s1))
#<parser-input lazy-seq #\space #\1 #\, #\0>
For 's' though I get this:
#;2> (parse letters (string->list s))
#<parser-input lazy-seq #\ä #\n #\s #\e #\s #\ä #\g #\e #\r #\space ...>
which means that the ä isn't recognized as a letter in 'char-set:letter'. (The UTF-8 handling of character width does work, on the other hand: in the remaining input, each ä is represented by a single character. If I didn't import 'utf8' to get the UTF-8-aware string procedures, it would be two.)
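A minimal sketch of a check that could narrow this down: it tests the character against the set directly, without comparse, so a failure here would point at the character set rather than at the parser. The names a-umlaut and letter? are hypothetical helpers introduced only for this illustration, and it assumes that 'char-set-contains?' and 'char-set:letter' are the ones exported by utf8-srfi-14, as in the import above.

```scheme
(import utf8 utf8-srfi-14)

;; U+00E4 is the letter a-umlaut. It is built from its code point so
;; that the check does not depend on how the source file is decoded.
(define a-umlaut (integer->char #xE4))

;; Hypothetical helper: the same membership test the 'letter' parser
;; performs on each input character.
(define (letter? c)
  (char-set-contains? char-set:letter c))

(print (letter? #\a))      ; plain ASCII letter, for comparison
(print (letter? a-umlaut)) ; the character the parser stops at
```

If the second line prints #f, the set itself lacks the character; if it prints #t, the problem is more likely in how the input reaches the parser.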
Any hint for me?
/Christoph
--
Christoph Lange
Lotsarnas Väg 8
430 83 Vrångö