Re: German Umlauts / UTF8 with comparse

Yes, this helps. Kind of ;-) Using the character set char-set:alphabetic, my umlauts are now parsed. But I don't get them back in my result, at least not as printable characters. Instead, the following happens, which utterly confuses me:

#;2> (define s3 (parse letters (string->list s)))
#;3> s3
"Gnsesger"
#;4> (string-length s3)
6
#;5> (string->list s3)
(#\G #\x4bb3 #\e #\s #\x49e5 #\r)
#;6> (list->string (string->list s3))
"G䮳es䧥r"

So, I put the parse result into 's3'. Printing it, I see an eight-character string, namely the one I want minus my beloved umlauts. 'string-length' reports that string as six characters long, and 'string->list' gives me exactly six, swallowing yet more ASCII characters of my string. Turning that list back into a string with 'list->string' yields Chinese characters, even though '(list->string (string->list s1))', with my pure ASCII string, round-trips without fault.

I guess I have some problems understanding some utf8 concepts?!

/Christoph
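
On the UTF-8 question above, here is a minimal sketch of the difference between counting bytes and counting characters. It does not explain the exact output in the transcript. It assumes CHICKEN 5 with the utf8 egg installed and a source file saved as UTF-8. The `utf8:` prefix and the variable name `word` are chosen here for illustration only.

```scheme
(import scheme (chicken base)
        (prefix (only utf8 string-length) utf8:))

;; Assumes this file is saved as UTF-8, so the "ä" in the literal
;; is stored as more than one byte.
(define word "Gänse")

(print (string-length word))        ; core procedure: counts bytes
(print (utf8:string-length word))   ; utf8 egg: counts characters
```

If a string is built with one family of procedures and then inspected with the other, the two views can disagree. That may be worth ruling out when a parse result looks scrambled.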

On Mon, Feb 17, 2020 at 3:38 PM <address@hidden> wrote:

Christoph Lange <address@hidden> wrote:
> meaning, that the ä isn't recognized as being a letter within the
> 'char-set:letter'.

The utf8 egg’s srfi-14 character sets are designed to be compatible with the original srfi-14 and only contain ASCII characters, as stated in the documentation:
https://wiki.call-cc.org/eggref/5/utf8#unicode-char-sets
“The default SRFI-14 char-sets are defined using ASCII-only characters”

You might want to import the unicode-char-sets module, and use one of its
sets, like char-set:alphabetic.
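
A minimal sketch of that suggestion, assuming CHICKEN 5 with the utf8 egg installed. The helper name `letter?` and the variable `a-umlaut` are made up for illustration.

```scheme
(import scheme (chicken base) utf8-srfi-14 unicode-char-sets)

;; U+00E4 (a with diaeresis), built from its code point so the example
;; does not depend on how the reader handles non-ASCII char literals.
(define a-umlaut (integer->char #xE4))

;; Hypothetical helper: membership test against the Unicode-aware set.
(define (letter? c)
  (char-set-contains? char-set:alphabetic c))

(print (letter? a-umlaut))                             ; Unicode-aware set
(print (char-set-contains? char-set:letter a-umlaut))  ; ASCII-only default set
```

The same char-set:alphabetic value can then be handed to the parser wherever char-set:letter was used before.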

I hope this helps. :)

From:	Christoph Lange
Subject:	Re: German Umlauts / UTF8 with comparse
Date:	Mon, 17 Feb 2020 20:13:08 +0100