From: Maxime Devos
Subject: RE: [PATCH 1/3] Make string-length documentation more correct
Date: Wed, 26 Jun 2024 13:46:28 +0200
>> >-Returns the number of characters in the given @var{string}.
>> +Returns the number of bytes in the given @var{string}.
>>
>> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
>
>Maybe `the number of codepoints` will work here.
>
>(string-length "👨🏭") ;; => 3
>(string-length "é") ;; => 2
>
>The number of characters here is 1 in both cases.

No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.

You need to fix your setup, that’s not what Guile does. Are you sure you have set the encoding of current-input-port correctly? (Probably by setting LC_ALL or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding might be interpreted in terms of some 8-bit encoding.

Here’s a test: if you can input #\👨🏭 without errors and it evaluates to #\👨🏭, then the encoding should be set up correctly.

Best regards,
Maxime Devos
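
A minimal Guile sketch of the two points above: string-length counts characters (code points) rather than bytes, and UTF-8 bytes read through an 8-bit encoding inflate the count. The names grinning, e-acute, utf8-byte-count and misdecode-as-latin1 are made up for illustration, and ISO-8859-1 is only an assumed stand-in for "some 8-bit encoding". The strings are built from code points so the result does not depend on how the source file or the terminal is encoded.

```scheme
;; Illustrative sketch; the helper names are hypothetical, not from the patch.
(use-modules (rnrs bytevectors)   ; string->utf8, bytevector-length
             (ice-9 iconv))       ; bytevector->string

;; Build the test strings from code points, independent of any locale.
(define grinning (string (integer->char #x1F600)))  ; U+1F600
(define e-acute  (string (integer->char #xE9)))     ; U+00E9, precomposed

;; Number of bytes in the UTF-8 encoding of S.
(define (utf8-byte-count s)
  (bytevector-length (string->utf8 s)))

;; Simulate a misconfigured port: decode the UTF-8 bytes of S as
;; ISO-8859-1, where every byte becomes one character.
(define (misdecode-as-latin1 s)
  (bytevector->string (string->utf8 s) "ISO-8859-1"))

(display (string-length grinning)) (newline)                        ; 1
(display (utf8-byte-count grinning)) (newline)                      ; 4
(display (string-length (misdecode-as-latin1 e-acute))) (newline)   ; 2
```

The last line gives the 2 that was reported for "é", which is what one would expect if the input bytes were decoded with an 8-bit encoding instead of UTF-8.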