guile-devel

RE: [PATCH 1/3] Make string-length documentation more correct


From: Andrew Tropin
Subject: RE: [PATCH 1/3] Make string-length documentation more correct
Date: Fri, 28 Jun 2024 17:38:38 +0400

On 2024-06-26 13:46, Maxime Devos wrote:

>>>  >-Returns the number of characters in the given @var{string}.
>>> +Returns the number of bytes in the given @var{string}.
>>>  
>>> This is false. For example, (string-length "😀") is 1, whereas in all 
>>> encodings I know of it is more than one byte. Also, R5RS says: [...]
>>
>>Maybe `the number of codepoints` will work here.
>>
>>(string-length "👨‍🏭") ;; => 3
>>(string-length "é") ;; => 2
>>
>>The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all characters 
> correspond to a single codepoint.
>
> You need to fix your setup, that’s not what Guile does. Are you sure you have 
> set the encoding of current-input-port correctly? (Probably by setting LC_ALL 
> or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding 
> might be interpreted in terms of some 8-bit encoding.
>
> Here’s a test: if you can input #\👨‍🏭 without errors and it evaluates to 
> #\👨‍🏭, then the encoding should be set up correctly.

(setlocale LC_ALL) ;; => "en_US.utf8"
(display #\👨‍🏭) ;; => /home/bob/guile-ares-rs/dev/guile/tmp.scm:84:15: unknown 
character name 👨‍🏭

The same happens if I do it in the usual REPL: 
LC_ALL=en_US.utf8 guile
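
For reference, a minimal sketch that separates the three counts discussed in this thread (code points, UTF-8 bytes, and what a user sees as one character). The string is built from explicit code points so the result does not depend on locale or source-file encoding. The names `worker`, `code-points` and `utf8-byte-count` are made up for this illustration; `string->utf8` is assumed to come from `(rnrs bytevectors)`.

```scheme
(use-modules (rnrs bytevectors))

;; The emoji from the example above, spelled out as its three code points.
(define worker
  (string (integer->char #x1F468)    ; MAN
          (integer->char #x200D)     ; ZERO WIDTH JOINER
          (integer->char #x1F3ED)))  ; FACTORY

;; Hexadecimal code point of every element of the string.
(define (code-points s)
  (map (lambda (c) (number->string (char->integer c) 16))
       (string->list s)))

;; Size of the string once encoded as UTF-8.
(define (utf8-byte-count s)
  (bytevector-length (string->utf8 s)))

(string-length worker)    ; => 3
(code-points worker)      ; => ("1f468" "200d" "1f3ed")
(utf8-byte-count worker)  ; => 11
```

If this is right, it would also explain the error above: `#\` presumably reads a single code point, so a three-code-point sequence is looked up as a character name instead.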

-- 
Best regards,
Andrew Tropin


