guile-devel


Re: [PATCH 1/3] Make string-length documentation more correct


From: Jean Abou Samra
Subject: Re: [PATCH 1/3] Make string-length documentation more correct
Date: Wed, 26 Jun 2024 14:18:16 +0200
User-agent: Evolution 3.52.1 (3.52.1-1.fc40)

On Wednesday, 26 June 2024 at 13:46 +0200, Maxime Devos wrote:
> > 
> > Maybe `the number of codepoints` will work here.
> > (string-length "👨‍🏭") ;; => 3
> > (string-length "é") ;; => 2
> >
> > The number of characters here is 1 in both cases.
> 
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.


Agreed. "The number of code points" would be correct, but "the number
of characters" (i.e., the current wording) is correct too. In
Scheme terminology, a character is just a Unicode code point,
as reflected by the predicate char? and related procedures such as
char->integer and integer->char.
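A minimal Guile sketch of this point, using only standard procedures: a lone combining accent is a perfectly valid character on its own, and char->integer returns its code point. The variable name combining-acute is illustrative only.

```scheme
;; A combining mark on its own is still a Scheme character:
;; one char corresponds to exactly one Unicode code point.
(define combining-acute (integer->char #x0301)) ; U+0301 COMBINING ACUTE ACCENT

(char? combining-acute)                  ; => #t
(char->integer combining-acute)          ; => 769, i.e. #x301
(string-length (string combining-acute)) ; => 1
```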


> You need to fix your setup, that’s not what Guile does.


No: he wrote "é" as U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE
ACCENT, which is two characters, unlike the precomposed "é", U+00E9 LATIN
SMALL LETTER E WITH ACUTE.

Likewise, 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY,
i.e. three characters.
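To reproduce these counts without depending on how a mail client or terminal encodes the input, the strings can be built from explicit code points. This is a sketch assuming Guile 2.0 or later (for string-normalize-nfc); the variable names and the helper code-points are hypothetical, defined here only for illustration.

```scheme
;; Build the strings from explicit code points so that the input
;; encoding cannot silently change them.
(define e-decomposed                  ; U+0065 + U+0301
  (string #\e (integer->char #x0301)))
(define e-precomposed                 ; U+00E9
  (string (integer->char #x00E9)))
(define man-factory                   ; U+1F468 + U+200D + U+1F3ED
  (string (integer->char #x1F468)
          (integer->char #x200D)
          (integer->char #x1F3ED)))

;; Hypothetical helper: the code points of a string, as integers.
(define (code-points s)
  (map char->integer (string->list s)))

(string-length e-decomposed)   ; => 2
(string-length e-precomposed)  ; => 1
(string-length man-factory)    ; => 3
(code-points man-factory)      ; => (128104 8205 127981), i.e. #x1F468 #x200D #x1F3ED

;; NFC normalization composes e + U+0301 into the single code point U+00E9.
(string=? (string-normalize-nfc e-decomposed) e-precomposed) ; => #t
```

Normalizing to NFC only merges sequences that have a precomposed form; it does not turn the ZWJ sequence into a single character.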

The "visual characters" are called grapheme clusters, and AFAIK Guile
doesn't provide any API related to grapheme clusters. (Note that
the number of grapheme clusters in a given string depends on the Unicode
database and therefore on the Unicode version.)

There are programming languages where the data type called "character"
corresponds to a grapheme cluster, but I don't think this is common.
Swift is the only example I know of.

Obligatory reading: https://hsivonen.fi/string-length/




