Re: [PATCH 1/3] Make string-length documentation more correct
From: Jean Abou Samra
Date: Wed, 26 Jun 2024 14:18:16 +0200
User-agent: Evolution 3.52.1 (3.52.1-1.fc40)
On Wednesday 26 June 2024 at 13:46 +0200, Maxime Devos wrote:
> >
> > Maybe `the number of codepoints` will work here.
> > (string-length "👨🏭") ;; => 3
> > (string-length "é") ;; => 2
> >
> > The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.
Agreed. "The number of code points" would be correct, but "the number
of characters" (i.e., the current wording) is correct too. In Scheme
terminology, a character is just a Unicode code point, as can be seen
from the procedure char? and related APIs.
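For illustration, a minimal sketch at the Guile REPL (the bindings combining-acute and zero-width-joiner are just illustrative names): a code point that never renders on its own is still a complete character object, and string-length counts it as one.

```scheme
;; A character object holds exactly one Unicode code point;
;; char->integer and integer->char convert between the two.
(define combining-acute (integer->char #x301))    ; U+0301 COMBINING ACUTE ACCENT
(define zero-width-joiner (integer->char #x200D)) ; U+200D ZERO WIDTH JOINER

;; Neither renders as a "visual character" on its own,
;; yet each is a complete Scheme character.
(char? combining-acute)                  ;; => #t
(char? zero-width-joiner)                ;; => #t
(char->integer combining-acute)          ;; => 769
(string-length (string combining-acute)) ;; => 1
```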
> You need to fix your setup, that’s not what Guile does.
No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
which is two characters unlike é, LATIN SMALL LETTER E WITH ACUTE.
Likewise 👨🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.
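A small sketch reproducing both cases in Guile, assuming a Guile with Unicode strings (2.0 or later). The strings are built from explicit code points through a hypothetical helper, from-code-points, so that no mail client or editor can silently normalize them. string-normalize-nfc then shows that only the é case has a precomposed form.

```scheme
;; Hypothetical helper: build a string from explicit code points.
(define (from-code-points . code-points)
  (list->string (map integer->char code-points)))

(define decomposed-e (from-code-points #x65 #x301))            ; e + COMBINING ACUTE ACCENT
(define precomposed-e (from-code-points #xE9))                 ; LATIN SMALL LETTER E WITH ACUTE
(define man-factory (from-code-points #x1F468 #x200D #x1F3ED)) ; MAN + ZWJ + FACTORY

(string-length decomposed-e)  ;; => 2
(string-length precomposed-e) ;; => 1
(string-length man-factory)   ;; => 3

;; NFC composes e + U+0301 into U+00E9, so the code point count drops to 1.
(string=? (string-normalize-nfc decomposed-e) precomposed-e) ;; => #t
;; The emoji sequence has no precomposed form; NFC leaves it at 3 code points.
(string-length (string-normalize-nfc man-factory))           ;; => 3
```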
The "visual characters" are called grapheme clusters, and AFAIK Guile
doesn't provide any API that relates to grapheme clusters. (Note that
the number of grapheme clusters in a given string depends on the Unicode
database and therefore on the Unicode version.)
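Since there is no grapheme cluster API, here is a deliberately crude approximation, written as a hypothetical helper approximate-cluster-count. It is not an implementation of UAX #29: it only glues combining marks (general categories Mn, Mc, Me) and whatever follows a ZERO WIDTH JOINER onto the preceding cluster, and ignores the other segmentation rules (Hangul syllables, regional indicator pairs, CR LF, and more). The categories returned by char-general-category also come from whatever Unicode version Guile was built with.

```scheme
;; Hypothetical helper: a crude grapheme cluster count, NOT UAX #29.
;;  - a combining mark (category Mn, Mc or Me) extends the current cluster;
;;  - U+200D ZERO WIDTH JOINER extends the current cluster and also
;;    glues the following code point onto it;
;;  - every other code point starts a new cluster.
(define zwj (integer->char #x200D))

(define (approximate-cluster-count str)
  (let loop ((chars (string->list str))
             (count 0)
             (after-zwj? #f))   ; was the previous code point a ZWJ?
    (if (null? chars)
        count
        (let* ((c (car chars))
               (extends? (or (char=? c zwj)
                             (memq (char-general-category c) '(Mn Mc Me))))
               ;; A leading mark still starts a cluster, since there is
               ;; nothing before it to attach to.
               (new-cluster? (or (= count 0)
                                 (not (or extends? after-zwj?)))))
          (loop (cdr chars)
                (if new-cluster? (+ count 1) count)
                (char=? c zwj))))))

(approximate-cluster-count
 (list->string (map integer->char '(#x65 #x301))))             ;; => 1
(approximate-cluster-count
 (list->string (map integer->char '(#x1F468 #x200D #x1F3ED)))) ;; => 1
(approximate-cluster-count "abc")                              ;; => 3
```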
There are programming languages where the data type called "character"
corresponds to grapheme clusters, but I don't think this is common.
Swift is the only example I know.
Obligatory reading: https://hsivonen.fi/string-length/