From: | Urs Liska |
Subject: | Re: Character encoding / poor man's letterspacing |
Date: | Tue, 12 Mar 2019 10:43:23 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 |
On 12.03.19 at 01:14, Aaron Hill wrote:
> On 2019-03-11 3:40 pm, David Kastrup wrote:
>> Urs Liska <address@hidden> writes:
>>> On 11.03.19 at 20:22, Aaron Hill wrote:
>>>> On 2019-03-11 11:30 am, David Kastrup wrote:
>>>>> Urs Liska <address@hidden> writes:
>>>>>> Hi,
>>>>>>
>>>>>> I've written a poor-man's implementation of a simple \letterspaced
>>>>>> markup command:
>>>>>>
>>>>>>   #(define-markup-command (letterspaced layout props text) (markup?)
>>>>>>      (let* ((chars (string->list text))
>>>>>>             (dummy (ly:message "Chars: ~a" chars))
>>>>>>             (spaced-text (string-join (map string chars) " ")))
>>>>>>        (interpret-markup layout props (markup spaced-text))))
>>>>>>
>>>>>> However, this scrambles umlauts and presumably other UTF-8 characters
>>>>>> as you can see with
>>>>>>
>>>>>>   { s1 ^\markup \letterspaced "Täst" }
>>>>>>
>>>>>> => Chars: (T � � s t)
>>>>>>
>>>>>> Obviously the characters are wrongly en/decoded along the way, which
>>>>>> makes me think whether I have simply forgotten an encoding setting
>>>>>> somewhere (although I have no idea where and how I should include
>>>>>> that) or whether that whole routine is totally clumsy.
>>>>>>
>>>>>> Any pointer would be appreciated.
>>>>>
>>>>> Guile-1.8 has only byte strings, not Unicode character strings.
>>>>> However, the regexp procedures are locale aware, so you can use
>>>>> something like
>>>>
>>>> /./ isn't smart enough to match Unicode graphemes. You would need /\X/,
>>>> however that is not supported in POSIX ERE. Neither is the
>>>> approximation /\P{M}\p{M}*+/.
>>>
>>> I can confirm that the suggestion doesn't work for me, even with the
>>> given example. It's still "T s t" (see attached).
>>
>> Do you have an UTF-8 locale set?
>
> That's because the file you attached was not in UTF-8. I was able to open
> it using ISO 8859-1. In UTF-8, the 0xe4 for ä becomes U+FFFD (Replacement
> Character). It should have been encoded as 0xc3 0xa4. Either fixing the
> encoding or just retyping the umlaut A results in a successful result.
Not with me. Also when injecting David's procedure in my actual project it doesn't seem to work. And I assume you are *not* talking about the encoding of the procedure definition?
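For readers who want to see the byte-level point from the quoted exchange in isolation, here is a minimal sketch. It is written in Python rather than Guile purely for illustration, it is not the code from this thread, and the variable names are made up. It shows how ä is encoded in ISO 8859-1 versus UTF-8, what a lenient UTF-8 decoder does with a lone 0xe4, and why splitting a byte string one byte at a time cuts the umlaut in two.

```python
text = "Täst"

utf8_bytes = text.encode("utf-8")      # ä is two bytes: 0xC3 0xA4
latin1_bytes = text.encode("latin-1")  # ä is the single byte 0xE4

# Reading an ISO 8859-1 file as UTF-8: a lone 0xE4 is not a valid sequence,
# so a lenient decoder substitutes U+FFFD (Replacement Character).
misread = latin1_bytes.decode("utf-8", errors="replace")

# Splitting a UTF-8 byte string one byte at a time separates the two bytes
# of ä, which is the effect of a byte-oriented character split.
per_byte = [bytes([b]) for b in utf8_bytes]

# Splitting by code point keeps ä whole.
spaced = " ".join(text)

print(utf8_bytes)
print(latin1_bytes)
print(repr(misread))
print(per_byte)
print(spaced)
```

The five-element `per_byte` list corresponds to the five entries in the "Chars:" message, two of which are not printable on their own.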
> Also, I should have been clear before. David's code should work for most
> cases. I was just being pedantic that /./ would not work if the input has
> combining characters. For instance, if you type U+0308 (Combining
> Diaeresis) after an 'a', you'll get an ä. But the simple regex would not
> treat that as a single grapheme. The result would be "T a ̈ s t".
I did understand it that way, and it would not be an issue in the project I'm working on. There it's just some umlauts.
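The combining-character caveat can be sketched the same way. The following is an illustrative Python approximation of the /\P{M}\p{M}*/ idea mentioned earlier in the thread, not LilyPond or Guile code; the helper name `split_graphemes_approx` is hypothetical. It attaches every character in a Unicode mark category (Mn, Mc, Me) to the preceding base character. It is not a full grapheme-cluster segmentation.

```python
import unicodedata


def split_graphemes_approx(text):
    """Group each base character with the mark characters that follow it."""
    clusters = []
    for ch in text:
        # Categories starting with "M" are the Unicode mark categories.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


decomposed = "Ta\u0308st"  # 'a' followed by U+0308 (Combining Diaeresis)

naive = " ".join(decomposed)                            # diaeresis split off
grouped = " ".join(split_graphemes_approx(decomposed))  # diaeresis stays on 'a'
composed = unicodedata.normalize("NFC", decomposed)     # precomposed ä (U+00E4)

print(naive)
print(grouped)
print(composed)
```

If the input only ever contains precomposed umlauts, as in the project described here, normalizing to NFC first would presumably make the per-character split sufficient.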
Urs
> -- Aaron Hill