lilypond-user

Re: Multi-byte characters in Lyrics


From: Maurits Lamers
Subject: Re: Multi-byte characters in Lyrics
Date: Fri, 27 Oct 2017 00:08:26 +0200


On 26 Oct 2017, at 17:27, David Kastrup <address@hidden> wrote:

Maurits Lamers <address@hidden> writes:

Hi,

I am writing an extension to lilypond to support generating some basic
braille inside an includable .ly file.
I am trying to map the characters of lyric events into a set of braille dots.
One of the issues I have is that I have trouble finding a way to do
this with characters which seem to be multi-byte.
In this case, the characters are defined in text mode as

"’s He"

I have tried quite a few ways of simply getting 5 characters, but the
first one (which I found out through other means) has charcode 8217.
None of the functions I could find works to get this character as one
character, as it seems that even integer->char only allows values
between 0 and 255.

Characters in Guile 1.8 are bytes.  Where is the problem?

I cannot convert a multi-byte character to a symbol, unless I do some very inelegant hacks.


I have fiddled with ly:wide-char->utf-8 and ly:encode-string-for-pdf
but that doesn't help much either.

\markup \char #5000

And it's not like

\markup #(ly:wide-char->utf-8 5000)

wouldn't work.  You just have to work with strings instead of characters.
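
As a rough illustration of what turning a code point into a UTF-8 string yields, here is a small sketch in Python rather than Guile. It is not LilyPond's implementation, and the helper name code_point_to_utf8 is made up for this example.

```python
def code_point_to_utf8(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes (hypothetical helper)."""
    return chr(code_point).encode("utf-8")


# Code point 5000 (from the \char example) and 8217 (the character in the lyric)
# both need three bytes, so neither fits in a single byte-sized character.
for cp in (5000, 8217):
    encoded = code_point_to_utf8(cp)
    print(cp, len(encoded), [hex(b) for b in encoded])
```

The result is a short string of bytes, which is why a string works here where a single byte-sized character cannot.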

It is not a problem on the input side, it is a problem on the processing side. 
I set up an engraver to listen to lyric events. As the lyrics have to be mapped to braille, I map every character to a specific braille dot pattern.
I have to do this in order to support braille embossers, most of which are still ASCII-based and do not agree on which dot pattern maps to which ASCII character. I also want to be able to support Unicode.
So, for every lyric event, I retrieve the text with (ly:event-property event 'text), which then needs to be processed into braille dots. I do this with (string->list), or could do it with (string-ref str pos).
This works in almost all situations, except this one. I get a lyric which contains an inverted comma instead of an apostrophe, literally defined as:

"’s He"

This inverted comma is a multi-byte character, but I cannot read it as a single character; I can only read it as separate bytes.
This is problematic because, as far as I know, each of those bytes could represent a different character on its own.
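
The byte-level view described above can be reproduced with a short Python sketch. It assumes the lyric text is UTF-8 encoded, and it uses Latin-1 only as one example of a single-byte reading; neither detail comes from the LilyPond source.

```python
text = "’s He"               # the lyric text from the message
raw = text.encode("utf-8")   # the bytes a byte-oriented string would hold

# A byte-oriented string->list sees 7 elements, not 5 characters.
byte_values = list(raw)

# Read one byte at a time (here as Latin-1, an assumption), the first
# character falls apart into three unrelated one-byte characters.
misread = [bytes([b]).decode("latin-1") for b in raw[:3]]

print(len(text), len(raw))
print([hex(b) for b in byte_values[:3]])
print(repr(misread[0]))
```

The first three bytes belong together as one character, so any per-byte lookup in a mapping table would match the wrong entries or none at all.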


Because of other limitations, it has to be compatible with Lilypond
2.14.

A really bad idea.

Couldn't agree more, but at the moment I don't have much choice, and there seems to be little benefit in using 2.18, as it appears to suffer from the same problem.


I have a big assoc list which contains the mapping, so I would like to
be able to perform (assoc-ref mymapping (symbol char)) to do the
lookup.
What would be the best way of achieving this with multi-byte
characters?

Use strings.  assoc-ref can work with them.

This was a very good lead. With great help from the Scheme IRC channel, I figured out that using strings as keys works well. They also suggested (and provided) a UTF-8 byte counter, so I was able to implement a simple function which takes as many bytes from the string as are needed to form one complete character, and then looks that substring up in the assoc list.
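
The approach can be sketched roughly as follows, in Python rather than Guile. This is not the code from the IRC channel: the function names and the BRAILLE table are hypothetical, and the dot patterns in the table are placeholders for illustration only.

```python
from typing import List


def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judged by its first byte."""
    if lead_byte < 0x80:             # 0xxxxxxx: plain ASCII
        return 1
    if lead_byte >> 5 == 0b110:      # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:     # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:    # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % lead_byte)


def split_utf8(raw: bytes) -> List[bytes]:
    """Cut a UTF-8 byte string into one byte substring per character."""
    pieces = []
    pos = 0
    while pos < len(raw):
        length = utf8_sequence_length(raw[pos])
        pieces.append(raw[pos:pos + length])
        pos += length
    return pieces


# Hypothetical mapping with string (byte string) keys; placeholder values.
BRAILLE = {
    "’".encode("utf-8"): "3",
    b"s": "234",
    b" ": "",
    b"H": "125",
    b"e": "15",
}

lyric = "’s He".encode("utf-8")
print([BRAILLE.get(piece, "?") for piece in split_utf8(lyric)])
```

Because the keys are whole substrings instead of single byte-sized characters, a multi-byte character is looked up the same way as an ASCII one.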

So, problem solved :)

cheers

Maurits



