lilypond-user

Re: UTF-8 in MIDI Lyrics


From: karl
Subject: Re: UTF-8 in MIDI Lyrics
Date: Sat, 25 Feb 2017 17:44:32 +0100 (CET)

Sorry, my last mail had the wrong From header.

Joe Austin:
> > Am 24.02.2017 um 02:15 schrieb Joseph Austin:
> >> This raises another question.  I'm working with MIDI files,
> >> and it's not clear how to encode UTF-8 text in MIDI.
> >> There must be some convention, but I haven't found an official RP for it.
...
> I don't have a program that displays MIDI  files with lyrics, so I can't test 
> it.

Timidity will show the lyrics.
I have a simple program that dumps the midi as text:

 http://aspodata.se/git/musik/bin/midi.pl

$ midi.pl test.midi  | grep lyric | head
        ['lyric', 0, 'Sta'],
        ['lyric', 768, 'bat '],
        ['lyric', 768, 'Ma'],
        ['lyric', 768, 'ter '],
        ['lyric', 384, 'do'],
        ['lyric', 768, 'lo'],
        ['lyric', 384, 'ro'],
        ['lyric', 768, 'sa '],
        ['lyric', 384, 'sa '],
        ['lyric', 384, 'jux'],
$
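
If you don't want to depend on my script, the same kind of dump can be
sketched in a few lines of Python. This is only a minimal sketch, not
what midi.pl does internally: the function names are made up for
illustration, it assumes a well-formed standard MIDI file, and it
returns the raw text bytes so you can see exactly what was written.

```python
import struct

def read_vlq(data, pos):
    """Read a MIDI variable-length quantity; return (value, new_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos

def track_lyrics(data, pos, end):
    """Yield (delta_ticks, raw_bytes) for each FF 05 event in one track."""
    status = 0
    while pos < end:
        delta, pos = read_vlq(data, pos)
        if data[pos] & 0x80:            # explicit status byte
            status = data[pos]
            pos += 1
        # otherwise running status: keep the previous status byte
        if status == 0xFF:              # meta event: type, length, data
            meta_type = data[pos]
            length, pos = read_vlq(data, pos + 1)
            if meta_type == 0x05:       # lyric
                yield delta, data[pos:pos + length]
            pos += length
        elif status in (0xF0, 0xF7):    # sysex: length, data
            length, pos = read_vlq(data, pos)
            pos += length
        elif status & 0xF0 in (0xC0, 0xD0):
            pos += 1                    # one data byte
        else:
            pos += 2                    # two data bytes

def lyric_events(data):
    """Walk all chunks of a MIDI file and yield the lyric events."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, length = struct.unpack(">4sI", data[pos:pos + 8])
        pos += 8
        if chunk_id == b"MTrk":
            yield from track_lyrics(data, pos, pos + length)
        pos += length
```

Feed it the file contents, e.g. lyric_events(open("test.midi", "rb").read()),
and decode the bytes with whatever encoding you expect.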

> It appears that, when generating a MIDI file, LilyPond currently
> just puts UTF8 chars in the text fields as if they were ASCII.
> According the base MIDI spec, this is illegal;  only ASCII chars
> between 0 and 127 are allowed.

Your wording is too strong. complete_midi_96-1-3.pdf, p.137 (or [1]
p.10) clearly says "should", and adds that

 "other character codes
 using the high-order bit may be used for interchange of files between
 different programs on the same computer which supports an extended
 character set. Programs on a computer which does not support
 non-ASCII characters should ignore those characters."

[1] http://www.cdik.se/pdf/midiformat.pdf
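
For a program that only knows ASCII, "ignore those characters" could
look like the following. It is just a sketch of one reading of that
sentence (drop every byte with the high-order bit set); the function
name is hypothetical.

```python
def ascii_only(text_bytes):
    """Keep 7-bit ASCII bytes, drop every byte with the high bit set."""
    return bytes(b for b in text_bytes if b < 0x80).decode("ascii")
```

With utf8 input this silently loses the accented letters, since every
byte of a multi-byte utf8 sequence has the high bit set.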

Also, the last paragraph of rp17.pdf gives you the set of characters that
are "accepted for use" and says that "it is best to avoid the use of these
characters: \ [ ] { }".

And rp26 clearly states in section 5:

 In addition, if a byte order mark which specifies UNICODE such as
 'FF FE' or 'FE FF' exists, the character code SET should be treated
 as UNICODE.

There is such a "byte order mark" for utf8 too, see [2]. So, by
extension, you just have to insert that BOM somewhere in the midi
file ("exists" does not restrict it to the lyrics meta event; preferably
put it in track 0 at time 0) and it would be legal (according to the
recommendation) to use utf8 straight out of the box.

[2] http://www.unicode.org/faq/utf_bom.html#BOM
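
To make that concrete, here is a rough Python sketch of writing a
BOM-only lyric event and of a reader that switches to utf8 once it has
seen one. Putting the BOM in a lyric event of its own is my assumption
(the recommendation does not say where it goes), and all the function
names are made up for illustration.

```python
import codecs

def encode_vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def lyric_event(delta, payload):
    """Build <delta> FF 05 <len> <payload> from raw payload bytes."""
    return encode_vlq(delta) + b"\xff\x05" + encode_vlq(len(payload)) + payload

def decode_lyric(payload, utf8_declared):
    """Decode lyric bytes; fall back to plain ASCII if no BOM was seen."""
    if payload.startswith(codecs.BOM_UTF8):
        return payload[len(codecs.BOM_UTF8):].decode("utf-8")
    if utf8_declared:
        return payload.decode("utf-8")
    return payload.decode("ascii", errors="ignore")

# Assumed placement: one BOM-only event at time 0, e.g. in track 0.
bom_marker = lyric_event(0, codecs.BOM_UTF8)
```

A reader would set utf8_declared as soon as it meets an event whose
payload starts with EF BB BF, and pass it along for the later events.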

> However, MIDI RP-17 and RP-26 introduce additional encodings for
> the <text> portion of the lyric meta-event FF 05 <len> <text>.

You do extrapolate a little; rp17 only tells you the "recommended" way to
specify end of word/line/paragraph, and gives you a list of characters
that should give no compatibility problems.

> In particular, RP-26 specifies the "language" code  address@hidden to
> include 8-bit chars > 127.  It seems no code for "UTF8" has been
> officially defined, but a reasonable proposal might be language code:
> address@hidden

You don't need that, see above about the BOM. Also, it would be interesting
to see which programs actually support rp26. Since midi "standards"
are just recommendations, you have to know what works in the wild.

..
> So for LilyPond purposes, it would suffice to use a reversible
> encoding, that is, LilyPond would accept any MIDI file text format
> that LilyPond generates.  The apparently existing UTF-8 default
> should work for that.

LilyPond doesn't read midi files; you can convert midi files to ly files
(e.g. with midi2ly), which LilyPond can then read.

> But if we are going to use a "private standard", we might as well
> imitate the "official" standard and insert something like
> FF 05 07 { @ U T F 8 }
> And lobby AMEI/MMA to adopt an official UTF8 position.

Could be good, but why not just capitalize on the BOM and use
utf8?
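
For comparison, the two markers as raw bytes, in a small Python sketch.
I am assuming that the spaced-out characters in your event stand for
the seven bytes of the string {@UTF8}; the helper name is hypothetical
and it only handles payloads short enough for a one-byte length.

```python
import codecs

def short_lyric_event(payload):
    """Build FF 05 <len> <payload> for payloads shorter than 128 bytes."""
    if len(payload) >= 0x80:
        raise ValueError("payload needs a multi-byte length")
    return b"\xff\x05" + bytes([len(payload)]) + payload

tag_event = short_lyric_event(b"{@UTF8}")        # your proposed tag
bom_event = short_lyric_event(codecs.BOM_UTF8)   # the BOM instead
```

Either way it is one small meta event; the BOM variant is three bytes
of payload and needs no new language code.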

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57


