lilypond-user

Re: How can I avoid unicode and use Latin1?


From: stk
Subject: Re: How can I avoid unicode and use Latin1?
Date: Mon, 5 Sep 2005 02:58:41 -0400 (EDT)

On Sun, 4 Sep 2005, Werner LEMBERG wrote:

> You are mixing up Unicode with one of its possible representations,
> UTF-8.  A Unicode character is a number between 0x0 and 0x10FFFF;
> UTF-8 represents such code points as multi-byte sequences of varying
> length, where the range 0x00-0x7F is identical to ASCII.

Thank you.  I didn't know Unicode was broader than UTF-8.  The 3-byte value
10FFFF (rather than FFFFFF) seems like a rather strange upper limit, but
that only underlines the fact that I'm going to have to learn about Unicode
once I get through my current arranging binge.
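
To illustrate the distinction Werner draws, here is a minimal Python sketch (not part of the original exchange; the helper name utf8_length is made up for illustration). A code point is just an integer, and UTF-8 is one way of writing it out as bytes. The limit 0x10FFFF looks less odd once the code space is seen as 17 planes of 0x10000 code points each.

```python
# A Unicode code point is an integer; UTF-8 is one byte-level encoding of it.

def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode the given code point."""
    return len(chr(code_point).encode("utf-8"))

# The code space consists of 17 planes of 0x10000 code points each,
# so the highest code point is 17 * 0x10000 - 1.
MAX_CODE_POINT = 17 * 0x10000 - 1

samples = {
    "A": utf8_length(ord("A")),          # ASCII range: same single byte as ASCII
    "é": utf8_length(0xE9),              # e-acute, code point 0xE9
    "€": utf8_length(0x20AC),            # euro sign, code point 0x20AC
    "max": utf8_length(MAX_CODE_POINT),  # highest code point
}

for name, length in samples.items():
    print(f"{name}: {length} byte(s) in UTF-8")
```

The byte count grows with the size of the code point, which is why the ASCII range costs nothing extra while accented letters need two bytes.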

> Today, Windows uses Unicode exclusively -- even in North America.  You
> won't have big success with latin1 files.

I routinely switch files between Latin1 text and MS-Word docs with no
problem whatsoever.  When one saves a file in Word selecting the type Text
or Text With Line Breaks, one gets a Latin1 file -- and I have verified
these text files (put out by Word) directly with a hex editor: e-acute,
a-grave, etc. are all represented by a single byte, and it is the standard
Latin1 byte.  As far back as Word 97, Microsoft claimed that Word and its
Visual Basic ("VBA") used Unicode "internally".  But if one looks at a
Word .doc file with a hex editor, one sees that, in the file, all the
French accented characters are stored as single-byte standard Latin1
codes.  Microsoft's Unicode claims are a marketing ploy; Latin1 still
rules.
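
The hex-editor check described above can be approximated in a few lines of Python. This is only a sketch: the sample title and the helper looks_like_utf8 are invented for illustration, and the strict-decoding test is a heuristic rather than a definitive detector. It shows the same accented letters as one byte each in Latin1 and as two bytes each in UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical song title containing e-acute and a-grave.
title = "Café à la carte"

latin1_bytes = title.encode("latin-1")  # one byte per character
utf8_bytes = title.encode("utf-8")      # accented letters take two bytes each

print(latin1_bytes.hex(" "))  # e-acute appears as the single byte e9
print(utf8_bytes.hex(" "))    # e-acute appears as the pair c3 a9
print(looks_like_utf8(latin1_bytes), looks_like_utf8(utf8_bytes))
```

A Latin1 file containing accented letters will usually fail the strict UTF-8 decode, which is a quick way to tell the two apart without reading a hex dump. Pure ASCII text passes in both encodings, so the check says nothing about files without accents.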

> Well, it is straightforward to use a converter like `iconv' within a
> script which automatically transforms your latin1 file into UTF-8.

Yet another converter.  Well, it's good to know that.  But for the moment
I encounter accented letters only in song titles (I use no lyrics), so
typing in the two-byte UTF-8 sequence for the rare accented character here
and there takes about 3 seconds, which is easy.
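
For reference, the conversion Werner suggests is typically run as something like `iconv -f ISO-8859-1 -t UTF-8 input > output`. The same step can be sketched in Python as below. The function names are hypothetical and nothing here comes from the original messages; it assumes the input really is Latin1.

```python
from pathlib import Path

def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin1 bytes as UTF-8.

    Decoding as Latin1 never fails, because every byte value maps to a
    code point; the caller must know the input really is Latin1.
    """
    return data.decode("latin-1").encode("utf-8")

def convert_file(src: Path, dst: Path) -> None:
    """Read a Latin1 file and write a UTF-8 copy (paths chosen by the caller)."""
    dst.write_bytes(latin1_to_utf8(src.read_bytes()))

# The two-byte sequences one would otherwise type in by hand:
for ch in "éàç":
    print(ch, latin1_to_utf8(ch.encode("latin-1")).hex(" "))
```

Working on bytes rather than text avoids any newline translation, so the output differs from the input only where non-ASCII characters occur.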

Thank you for taking the trouble to send me the information on Unicode and
UTF-8.

-- Tom




