
[Freecats-Dev] About Unicode


From: Henri Chorand
Subject: [Freecats-Dev] About Unicode
Date: Thu, 13 Feb 2003 09:55:09 +0100

Hi all,

Sooner or later, we'll have to learn more about Unicode (well, more than I
currently know).

A brief look at http://www.unicode.org/ convinced me that a brief look is not
enough.
The two-level FAQ (at http://www.unicode.org/faq/utf_bom.html) seems very
interesting.

For those who still have some spare time, the reference book is freely
available online at:
http://www.unicode.org/uni2book/u2.html

One possible source of concern with Unicode is that there are so many
flavors, as seen in the FAQ:
> Which do I need to be able to use from:
> UTF8, UTF16, UTF16LE, UTF16BE, UTF32,
> UTF32LE, UTF32BE?

Things seem to get worse when you read the answer:
> Hard to say. UTF-8 will be most common on the web.
> UTF16, UTF16LE, UTF16BE are used by Java and
> Windows.
> UTF32, UTF32LE, UTF32BE are used by various Unix
> systems.
> Luckily, the conversions between all of them are
> algorithmically based and fast.
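
To get a feel for what "algorithmically based" means in practice, here is a
minimal sketch, assuming Python 3 and its standard codec names. The sample
string is my own choice, not something from the FAQ. It encodes the same text
in each flavor, checks that decoding gives the original back, and records
the size in bytes:

```python
# Encode one sample string in each UTF flavor named in the FAQ and
# check that decoding returns the original text unchanged.
text = "chat \u00e9t\u00e9 \u20ac"   # sample text (an assumption, not from the FAQ)

codecs_to_try = ["utf-8", "utf-16", "utf-16-le", "utf-16-be",
                 "utf-32", "utf-32-le", "utf-32-be"]

sizes = {}
for name in codecs_to_try:
    data = text.encode(name)
    assert data.decode(name) == text   # lossless round trip
    sizes[name] = len(data)

for name, size in sizes.items():
    print(f"{name:10} {size:3d} bytes")
```

The plain "utf-16" and "utf-32" codecs come out slightly larger than their
LE/BE variants because they prepend a byte order mark.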

And for the curious folks who want to experiment, Windows 2000 / XP Notepad
offers the following save options for text files:
- ANSI
- Unicode
- Unicode big endian
- UTF-8
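
As far as I can tell, the three Unicode options each start the file with a
byte order mark (BOM), while "ANSI" writes none. Under that assumption, a
rough sketch of how such files could be told apart in Python 3 follows. The
helper name and the cp1252 fallback are hypothetical; the real "ANSI" code
page depends on the Windows locale.

```python
import codecs

def guess_notepad_encoding(data: bytes, ansi_fallback: str = "cp1252") -> str:
    """Guess a Python codec name from the leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # "UTF-8": this codec strips the BOM on decode
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"      # "Unicode" / "Unicode big endian": codec reads the BOM
    return ansi_fallback     # no BOM: assume an "ANSI" code page

# Simulated file contents for each save option (assumed byte layouts).
word = "\u00e9t\u00e9"
samples = {
    "ANSI": word.encode("cp1252"),
    "Unicode": codecs.BOM_UTF16_LE + word.encode("utf-16-le"),
    "Unicode big endian": codecs.BOM_UTF16_BE + word.encode("utf-16-be"),
    "UTF-8": codecs.BOM_UTF8 + word.encode("utf-8"),
}

decoded = {label: data.decode(guess_notepad_encoding(data))
           for label, data in samples.items()}
```

This only recognises the BOMs Notepad writes; a file without a BOM cannot be
identified reliably this way.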

Well, as usual, if somebody happens to know Unicode well enough to provide a
few directions, please <shout mode on>DO SO!</shout mode off>

In a nutshell, what we need to know is:
- little endian/big endian issues between Macs, Windows PC & Unix boxes
(Linux/BSD PC for a start)
- how Python handles these by default (it would be handy if the language knew
how to manage these issues)
- "preferred" encodings among the above (I guess, one in which character
length does not vary)
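
A first stab at these questions, as a sketch only and assuming Python 3
(CPython): the plain "utf-16" codec writes a BOM in the machine's native byte
order and honours whatever BOM it finds when reading, while the "-le" and
"-be" variants fix the order and write no BOM. The last part measures bytes
per character to see which encodings are fixed-width; the sample characters
are my own choice.

```python
import codecs
import sys

# 1. Endianness: "utf-16" encodes in native order and prepends a BOM.
native_bom = (codecs.BOM_UTF16_LE if sys.byteorder == "little"
              else codecs.BOM_UTF16_BE)
uses_native_bom = "A".encode("utf-16").startswith(native_bom)

# 2. Explicit variants pin the byte order and write no BOM.
little = "A".encode("utf-16-le")   # low byte first
big = "A".encode("utf-16-be")      # high byte first

# A BOM-prefixed file decodes the same way whatever the local byte order.
portable = (codecs.BOM_UTF16_BE + big).decode("utf-16")

# 3. Bytes per character: "A", e-acute, euro sign, musical G clef (U+1D11E).
chars = ["A", "\u00e9", "\u20ac", "\U0001d11e"]
widths = {name: [len(ch.encode(name)) for ch in chars]
          for name in ("utf-8", "utf-16-le", "utf-32-le")}
```

With these samples only UTF-32 uses the same number of bytes for every
character; UTF-16 needs two 16-bit units for characters beyond U+FFFF.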

A "typically optimistic" extract:
> Hybrid systems in which UTF-16 is used as a disk storage
> format but expanding to UTF-32 in memory is also a
> popular solution combining small long term storage space
> with ease of processing.
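
To make the hybrid idea concrete, here is a small sketch, assuming Python 3.
The file name and sample text are hypothetical. It stores text on disk as
UTF-16, reads it back, and expands it to one integer per code point, which is
what a UTF-32 representation amounts to in memory:

```python
import os
import tempfile

text = "G clef: \U0001d11e"   # sample text with one character beyond U+FFFF

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.txt")   # hypothetical file name

    # Disk format: UTF-16 (Python writes a BOM first).
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(text)
    disk_size = os.path.getsize(path)

    with open(path, "r", encoding="utf-16", newline="") as handle:
        loaded = handle.read()

# Memory format: one fixed-size value per code point, as in UTF-32.
code_points = [ord(ch) for ch in loaded]
memory_size = len(loaded.encode("utf-32-le"))
```

The trade-off shows up in the two sizes: the file is smaller, while the
in-memory list can be indexed by character position without worrying about
surrogate pairs.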

Had this stuff been designed with ease of use in mind... ;-)

Anyway, if it's too difficult to master, we may begin with a Windows ANSI
version.

Let me know your thoughts.


Regards,

Henri




