Re: [Openexr-devel] UNICODE support in openexr file I/O


From: Yves Poissant
Subject: Re: [Openexr-devel] UNICODE support in openexr file I/O
Date: Thu, 19 Jan 2006 22:12:41 -0800

All ASCII character codes (meaning codes from 0x00 through 0x7F) are encoded 
exactly the same way in UTF-8, so if a string is ASCII, it doesn't matter 
whether the application thinks it is UTF-8 or ASCII because the bytes are the 
same. For international character sets, using UTF-8 (or Unicode in general) 
is in any case far more robust and trouble-free than trying to support all 
the possible encodings around.
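
To make the point above concrete, here is a minimal Python sketch (the 
helper name is_ascii is only illustrative, not part of any OpenEXR API). It 
checks that a pure ASCII string yields identical bytes under both encodings, 
and that a non-ASCII character turns into bytes above 0x7F.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the ASCII range 0x00..0x7F."""
    return all(b <= 0x7F for b in data)


# A pure ASCII string produces identical bytes under both encodings.
name = "OpenEXR"
ascii_bytes = name.encode("ascii")
utf8_bytes = name.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Each non-ASCII character becomes a multi-byte UTF-8 sequence whose
# bytes are all >= 0x80, so it cannot be confused with ASCII characters.
twelve = "zw\u00f6lf"  # German for "twelve"
twelve_utf8 = twelve.encode("utf-8")
```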

The specification could say that any UTF-8 encoded string should start with 
the Unicode BOM (Byte Order Mark), which in UTF-8 is a sequence of 3 special 
bytes (0xEF 0xBB 0xBF) that, when found at the beginning of a string, 
signifies that the string is Unicode. This way, there would be no confusion. 
Also, two API functions to convert between UTF-8 and UTF-16 (what is usually 
referred to as Unicode) could be added using the C conversion samples that 
can be found on www.unicode.org.
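
As a rough illustration of the BOM idea, the Python sketch below detects the 
UTF-8 encoded BOM, which is the three bytes 0xEF 0xBB 0xBF, and strips it 
before decoding. The function names and the "no BOM means ASCII" rule are 
assumptions made for this example; nothing like this exists in OpenEXR.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 encoded BOM."""
    return data.startswith(codecs.BOM_UTF8)


def decode_attribute(data: bytes) -> str:
    """Hypothetical rule: a leading BOM marks UTF-8, otherwise plain ASCII."""
    if has_utf8_bom(data):
        # Drop the marker bytes, then decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("ascii")
```

For example, decode_attribute(codecs.BOM_UTF8 + "zw\u00f6lf".encode("utf-8")) 
gives back the German word, while decode_attribute(b"OpenEXR") takes the 
plain ASCII path.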

Any further translations, to or from some native OS encoding for instance, 
would need to be done on the host OS. As far as I know, every major OS in 
existence today has an extensive set of string encoding conversions, which 
includes converting UTF-8, so I don't really think there is a real need for 
that sort of conversion in the OpenEXR API. In fact, I even think it would 
be detrimental to anyone who doesn't want to bother with Unicode but just 
wants to store and read ASCII characters. Those users would have to convert 
from UTF-16 to ASCII, while if there is no preconversion to and from 
UTF-16, the strings could just be used, as is, as ASCII strings.
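
The cost described above can be seen in a small Python sketch. The two 
helper names are hypothetical and stand in for whatever conversion the host 
platform offers. Note how ASCII text stored as UTF-8 is usable as is, while 
the UTF-16 form has to be converted back before an ASCII-only application 
can use it.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to little-endian UTF-16 without a BOM."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode little-endian UTF-16 bytes back to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


stored = b"abc"                # ASCII text, which is also valid UTF-8
wide = utf8_to_utf16(stored)   # every ASCII character now takes two bytes
back = utf16_to_utf8(wide)     # extra step an ASCII-only user would need
```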

Yves Poissant
www.hash.com

----- Original Message ----- 
From: "Bob Friesenhahn" <address@hidden>
To: "Florian Kainz" <address@hidden>
Cc: <address@hidden>
Sent: Thursday, January 19, 2006 8:31 PM
Subject: Re: [Openexr-devel] UNICODE support in openexr file I/O


On Thu, 19 Jan 2006, Florian Kainz wrote:

> The UTF-8 strings are the Thai name for the city of Bangkok
> and the German word for "twelve".

You provided excellent examples.  I think that just like the way my
mail program alerted me to a codepage difference, OpenEXR should
provide a way for the application to specify the character set which
was used (which could include UTF-8), and a way for an application to
see which character set was used, so that the characters can be
appropriately transcoded by the application.  The main requirement is
that the storage must work with simple strings.

All modern operating systems provide APIs for translating between
character sets so OpenEXR's responsibility should go as far as storing
the string, and the identification of its character set (which should
have a default like iso-8859-1 which works for most users).  If the
application does not specify the character set which is used, then
operation would be the same as today.
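
One way to picture this proposal is the Python sketch below. The class 
TaggedString and its fields are hypothetical and only illustrate the idea 
of storing the raw bytes next to a character set name, with iso-8859-1 as 
the default when the application specifies nothing.

```python
from dataclasses import dataclass


@dataclass
class TaggedString:
    """Raw string bytes plus the name of the character set they use."""
    raw: bytes
    charset: str = "iso-8859-1"  # assumed default, as suggested above

    def to_text(self) -> str:
        """Transcode on the application side using the stored charset."""
        return self.raw.decode(self.charset)


# Writer that says nothing about the charset: the default applies.
legacy = TaggedString(b"zw\xf6lf")

# Writer that explicitly tags its bytes as UTF-8.
tagged = TaggedString("zw\u00f6lf".encode("utf-8"), "utf-8")

# The same UTF-8 bytes without a tag are misread under the default.
untagged = TaggedString("zw\u00f6lf".encode("utf-8"))
```

Both legacy.to_text() and tagged.to_text() recover the German word, whereas 
untagged.to_text() shows the kind of garbling the mail program warned about.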

Bob

>
> Bob Friesenhahn wrote:
>> This is what my mail program had to say about your email:
>>
>>         [ The following text is in the "UTF-8" character set. ]
>>         [ Your display is set for the "iso-8859-1" character set.  ]
>>         [ Some characters may be displayed incorrectly. ]
>>
>> So it seems that we may already have an interoperability problem. :-)
>>
>> Bob
>> ======================================
>> Bob Friesenhahn
>> address@hidden, 
>> http://www.simplesystems.org/users/bfriesen/
>> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>>
>>
>

======================================
Bob Friesenhahn
address@hidden, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


_______________________________________________
Openexr-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/openexr-devel



