koha-devel

[Koha-devel] Re: [Zebralist] utf-8, probable solution


From: Sebastian Hammer
Subject: [Koha-devel] Re: [Zebralist] utf-8, probable solution
Date: Wed, 15 Feb 2006 10:17:08 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Paul,

this confirms our impressions at Index Data. Somehow, while PHP has managed to approach Unicode in a way that mostly 'just works' (probably by not doing more to it than necessary), Perl seems to have all kinds of internal logic that makes Unicode really, really complicated and unintuitive. We had two people spend a week or so each trying to make heads or tails of the UTF-8 tutorial, and at the end we still felt that we were fudging around the problem rather than really solving it.

I'm *not* fond of Perl's approach to Unicode.

--Sebastian

Paul POULAIN wrote:

Thanks to Heikki Levanto, Tümer Garip and Mike Rylander: between you, you pointed out three things that are useless alone but very useful when combined.

I think I have the solution to our problem. It's not a Zebra, HTML::Template or MARC::Record problem; it's a Perl one!

Let me explain:
I followed my UTF-8 string through my Perl code until it was printed, and it was always UTF-8 (\x9c...).
But in Firefox, it was ISO-8859-1.

Heikki told me that the first 256 characters (code points 0 to 255) are shared by Unicode and ISO-8859-1. So I told myself: OK, Paul, add a "true" UTF-8 character to your string. I chose \x{263a} (the smiley, because I'm always optimistic, and it is the one used in perluniintro).

Surprise... now my é was shown as a UTF-8 é in Firefox!
Conclusion: Perl looked at my string before sending it and, finding that it was not "true UTF-8", did something to turn it into ISO-8859-1.

I also got a brand new message in my log:
    Wide character in print at ...
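A minimal Perl sketch of the behaviour described above, assuming Perl 5.8 or later and an output handle with no encoding layer (like a plain STDOUT). A string whose characters are all at or below 0xFF is written as one Latin-1 byte per character; adding U+263A forces the whole string out as UTF-8 bytes, with the "Wide character" warning (silenced here on purpose). The helper name bytes_written is hypothetical; it writes to a temporary file and returns what actually landed on disk as hex.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: print $str to a handle with no encoding layer
# and return the bytes that were actually written, as a hex string.
sub bytes_written {
    my ($str) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh);    # raw handle: no :utf8 or :encoding() layer
    {
        # The "Wide character" warning is expected for the second case.
        no warnings 'utf8';
        print {$fh} $str;
    }
    close $fh or die "close: $!";
    open my $in, '<:raw', $path or die "open: $!";
    local $/;        # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return unpack 'H*', $bytes;
}

my $plain = "caf\x{e9}";            # every character <= 0xFF
my $wide  = "caf\x{e9}\x{263a}";    # includes U+263A, the smiley

print bytes_written($plain), "\n";  # 636166e9: e-acute as one Latin-1 byte
print bytes_written($wide),  "\n";  # 636166c3a9e298ba: UTF-8 bytes
```

This matches what Firefox saw: without the smiley the browser received the single byte 0xE9, which is only "é" if the page is read as ISO-8859-1.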

Mike R.'s and Tümer G.'s suggestions made me dig into the Perl Unicode documentation (perldoc),
and here it is:

A user of Perl does not normally need to know nor care how Perl happens to encode its internal strings, but it becomes relevant when outputting Unicode strings to a stream without a PerlIO layer -- one with the "default" encoding. In such a case, the raw bytes used internally (the native character set or UTF-8, as appropriate for each string) will be used, and a "Wide character" warning will be issued if those strings contain a character beyond 0x00FF.
For example,
    perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
produces a fairly useless mixture of native bytes and UTF-8, as well as a warning:
    Wide character in print at ...
To output UTF-8, use the ":utf8" output layer. Prepending
    binmode(STDOUT, ":utf8");
to this sample program ensures that the output is completely UTF-8, and removes the program's warning.



GOTCHA! I added binmode(STDOUT, ":utf8"), and now, even without the smiley, my é, à, ... are shown correctly.
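A minimal sketch of the fix in a CGI-style script, assuming the text has already been decoded into Perl characters (here with Encode::decode on bytes that are assumed to arrive as UTF-8). The :utf8 layer encodes the output, and the Content-Type header tells the browser to read it as UTF-8 rather than guessing ISO-8859-1. The helper send_page and the sample bytes are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write a minimal HTML page to $out, declaring
# UTF-8 both in the HTTP header and through the PerlIO layer.
sub send_page {
    my ($out, $text) = @_;
    binmode($out, ':utf8');    # characters are encoded as UTF-8 on output
    print {$out} "Content-Type: text/html; charset=utf-8\n\n";
    print {$out} "<html><body><p>$text</p></body></html>\n";
}

# Bytes as they might arrive from a UTF-8 data source (assumption):
my $raw  = "\xc3\xa9\xc3\xa0";       # UTF-8 encoding of e-acute, a-grave
my $text = decode('UTF-8', $raw);    # now two Perl characters
send_page(\*STDOUT, $text);
```

Decoding on input and encoding once on output keeps Perl from mixing Latin-1 and UTF-8 bytes. Perl also offers the stricter :encoding(UTF-8) layer, which can be used in place of :utf8.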

I still have to investigate UTF-8 in MySQL, but it seems that
    SET NAMES utf8
is useless.

Thanks, everybody, for helping me. I'll continue this thread on koha-devel only, as the Zebra and perl4lib lists are probably not interested.


--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853





