[Koha-devel] utf-8, probable solution


From: Paul POULAIN
Subject: [Koha-devel] utf-8, probable solution
Date: Wed, 15 Feb 2006 16:08:32 +0100
User-agent: Mozilla Thunderbird 1.0.6-7.2.20060mdk (X11/20050322)

Thanks to Heikki Levanto, Tümer Garip & Mike Rylander: you each pointed out something that was not enough on its own, but very useful once combined.

I think I have the solution to our problem. It's not a Zebra, HTML::Template or MARC::Record problem, it's a Perl one!

Let me explain:
I traced my UTF-8 string through my Perl code right up to the point where it is printed, and it was always UTF-8 (\x9c...).
But in Firefox, it showed up as ISO-8859-1.

Heikki told me that the first 256 code points (0x00 to 0xFF) are shared by Unicode and ISO-8859-1. So I told myself: OK, Paul, add a "true" UTF-8 character to your string. I chose \x{263a} (the smiley, because I'm always optimistic, and because it is the one used in perluniintro).

Surprise... now my é was a UTF-8 é in Firefox!
Conclusion: Perl looked at my string before sending it and, as it found that it was not "true UTF-8" (no character above 0xFF), it did something to turn it into ISO-8859-1.
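
To make this behaviour concrete, here is a minimal sketch. It is not Koha code: the helper names bytes_without_layer and hex_dump are made up for illustration, and an in-memory filehandle is assumed to stand in for a STDOUT that has no encoding layer. It captures the bytes Perl really writes for an é alone, and for an é followed by the smiley.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the raw bytes Perl writes when $str is
# printed to a handle that has no encoding layer (like a default STDOUT).
sub bytes_without_layer {
    my ($str) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # hide "Wide character in print" for the demo
        print {$fh} $str;
    }
    close($fh);
    return $buf;
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $plain  = "\x{e9}";            # é alone: every character fits in one byte
my $smiley = "\x{e9}\x{263a}";    # é plus a character above 0xFF

print hex_dump(bytes_without_layer($plain)),  "\n";   # e9 (ISO-8859-1 byte)
print hex_dump(bytes_without_layer($smiley)), "\n";   # c3 a9 e2 98 ba (UTF-8)
```

The same é comes out as the single byte e9 in the first case and as the UTF-8 pair c3 a9 in the second, which matches what Firefox showed.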

I also had a brand new message in my log :
>            Wide character in print at ...

Mike R.'s and Tümer G.'s suggestions made me look into the perldoc on Unicode,
and here it is:
       A user of Perl does not normally need to know nor care how Perl
       happens to encode its internal strings, but it becomes relevant
       when outputting Unicode strings to a stream without a PerlIO
       layer -- one with the "default" encoding.  In such a case, the raw
       bytes used internally (the native character set or UTF-8, as
       appropriate for each string) will be used, and a "Wide character"
       warning will be issued if those strings contain a character
       beyond 0x00FF.
       For example,
             perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
       produces a fairly useless mixture of native bytes and UTF-8, as
       well as a warning:
             Wide character in print at ...
       To output UTF-8, use the ":utf8" output layer.  Prepending
             binmode(STDOUT, ":utf8");
       to this sample program ensures that the output is completely
       UTF-8, and removes the program's warning.


GOTCHA! I added binmode(STDOUT, ":utf8") and now, even without the smiley, my éà... are displayed correctly.
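
As a rough check of the fix, the sketch below applies the same binmode call to an in-memory filehandle instead of STDOUT, so that the bytes can be inspected. That substitution is an assumption made for the demo, and bytes_with_utf8_layer and hex_dump are hypothetical helper names. With the ":utf8" layer in place, the é is written as UTF-8 whether or not the smiley is present, and no "Wide character" warning is issued.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the bytes Perl writes when $str is printed
# to a handle that has the ":utf8" layer, as STDOUT has after
# binmode(STDOUT, ":utf8").
sub bytes_with_utf8_layer {
    my ($str) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    binmode($fh, ':utf8');    # same call as in the fix, on our test handle
    print {$fh} $str;
    close($fh);
    return $buf;
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

print hex_dump(bytes_with_utf8_layer("\x{e9}")),         "\n";   # c3 a9
print hex_dump(bytes_with_utf8_layer("\x{e9}\x{263a}")), "\n";   # c3 a9 e2 98 ba
```

The current Perl documentation generally suggests ":encoding(UTF-8)" as the stricter form of this layer; for output of well-formed strings like these, both should produce the same bytes.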

I still have to investigate MySQL and UTF-8, but it seems that
> set NAMES=utf8
is useless.

Thanks, everybody, for helping me. I'll continue this thread on koha-devel only, as the Zebra and perl4lib lists are probably not interested.
--
Paul POULAIN et Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)



