[Koha-devel] utf-8, probable solution

From:	Paul POULAIN
Subject:	[Koha-devel] utf-8, probable solution
Date:	Wed, 15 Feb 2006 16:08:32 +0100
User-agent:	Mozilla Thunderbird 1.0.6-7.2.20060mdk (X11/20050322)

Thanks to Heikki Levanto, Tümer Garip & Mike Rylander: you pointed out 3 things that were useless alone, but very useful when combined.

I think I have the solution to our problem. It's not a Zebra, HTML::Template or MARC::Record problem, it's a Perl one!


Let me explain:

I followed my utf-8 string through my Perl code until it was printed, and it was always utf-8 (\x9c...)

But in Firefox, it was displayed as ISO-8859-1.

Heikki told me that the first 256 characters are shared by Unicode and ISO-8859-1. So I told myself: OK, Paul, add a "true utf-8" character to your string. I chose \x{263a} (the smiley, because I'm always optimistic & that is what is used in perluniintro).


Surprise ... now my é was a utf-8 é in firefox !!!!

Conclusion: Perl looked at my string before sending it and, as it found that it was not "true utf-8", Perl did something to convert it to ISO-8859-1.
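
The effect described above can be reproduced without a browser. What follows is only a minimal sketch, not the Koha code: bytes_written is a hypothetical helper introduced for the illustration, and it prints to an in-memory filehandle (assumed here to behave like STDOUT with no encoding layer) so that the bytes Perl actually emits can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): print $string to an in-memory
# filehandle that has no encoding layer, and return the bytes that were
# written as space-separated hex pairs.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print" for this demo
        print {$fh} $string;
    }
    close($fh);
    return join ' ', unpack('(H2)*', $buffer);
}

my $e_acute = "\x{e9}";    # é, a code point below 0x100

# Only characters below 0x100: a single Latin-1 byte is written.
print bytes_written($e_acute), "\n";                 # e9

# With the smiley appended, the whole string goes out as UTF-8.
print bytes_written($e_acute . "\x{263a}"), "\n";    # c3 a9 e2 98 ba
```

The first call shows why Firefox saw ISO-8859-1: as long as every character fits in one byte, Perl writes that one byte.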


I also had a brand new message in my log :
>            Wide character in print at ...

Mike R.'s and Tümer G.'s suggestions made me investigate the perldoc on Unicode,
and here it is:

       A user of Perl does not normally need to know nor care how Perl
       happens to encode its internal strings, but it becomes relevant
       when outputting Unicode strings to a stream without a PerlIO
       layer -- one with the "default" encoding.  In such a case, the
       raw bytes used internally (the native character set or UTF-8, as
       appropriate for each string) will be used, and a "Wide
       character" warning will be issued if those strings contain a
       character beyond 0x00FF.

       For example,

             perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'

       produces a fairly useless mixture of native bytes and UTF-8, as
       well as a warning:

            Wide character in print at ...

       To output UTF-8, use the ":utf8" output layer.  Prepending

             binmode(STDOUT, ":utf8");

       to this sample program ensures that the output is completely
       UTF-8, and removes the program's warning.

GOTCHA! I have added binmode(STDOUT, ":utf8") and now, even without the smiley, my éà... are shown correctly.
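
To see what the ":utf8" layer changes for a string such as "éà", here is a small sketch under the same kind of assumption: bytes_with_layer is a hypothetical helper, and an in-memory filehandle stands in for STDOUT so the output bytes can be compared with and without the layer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): print $string through an
# optional PerlIO layer and return the bytes written, as hex pairs.
sub bytes_with_layer {
    my ($string, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    # Same call as binmode(STDOUT, ":utf8"), applied to the test handle.
    binmode($fh, $layer) if defined $layer;
    print {$fh} $string;
    close($fh);
    return join ' ', unpack('(H2)*', $buffer);
}

my $text = "\x{e9}\x{e0}";    # "éà", no character above 0xFF

print bytes_with_layer($text), "\n";             # e9 e0       (Latin-1 bytes)
print bytes_with_layer($text, ':utf8'), "\n";    # c3 a9 c3 a0 (UTF-8 bytes)
```

In a real script the equivalent would presumably be a single binmode(STDOUT, ":utf8") call placed before the first print.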


I still have to investigate MySQL utf-8, but it seems that
> set NAMES=utf8
is useless.

Thanks everybody for helping me. I'll continue this thread on koha-devel only, as zebra & perl4lib are probably not interested.

--
Paul POULAIN and Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)
