koha-devel
Re: [Koha-devel] Re: MARC character encoding


From: paul POULAIN
Subject: Re: [Koha-devel] Re: MARC character encoding
Date: Thu Jan 23 04:26:06 2003
User-agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.1) Gecko/20020826

Ed Summers wrote:
On Tue, Jan 21, 2003 at 09:15:07AM +0100, paul POULAIN wrote:
  
Francois Lemarchand sent me a little script to translate éà... into 
8859-1 standard characters. I've included it in the addbiblio.pl script 
(it runs when the system finds a biblio in the breeding farm).
It seems to work. Things are definitely strange in char encoding.

Uploaded to CVS a few minutes ago.
    

I'm looking at the script. From the comments it looks like Francois'
code is converting from ISO 5426 to ISO 8859-1. How are character sets
handled in UNIMARC? I'm guessing there are more character sets than ISO
5426 which can be used.

I just checked and Perl's Encode::* modules don't seem to handle ISO 5426 :(
which is a shame. It is even more a shame that ISO doesn't make these
standards public. I'm going to subscribe to address@hidden and see if I
can find out more.

//Ed
Sorry, but I've looked more closely at Francois' code, and at some MARC21 and UNIMARC files.

My conclusion is that the following code:
        s/\xe1/\xc1/gm;
        s/\xe2/\xc2/gm;
        s/\xe3/\xc3/gm;
        s/\xe4/\xc4/gm;
        s/\xe8/\xc8/gm;
        s/\xe9/\xc9/gm;
        s/\xf0/\xd0/gm;
is enough to migrate from MARC21 to UNIMARC char coding. I tried this in my marc21->unimarc script, on 30 000 records, and it works fine.
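
As a sketch only, the same seven byte remappings could be done in one pass with Perl's tr/// operator instead of seven separate substitutions. The sub name marc21_to_unimarc_chars is hypothetical (it is not what is in Biblio.pm), and the sketch assumes the field data is handled as raw bytes:

```perl
use strict;
use warnings;

# Hypothetical helper, not the code committed to Biblio.pm:
# remap the seven bytes listed above in a single pass.
sub marc21_to_unimarc_chars {
    my ($bytes) = @_;
    # tr/// replaces each byte in the first list with the byte at the
    # same position in the second list (e1->c1, e2->c2, ... f0->d0).
    # Every other byte is left untouched.
    $bytes =~ tr/\xe1\xe2\xe3\xe4\xe8\xe9\xf0/\xc1\xc2\xc3\xc4\xc8\xc9\xd0/;
    return $bytes;
}

# Usage: byte e2 followed by "e" comes back as byte c2 followed by "e".
my $converted = marc21_to_unimarc_chars("\xe2" . "e");
print join(' ', map { sprintf '%02x', ord } split //, $converted), "\n";
```

Because tr/// works on a copy here, the caller's original string is not modified; the substitutions above change $_ in place.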

So I think we now have 2 complete tables (MARC21 and UNIMARC) in Biblio.pm, which I committed a few minutes ago.
-- 
Paul POULAIN
Independent free software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)
