Re: [Koha-devel] Re: MARC character encoding

koha-devel


From:	paul POULAIN
Subject:	Re: [Koha-devel] Re: MARC character encoding
Date:	Mon Jan 20 08:46:02 2003
User-agent:	Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.1) Gecko/20020826

Ed Summers wrote:

> Hi Paul:
> On Wed, Jan 15, 2003 at 03:20:47PM +0100, paul POULAIN wrote:
>
>> Could someone explain how to translate the "MARC21" charset into a more
>> convenient one (and which one is more convenient?)
>> Same question for UNIMARC (which is ISO646 if my docs are right).
>
> If we lived in a perfect world we would all be using Unicode (UTF-8),
> since it covers so many of the world's scripts [1]. Unfortunately, the
> world is not perfect. MARC has been around longer than Unicode, so the
> MARC-8 character encoding was created to allow non-Latin scripts to live
> in MARC records. I guess the world has bigger problems than character
> encodings (Mr George Bush comes to mind), but I'll leave that particular
> problem alone :)

Over here, 50% of the news about your country is about Mr George Bush.
Unfortunately for me, the other 50% is NOT about character encoding :-)))

> I wasn't aware that UNIMARC had defined a different standard for
> character encoding. Isn't ISO646 just a synonym for ASCII? [2] Which docs
> describe the character sets used in UNIMARC?

No, you're right: ISO646 IS ASCII.
What I don't understand is how they encode codes above 127 as 2 bytes.
For example, \xc3\x65 = ê.
That's not ASCII, is it?
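A minimal sketch of one likely explanation, assuming the record uses ISO 5426 (the extended Latin set UNIMARC can declare alongside ISO 646). There, \xc3 is not half of a two-byte code but a non-spacing circumflex placed before the letter it modifies, so \xc3\x65 reads as "circumflex, then e". Unicode puts combining marks after the base letter, so a decoder has to reorder them and can then compose "e" + U+0302 into the single character U+00EA with NFC normalization. The function name and the three-entry table (grave, acute, circumflex only) are hypothetical; a real converter needs the full ISO 5426 table.

```python
import unicodedata

# Hypothetical, partial subset of ISO 5426 non-spacing diacritics.
# In ISO 5426 the diacritic byte comes BEFORE the letter it modifies;
# in Unicode the combining mark comes AFTER the base letter.
ISO5426_DIACRITICS = {
    0xC1: "\u0300",  # combining grave accent
    0xC2: "\u0301",  # combining acute accent
    0xC3: "\u0302",  # combining circumflex accent
}


def decode_iso5426(data: bytes) -> str:
    """Decode ASCII text plus the diacritics above into NFC Unicode."""
    out = []
    pending = []  # diacritics read but not yet attached to a base letter
    for b in data:
        if b in ISO5426_DIACRITICS:
            pending.append(ISO5426_DIACRITICS[b])
        elif b < 0x80:
            out.append(chr(b))   # base letter first...
            out.extend(pending)  # ...then its combining marks
            pending = []
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this partial table")
    out.extend(pending)  # a trailing diacritic with no base letter
    return unicodedata.normalize("NFC", "".join(out))


print(decode_iso5426(b"\xc3\x65"))  # ê
```

The same reordering applies to any diacritic in the table, e.g. b"caf\xc2\x65" decodes to "café".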

>> I tried MARC-Charset, which seems to translate from "MARC21" to Unicode,
>> but I don't know what to do with my Unicode ;-(

> Yes, MARC::Charset is an implementation of the MARC-8 ==> Unicode
> (UTF-8) mappings published by the Library of Congress. [3] In MARC-8
> there is a special way of 'escaping' to other character sets (Hebrew,
> Cyrillic, East Asian, etc.).
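On the question of what to do with the Unicode output, one option is sketched below. It assumes the converter emits decomposed text (a base letter followed by a combining mark, which is how MARC-8 diacritics map onto Unicode), composes it with NFC, and then encodes it for storage. UTF-8 keeps every script; ISO-8859-1 only covers Western European text. The helper names are hypothetical; only Python's standard `unicodedata` module and `str.encode` are used.

```python
import unicodedata


def to_latin1(text: str) -> bytes:
    """Compose combining sequences, then encode as ISO-8859-1.

    Raises UnicodeEncodeError for anything Latin-1 cannot hold
    (Cyrillic, Hebrew, CJK, ...)."""
    return unicodedata.normalize("NFC", text).encode("iso-8859-1")


def to_utf8(text: str) -> bytes:
    """UTF-8 can hold any Unicode text; NFC only makes it compact."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


decomposed = "fore\u0302t"  # "forêt" as 'e' + combining circumflex
print(to_latin1(decomposed))  # b'for\xeat'
print(to_utf8(decomposed))    # b'for\xc3\xaat'
# Without NFC, decomposed.encode("iso-8859-1") fails: U+0302 is not in Latin-1.
```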

Is this special way \xc3\x65?
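For comparison, a MARC-8 escape always begins with the ESC byte (0x1B), which \xc3\x65 does not contain. In the ISO 2022-style scheme MARC-8 uses, designating a new G0 set is the sequence ESC, "(", then a final byte naming the set. The sketch below only splits a byte string at those G0 designations. The function name is hypothetical, and the final-byte table (B for ASCII, N for Basic Cyrillic, 2 for Basic Hebrew) is an assumption to verify against the Library of Congress MARC-8 specification; multibyte, G1 and other escape forms are not handled.

```python
ESC = 0x1B

# Hypothetical, partial table of G0 final bytes ("ESC ( F").
# Check these against the Library of Congress MARC-8 specification.
G0_SETS = {
    ord("B"): "Basic Latin (ASCII)",
    ord("N"): "Basic Cyrillic",
    ord("2"): "Basic Hebrew",
}


def split_marc8_g0(data: bytes) -> list[tuple[str, bytes]]:
    """Split MARC-8 bytes into (G0 set name, raw bytes) runs.

    Only 'ESC ( F' designations are handled; other escape forms pass
    through as data. The label tracks G0 only: bytes >= 0x80 belong
    to G1 (assumed to be ANSEL unless redesignated)."""
    runs = []
    current = G0_SETS[ord("B")]  # assume the record starts in ASCII
    buf = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 2 < len(data) and data[i + 1] == ord("("):
            if buf:
                runs.append((current, bytes(buf)))
                buf = bytearray()
            final = data[i + 2]
            current = G0_SETS.get(final, f"unknown (0x{final:02X})")
            i += 3  # skip ESC, '(' and the final byte
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((current, bytes(buf)))
    return runs


print(split_marc8_g0(b"abc\x1b(Nxyz\x1b(Bdef"))
# [('Basic Latin (ASCII)', b'abc'), ('Basic Cyrillic', b'xyz'), ('Basic Latin (ASCII)', b'def')]
```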

> You mapped all the UNIMARC fields to MARC fields!?! I was under the
> impression that this was quite a big undertaking to do completely. Is
> your code currently checked into CVS? Having a UNIMARC filter in
> MARC::Record (MARC::File::UNIMARC) has been a long-term goal. Maybe we
> could roll this work into the MARC::Record package?

No, of course not. You're right, this is a BIG job.
I did this only for the few fields/subfields I needed (around 20-25 fields).
It's a 10-20 line script (plus the mapping array) using MARC::Record.
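A hypothetical sketch of the "mapping array" approach, written without MARC::Record: records are plain lists of (tag, subfields), and a dictionary maps each (UNIMARC tag, subfield code) pair to a MARC 21 pair. The entries are a few well-known correspondences (ISBN, title proper, publication data) chosen for illustration, not the 20-25 fields actually mapped in the script described above. Indicators are ignored and fields mapping to the same MARC 21 tag are merged, both of which a real conversion would need to handle.

```python
# Hypothetical mapping array: (UNIMARC tag, code) -> (MARC 21 tag, code).
# A handful of well-known correspondences; a real table needs many more.
MAPPING = {
    ("010", "a"): ("020", "a"),  # ISBN
    ("200", "a"): ("245", "a"),  # title proper
    ("210", "a"): ("260", "a"),  # place of publication
    ("210", "c"): ("260", "b"),  # publisher
    ("210", "d"): ("260", "c"),  # date of publication
}


def convert(unimarc_fields):
    """Map UNIMARC fields, given as [(tag, [(code, value), ...]), ...],
    to MARC 21 fields as {tag: [(code, value), ...]}.

    Unmapped subfields are dropped, indicators are ignored, and fields
    that map to the same MARC 21 tag are merged into one."""
    out = {}
    for tag, subfields in unimarc_fields:
        for code, value in subfields:
            target = MAPPING.get((tag, code))
            if target is None:
                continue  # not in the mapping array: skip it
            new_tag, new_code = target
            out.setdefault(new_tag, []).append((new_code, value))
    return out


record = [
    ("200", [("a", "Titre propre"), ("f", "Auteur")]),
    ("210", [("a", "Paris"), ("c", "Editeur"), ("d", "2003")]),
]
print(convert(record))
# {'245': [('a', 'Titre propre')], '260': [('a', 'Paris'), ('b', 'Editeur'), ('c', '2003')]}
```

Here 200$f (statement of responsibility) is silently dropped because it has no entry in the table.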

-- 
Paul POULAIN
Independent free software consultant
Francophone coordinator for Koha (free ILS, http://www.koha-fr.org)

Current Thread

[Koha-devel] MARC character encoding, paul POULAIN, 2003/01/15
- [Koha-devel] Re: MARC character encoding, Ed Summers, 2003/01/15
  - [Koha-devel] Koha backend's character encoding (was: Re: MARC character encoding), address@hidden via news-to-mail gateway, 2003/01/15
  - Re: [Koha-devel] Re: MARC character encoding, paul POULAIN <=
    - Re: [Koha-devel] Re: MARC character encoding, Ed Summers, 2003/01/20
    - Re: [Koha-devel] Re: MARC character encoding, paul POULAIN, 2003/01/21
    - Re: [Koha-devel] Re: MARC character encoding, Ed Summers, 2003/01/22
    - Re: [Koha-devel] Re: MARC character encoding, paul POULAIN, 2003/01/23
    - Re: [Koha-devel] Re: MARC character encoding, Ed Summers, 2003/01/23
  - Re: [Koha-devel] Re: MARC character encoding, paul POULAIN, 2003/01/22
    - Re: [Koha-devel] Re: MARC character encoding, Ed Summers, 2003/01/22
    - Re: [Koha-devel] Re: MARC character encoding, paul POULAIN, 2003/01/23
