Re: [Koha-devel] Biblio.pm Choices IMPORTANT
From: Joshua Ferraro
Subject: Re: [Koha-devel] Biblio.pm Choices IMPORTANT
Date: Wed, 24 May 2006 13:00:57 -0700
User-agent: Mutt/1.4.1i
Hi Tumer,
Thanks for your speedy response ...
On Wed, May 24, 2006 at 06:37:12PM +0300, Tümer Garip wrote:
>1-10
Thanks!
> 11-Regarding char_decode: I already have Expat installed and still could
> not get M:F:X to work properly. But I also think that we should get rid
> of it, otherwise it's always going to be a problem as new formats or
> languages are added to KOHA. I also strongly believe that all data
> within KOHA & ZEBRA should be UTF-8 and non-UTF-8 handling should only
> be allowed when importing or exporting MARC records.
I've tried running some of your records through a 'roundtrip' script that
tests both as_xml and new_from_xml and their encoding/re-encoding abilities
... and they seem to work fine for me.
Here's the script:
http://liblime.com/public/roundtrip.pl
and here are the records I tested:
http://liblime.com/public/turkish.iso2709
Run it like this:
$ ./roundtrip.pl turkish.iso2709 turkish.utf8.iso2709 error.xml error.mrc
If it throws an error it will dump out the offending record in MARCXML and in
MARC format. Let me know how it works for you.
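For readers without the Perl toolchain, here is a minimal Python sketch of the same round-trip idea (ISO 2709 to MARCXML and back, then compare). It is not roundtrip.pl and does not use MARC::Record; it assumes UTF-8 field data (Leader/09 = 'a'), and the helper names (build_iso2709, parse_iso2709, to_marcxml, from_marcxml, roundtrip_ok) are illustrative only.

```python
import xml.etree.ElementTree as ET

FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record and subfield terminators
NS = "http://www.loc.gov/MARC21/slim"   # MARCXML namespace


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{NS}}}{name}"


def build_iso2709(leader, fields):
    """Serialize fields to ISO 2709, recomputing record length and base address."""
    directory, data = b"", b""
    for f in fields:
        if len(f) == 2:  # control field: (tag, value)
            body = f[1].encode("utf-8")
        else:            # data field: (tag, ind1, ind2, [(code, value), ...])
            _, i1, i2, subs = f
            body = (i1 + i2).encode("utf-8")
            body += b"".join(SF + (c + v).encode("utf-8") for c, v in subs)
        body += FT
        # Directory entry: tag (3), length in bytes (4), offset from base (5)
        directory += f"{f[0]}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FT
    base = 24 + len(directory)
    total = base + len(data) + 1
    leader = f"{total:05d}" + leader[5:12] + f"{base:05d}" + leader[17:]
    return leader.encode("ascii") + directory + data + RT


def parse_iso2709(raw):
    """Parse one ISO 2709 record; assumes UTF-8 data (Leader/09 = 'a')."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]  # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Lengths and offsets count bytes, so slice first and decode after
        text = raw[base + start:base + start + length - 1].decode("utf-8")
        if tag < "010":
            fields.append((tag, text))
        else:
            subs = [(s[0], s[1:]) for s in text[2:].split("\x1f") if s]
            fields.append((tag, text[0], text[1], subs))
    return leader, fields


def to_marcxml(leader, fields):
    """Render parsed fields as a MARCXML <record> string."""
    rec = ET.Element(q("record"))
    ET.SubElement(rec, q("leader")).text = leader
    for f in fields:
        if len(f) == 2:
            ET.SubElement(rec, q("controlfield"), tag=f[0]).text = f[1]
        else:
            tag, i1, i2, subs = f
            df = ET.SubElement(rec, q("datafield"), tag=tag, ind1=i1, ind2=i2)
            for code, value in subs:
                ET.SubElement(df, q("subfield"), code=code).text = value
    return ET.tostring(rec, encoding="unicode")


def from_marcxml(xml):
    """Read a MARCXML <record> string back into (leader, fields)."""
    rec = ET.fromstring(xml)
    leader = rec.find(q("leader")).text
    fields = []
    for el in rec:
        if el.tag == q("controlfield"):
            fields.append((el.get("tag"), el.text or ""))
        elif el.tag == q("datafield"):
            subs = [(s.get("code"), s.text or "") for s in el.findall(q("subfield"))]
            fields.append((el.get("tag"), el.get("ind1"), el.get("ind2"), subs))
    return leader, fields


def roundtrip_ok(raw):
    """ISO 2709 -> MARCXML -> ISO 2709 must reproduce the original bytes."""
    leader, fields = parse_iso2709(raw)
    again = from_marcxml(to_marcxml(leader, fields))
    return again == (leader, fields) and build_iso2709(*again) == raw


leader = "00000nam a2200000 a 4500"
fields = [("001", "12345"),
          ("245", "1", "0", [("a", "Türkçe başlık :"), ("b", "ığüşöç")])]
print(roundtrip_ok(build_iso2709(leader, fields)))  # True
```

Like the Perl script, a failing record is the interesting case: dumping the MARCXML and the raw bytes at the point where roundtrip_ok returns False shows which side changed the data.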
I completely agree that all data within Koha 3.0 should be stored as
UTF-8 and that everything that's not UTF-8 should be handled at
import or export. However, currently, the only module that handles
MARC-8 encoding properly is MARC::Charset. Since 90% of all MARC21
records are in MARC-8, it's very important that we handle it properly.
Also, since MARC-8 and UTF-8 are the only valid encodings for MARC21,
and since the MARC::Record suite already handles records in these
encodings correctly, I think we should use it. Up till now, Koha has
not been managing MARC-8 encoded records at all! Now, as to what to do
for non-MARC21 records, I think the solution is to expand the capabilities
of the MARC::Record suite to include support for UNIMARC encodings, etc.
Some work has already been done on this ...
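Because MARC21 declares its encoding in Leader/09 (blank for MARC-8, 'a' for UCS/Unicode), an importer can decide which conversion to apply before anything reaches Koha. A minimal Python sketch, assuming the leader has already been read as a 24-character string; the function name is illustrative, and the actual MARC-8 conversion is left to a dedicated converter such as MARC::Charset.

```python
def marc21_encoding(leader):
    """Report the character coding scheme declared in MARC21 Leader/09."""
    if len(leader) != 24:
        raise ValueError("a MARC21 leader is exactly 24 characters")
    flag = leader[9]
    if flag == "a":
        return "UTF-8"   # 'a' = UCS/Unicode
    if flag == " ":
        return "MARC-8"  # blank = MARC-8; convert with MARC::Charset or similar
    # Anything else is invalid MARC21 and needs repair at import time
    raise ValueError(f"Leader/09 {flag!r} is not a valid MARC21 coding scheme")


print(marc21_encoding("00000nam a2200000 a 4500"))  # UTF-8
print(marc21_encoding("00000nam  2200000 a 4500"))  # MARC-8
```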
Now, there's another matter of what to do with other encodings within
MARC21 ... because obviously there are records out there in MARC21 format
that have been incorrectly encoded. In my opinion, we should expand
the MARC::Record suite to include encoding management in a 'from -> to'
fashion, so that it's possible to re-encode incorrectly encoded records
as UTF-8.
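A 'from -> to' step could look roughly like the Python sketch below, assuming the caller already knows (or has guessed) the real source encoding, for example ISO-8859-9 for Turkish data mislabelled as MARC21. The function name and the (tag, bytes) input shape are hypothetical; the point it illustrates is that re-encoding changes byte lengths, so the directory has to be rebuilt from the UTF-8 bytes when the record is written back out.

```python
import codecs


def reencode_to_utf8(leader, raw_fields, source_encoding):
    """Decode each field's bytes from source_encoding and mark the record as UTF-8.

    raw_fields: hypothetical list of (tag, bytes) pairs holding field data
    exactly as read from a mis-labelled record. Returns the new leader and
    (tag, str) pairs. When the record is written back out, directory lengths
    must be recomputed from the UTF-8 byte lengths, because most non-ASCII
    characters grow from 1 byte to 2 or more.
    """
    codecs.lookup(source_encoding)  # raises LookupError for an unknown codec name
    # Strict decoding: bytes that are invalid in source_encoding raise an error
    fields = [(tag, data.decode(source_encoding)) for tag, data in raw_fields]
    new_leader = leader[:9] + "a" + leader[10:]  # Leader/09 = 'a' means UCS/Unicode
    return new_leader, fields


leader, fields = reencode_to_utf8("00000nam  2200000 a 4500",
                                  [("245", b"Ba\xfel\xfdk")], "iso8859_9")
print(leader)                             # 00000nam a2200000 a 4500
print(fields)                             # [('245', 'Başlık')]
print(len(fields[0][1].encode("utf-8")))  # 8, up from 6 bytes in ISO-8859-9
```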
The key in all of this IMO is to keep encoding management out of the ILS.
Isn't that the whole point of using external toolkits? Let me know what
you think.
Cheers,
--
Joshua Ferraro VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology migration, training, maintenance, support
LibLime Featuring Koha Open-Source ILS
address@hidden |Full Demos at http://liblime.com/koha |1(888)KohaILS
>
> Cheers
> Tumer
>
>
>
> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On Behalf Of
> Joshua Ferraro
> Sent: Wednesday, May 24, 2006 3:38 PM
> To: address@hidden
> Subject: [Koha-devel] Biblio.pm Choices IMPORTANT
>
>
> Hi folks,
>
> I've been looking at merging Biblio.pm from dev_week and HEAD and it
> looks like there are a few choices that need to be made ... I'd like to
> get your feedback on these.
>
> (BTW: Tumer, go ahead and commit all the Biblio.pm stuff you've got to
> HEAD, we'll deal with the conflicts after the fact ...)
>
> 1. in HEAD we have 'z3950_extended_services', a routine capable of
> handling all the Z39.50 extended services offered by Zebra, as well as
> 'set_service_options', which sets the service options before calling
> z3950_extended_services. But in dev_week, it looks like
> that's handled with 'zebraop', and when zebraop fails it falls back on
> zebraopfiles ...
>
> So which of these routes shall we take, perhaps a combo of both
> approaches?
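Neither branch's code is quoted here, so the following Python sketch only illustrates what "a combo of both approaches" might mean, assuming (from its name alone) that zebraopfiles queues records as files for a later batch index: try the live update first and spool the record to disk if it fails. index_record, UpdateFailed, live_update and spool_dir are all hypothetical names, not Koha's API.

```python
import os
import tempfile


class UpdateFailed(Exception):
    """Raised by the hypothetical live-update callable when Zebra refuses the update."""


def index_record(record_id, marcxml, live_update, spool_dir):
    """Try a live extended-services update; on failure, spool the record to disk.

    live_update: hypothetical callable standing in for a
    z3950_extended_services-style update; it raises UpdateFailed on error.
    spool_dir: hypothetical directory that a batch indexer later hands to Zebra.
    """
    try:
        live_update(record_id, marcxml)
        return "live"
    except UpdateFailed:
        path = os.path.join(spool_dir, f"{record_id}.xml")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(marcxml)
        return "spooled"


def zebra_down(record_id, marcxml):
    raise UpdateFailed("Zebra not reachable")  # simulate a failed live update


with tempfile.TemporaryDirectory() as spool:
    print(index_record("42", "<record/>", zebra_down, spool))  # spooled
    print(os.listdir(spool))                                    # ['42.xml']
```

The design choice is that a failed live update never loses the record: it degrades to the slower file-based path instead of raising to the cataloguer.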
>
> 2. in HEAD we have no more bibid; is that going to cause any strange
> conflicts in dev_week code? (I didn't spot anything specific.)
>
> 3. in HEAD we have 3 types of subs, REALXXX, NEWXXX, something_elseXXX
> ... can't we just have one type of sub? Any reason we can't just
> completely clean up this API once and for all?
>
> 4. char_decode (you knew it was coming :-). Tumer, can you check the
> mail I sent earlier about encoding and see if installing XML::SAX::Expat
> solves your problems with MARC::File::XML's handling of encoding? If we
> have to stick with char_decode, then so be it, but I'd really like to
> have a more standards-compliant way to handle encoding in both MARC21
> and UNIMARC. What this means is that in addition to managing the proper
> encodings (to and from MARC-8 in MARC21, and to and from the appropriate
> encodings in UNIMARC), Koha should also handle the leader (for MARC21)
> and tag 100 (for UNIMARC), and properly set the encoding when all is
> said and done. Internally, I vote that we store everything as UTF-8 in
> all circumstances.
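Setting the encoding "when all is said and done" could be sketched in Python as below. The MARC21 part uses Leader/09 = 'a' (UCS/Unicode). For UNIMARC it assumes that 100$a positions 26-29 carry the character-set codes and that "50" denotes ISO 10646 (Unicode); check that against the UNIMARC manual before relying on it. The function name and argument shapes are hypothetical.

```python
from typing import Optional, Tuple


def mark_as_utf8(flavour: str, leader: str,
                 field_100a: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (leader, 100$a) rewritten to declare UTF-8 for the given flavour."""
    if flavour == "MARC21":
        # Leader/09 = 'a' declares UCS/Unicode; 100$a is not involved
        return leader[:9] + "a" + leader[10:], field_100a
    if flavour == "UNIMARC":
        if field_100a is None or len(field_100a) < 30:
            raise ValueError("UNIMARC 100$a must reach positions 26-29")
        # Assumption: 100$a/26-29 holds the character sets and "50" is the
        # code for ISO 10646 (Unicode); the second set is left blank.
        return leader, field_100a[:26] + "50  " + field_100a[30:]
    raise ValueError(f"unknown MARC flavour {flavour!r}")


print(mark_as_utf8("MARC21", "00000nam  2200000 a 4500")[0])  # 00000nam a2200000 a 4500
old_100a = " " * 26 + "0103" + " " * 6                         # placeholder 100$a
print(repr(mark_as_utf8("UNIMARC", "", old_100a)[1][26:30]))  # '50  '
```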
>
> That's all for now ...
>
> Cheers,
>