
Re: [Koha-devel] Catalogue API (fwd)


From: Alan Millar
Subject: Re: [Koha-devel] Catalogue API (fwd)
Date: Tue May 21 23:33:03 2002
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Hi Steve- Sorry I missed the IRC chat. However, the IRC web logs are great! Glad I could catch up at least after the fact. You've written an excellent summary here. I'm beginning to see the light :-) and I think Pat's IRC comment is correct that we do have more in common in view here than it may have first appeared.

> I'm hopeful that we can develop a cataloguing API that will alleviate some
> of these problems.
>
>   getinfo(1563,'author')  returns Herman Melville  from the 100 a subfield
>   setinfo(1563,'author', 'Melville, Herman') updates author info

Yes, this will go a long way towards simplifying it for the random contributing developer like myself. I can imagine that such a map could be database driven, with a table of values along the lines of:

 author,100,a
 title,245,a
 isbn,020,a

and so on. Perhaps even multiple MARC flavors could be handled that way with no (or minimal) coding changes.
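To make the idea concrete, here is a rough sketch of what such a table-driven map could look like behind a getinfo/setinfo API. Everything here (the field names, the in-memory record layout) is illustrative, not actual Koha code:

```python
# Hypothetical table-driven field map: logical name -> (MARC tag, subfield).
# In practice this table could live in the database, as suggested above.
FIELD_MAP = {
    "author": ("100", "a"),
    "title":  ("245", "a"),
    "isbn":   ("020", "a"),
}

# A toy in-memory record store: bibid -> {(tag, subfield): value}
records = {
    1563: {("100", "a"): "Melville, Herman",
           ("245", "a"): "Moby Dick"},
}

def getinfo(bibid, field):
    """Look a logical field up through the map instead of hard-coding tags."""
    tag, sub = FIELD_MAP[field]
    return records[bibid].get((tag, sub))

def setinfo(bibid, field, value):
    tag, sub = FIELD_MAP[field]
    records[bibid][(tag, sub)] = value

print(getinfo(1563, "author"))   # -> Melville, Herman
```

Swapping in a different MARC flavor would then mean swapping the map contents, with no code changes.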

> 1.  acqui.simple and marcimport.pl == MARC compatible
>
>   acqui.simple and marcimport.pl will import MARC records and "squash" the
>   data into the standard Koha database.  There can be considerable loss of
>   information in this process.

Right. It would only be viable if combined with the next item, MARC tables for non-KohaDB data:

> 2.  Use Marc tables only for data that is not already represented by the
>     koha databases.
>
>   I think this is a bad idea.  It'd be too confusing about what
>   information was stored where.

Perhaps it is. However, it might work if we adopted some sort of table-driven field structure like the above, such as

   title = 245a = kohadb:biblio.title
   author = 100a = kohadb:additionalauthors.name
   lccall = 050a = marc
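A three-column map like that could be consulted by the API to decide where each field's data actually lives. A minimal sketch, with all names being my own assumptions rather than Koha's:

```python
# Illustrative map: logical field -> (MARC tag, subfield, storage location).
# "kohadb:table.column" means the KohaDB is authoritative; "marc" means the
# value lives only in the MARC tables.
FIELD_MAP = {
    "title":  ("245", "a", "kohadb:biblio.title"),
    "author": ("100", "a", "kohadb:additionalauthors.name"),
    "lccall": ("050", "a", "marc"),
}

def storage_for(field):
    """Return where a field is stored: ('kohadb', table, column) or ('marc', tag, subfield)."""
    tag, sub, storage = FIELD_MAP[field]
    if storage.startswith("kohadb:"):
        table, column = storage[len("kohadb:"):].split(".")
        return ("kohadb", table, column)
    return ("marc", tag, sub)

print(storage_for("lccall"))   # -> ('marc', '050', 'a')
```

With the routing centralized in one table, the "what is stored where" confusion Steve worries about at least has a single authoritative answer.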

>   I envision separate indexes being generated to facilitate searching.  It
>   would be easy, for example, to create an index that contained the
>   following information:
>
>       bibid, title, author, dewey, isbn
>
>   Then a search for bibid would be identical to the search that you gave
>   in your example using the koha database.

OK, I think this is the piece that I was missing from the picture. That would allow simple queries like I'm looking for.
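For instance, such an index table and query might look like this (table name, types, and the sample values are placeholders, not a proposed schema):

```python
import sqlite3

# Sketch of the proposed "search index" table, populated with placeholder data.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE biblio_index (
    bibid  INTEGER PRIMARY KEY,
    title  TEXT,
    author TEXT,
    dewey  TEXT,
    isbn   TEXT)""")
db.execute("INSERT INTO biblio_index VALUES "
           "(1563, 'Moby Dick', 'Melville, Herman', '813.3', '0000000000')")

# A lookup by bibid is then exactly the kind of simple query the current
# koha tables support:
row = db.execute("SELECT title, author FROM biblio_index WHERE bibid = ?",
                 (1563,)).fetchone()
print(row)   # -> ('Moby Dick', 'Melville, Herman')
```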

>   Note that you could potentially lose information stored in the marc
>   record if there is more than one author, for example.

Or, there could also be another index table that paralleled the current "additionalauthors" table.

>   These "search indexes" are still vaporware.  Paul and I are trying to
>   work out the specifics of what the new backend database will look like.

My suggestion: what about using the existing structure of the KohaDB (perhaps with minor mods) for these search index tables? Code transition would happen by first isolating all writes to the existing tables and making them go through APIs. Then any existing code that only reads the data wouldn't have to change.
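A sketch of that transition idea: one write API that updates both the legacy KohaDB column and the new per-subfield MARC table, so read-only code keeps working against the old tables unchanged. All table and column names here are invented for illustration:

```python
import sqlite3

FIELD_MAP = {"author": ("100", "a")}   # logical field -> (tag, subfield)

def setinfo(db, bibid, field, value):
    """Hypothetical write API: keep old and new tables in sync."""
    tag, sub = FIELD_MAP[field]
    # Keep the legacy KohaDB column current so existing readers still work.
    # (The column name comes from our own map, not user input.)
    db.execute(f"UPDATE biblio SET {field} = ? WHERE biblionumber = ?",
               (value, bibid))
    # Mirror the same value into the per-subfield MARC table.
    db.execute("UPDATE marc_subfields SET subfieldvalue = ? "
               "WHERE bibid = ? AND tag = ? AND subfieldcode = ?",
               (value, bibid, tag, sub))

# Demo with throwaway in-memory tables:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE biblio (biblionumber INTEGER, author TEXT)")
db.execute("CREATE TABLE marc_subfields "
           "(bibid INTEGER, tag TEXT, subfieldcode TEXT, subfieldvalue TEXT)")
db.execute("INSERT INTO biblio VALUES (1563, 'Herman Melville')")
db.execute("INSERT INTO marc_subfields VALUES (1563, '100', 'a', 'Herman Melville')")

setinfo(db, 1563, "author", "Melville, Herman")
```

Old code doing `SELECT author FROM biblio` still sees the right value; new code can query the subfield table.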

> 5.  You mention MARC-XML and suggest that we not tie ourselves to the MARC
>     format.
>
>   We are not proposing that bibliographic data be stored in MARC format,
>   only that the back end database be capable of storing any and all
>   information that a MARC record can store (whether it is a conventional
>   MARC record or the as-yet-unspecified MARC-XML format record).  The
>   existing MARC format is just a flat file that uses control character
>   separators and a directory index for the tags.  The MARC-XML
>   specification will be the same format, but will use XML tags for
>   separating the data instead of control characters.  We would not want to
>   use either of these formats internally, although we would probably want
>   to be capable of importing/exporting those formats.

I definitely agree here. I'm just now understanding that file format with the tag index and control-character separators. It seems to me that the other (original?) MARC representation can be considered yet a third format: the one where the tag number is at the beginning of each line and the $ character separates the subfields. I'm presuming that this is what MARC-familiar librarians work with, and that it is then translated into the control-character file format (or something else, like the database we are discussing) for storage.
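To make the difference between the representations concrete, here is a rough sketch of pulling subfields out of the control-character form. The exact byte values are from the MARC exchange format as I understand it (0x1E ends a field, 0x1F starts a subfield); the snippet glosses over indicators and the directory, so treat it as an illustration only:

```python
# One field of a record in the control-character exchange format:
raw_field = "\x1faMelville, Herman\x1e"

# Splitting on the subfield delimiter recovers (code, value) pairs:
subfields = [(s[0], s[1:])
             for s in raw_field.rstrip("\x1e").split("\x1f")
             if s]
print(subfields)   # -> [('a', 'Melville, Herman')]

# The "straight-text" display form librarians see is then just:
#   100 $aMelville, Herman
# and the XML form would carry the same (tag, code, value) triples in tags.
```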

Based on this idea that the abstraction of the MARC data is the important part, where "100a = Author" regardless of storage, I really like your suggestion on IRC of

> The schema has columns like bibid,tag,subfieldcode,subfieldvalue, so...
> (bibid, tag, subfieldcode, subfieldvalue) = (16654, 165, 'a', 'http://www.mysite.com/')

In my opinion, then, I would vote against the other suggestion of putting all subfields into one record separated by "$" characters. That appears to be an artifact of the "straight-text" MARC format that doesn't exist in the control-character file format or the proposed XML format, and therefore isn't really part of the "true" abstracted data.

It also increases the complexity of getting at any one subfield and potentially restricts the types of languages or tool sets that could work on the data. I love perl and regexps but I don't think we want a database that excludes non-perl access by design.
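The per-subfield schema from the IRC excerpt makes that point nicely: getting at any one subfield is a plain WHERE clause, with no string parsing in any language. A sketch in SQLite (the table name and column types are my guesses):

```python
import sqlite3

# One row per subfield, as in the IRC example.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE marc_subfield_table (
    bibid INTEGER, tag TEXT, subfieldcode TEXT, subfieldvalue TEXT)""")
db.execute("INSERT INTO marc_subfield_table VALUES "
           "(16654, '165', 'a', 'http://www.mysite.com/')")

# No regexp, no "$"-splitting: any SQL client in any language can do this.
(url,) = db.execute("""SELECT subfieldvalue FROM marc_subfield_table
                       WHERE bibid = 16654 AND tag = '165'
                         AND subfieldcode = 'a'""").fetchone()
print(url)   # -> http://www.mysite.com/
```

Repeated subfields or repeated tags simply become additional rows, which also addresses the arbitrary-length point below.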

And relational database engines are built by design to support data growth by adding rows. Perhaps the list of additional authors will never be that big for any book I know of, but it seems short-sighted to design the schema so that an arbitrary-length list of subfields is limited by column width, or by db-engine support for non-conventional data types, just to handle wider and wider text strings.

Thanks for the comprehensive view; I'm starting to get the picture :-)

- Alan


--
Alan Millar                  Email: address@hidden



