[Koha-zebra] Koha Zebra Searching Report (from NPL)
From: Joshua Ferraro
Date: Wed, 22 Mar 2006 15:27:57 -0800
Hi all,
I had a day-long session with NPL staff members today to get some
feedback on the Koha Zebra search options as they currently stand.
The demo we used is here: http://kohatest.liblime.com. It indexes
NPL's entire database (150K or so records) using the record.abs
that I committed to CVS yesterday and the latest SearchMarc.pm.
PRELIMINARY COMMENTS
NPL was very pleased with the new system, and we got a lot of
positive feedback. They are of course quite familiar with the open
source feedback process by now, so they were more than happy to
offer suggestions for improvement.
So ... here we go ...
The advanced search should include true boolean searching, as is
standard on many library advanced search interfaces (select from
multiple types of searches and combine them with AND, OR, XOR, and
NOT).
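As a rough illustration of what the advanced search form might hand to Zebra, here is a minimal Python sketch that builds PQF (Prefix Query Format, as used by YAZ and Zebra) strings from field/term pairs. The helper names and the form-field mapping are hypothetical; the bib-1 use attributes shown (1=4 title, 1=1003 author, 1=21 subject) are the standard ones. Z39.50 RPN queries offer AND, OR, and AND-NOT (`@not`) but no XOR, so this sketch expands XOR as (A OR B) AND-NOT (A AND B). It assumes terms contain no double quotes.

```python
# Hypothetical mapping from advanced-search fields to bib-1 use attributes.
USE_ATTRS = {"title": 4, "author": 1003, "subject": 21}


def term(field: str, text: str) -> str:
    """One attribute-qualified term, quoted so multi-word phrases stay together."""
    return f'@attr 1={USE_ATTRS[field]} "{text}"'


def combine(op: str, left: str, right: str) -> str:
    """Combine two PQF sub-queries with a boolean operator (prefix notation)."""
    op = op.upper()
    if op == "AND":
        return f"@and {left} {right}"
    if op == "OR":
        return f"@or {left} {right}"
    if op == "NOT":  # Z39.50 AND-NOT: matches left but not right
        return f"@not {left} {right}"
    if op == "XOR":  # no native XOR: (left OR right) AND-NOT (left AND right)
        return f"@not @or {left} {right} @and {left} {right}"
    raise ValueError(f"unsupported operator: {op}")


if __name__ == "__main__":
    print(combine("NOT", term("title", "civil war"), term("author", "Foote")))
    # @not @attr 1=4 "civil war" @attr 1=1003 "Foote"
```

Because PQF operators are prefix and binary, nested form rows can be folded left to right with repeated `combine` calls.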
Search by format should work (it currently doesn't). There is also
some question as to whether 'format', 'type', and 'category' differ
from one another. We need to take a closer look at the bib1 profile
and see what is available.
Search by Branch should work, although I'm not sure that bib1
supports a 'Branch' option, so I'm not sure how that would work.
Search by Call # (LCC or Dewey) should work.
Search by Publisher didn't seem to be working. It should use at
least 260$b (name of publisher), but we need to better define where
publication info is supposed to be kept.
Sort by date should use the dates in the 008 field (no other field
is normalized enough for sorting to be consistent, unless someone on
this list knows of another one).
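In MARC 21, Date 1 sits at character positions 07-10 of the 008 field, so a sort key can be pulled straight out of the fixed field. A minimal sketch, assuming records are available as Python dicts keyed by tag (a hypothetical shape, not Koha's actual data structures). Partial dates like "19uu" are simply treated as unknown here.

```python
from typing import Optional


def date1_from_008(field_008: str) -> Optional[int]:
    """Return Date 1 (008/07-10) as an int, or None if missing or partial."""
    date1 = field_008[7:11]
    return int(date1) if len(date1) == 4 and date1.isdigit() else None


def sort_by_date(records: list[dict], newest_first: bool = True) -> list[dict]:
    """Sort records on 008 Date 1; records without a usable date go last."""
    dated = [r for r in records if date1_from_008(r.get("008", "")) is not None]
    undated = [r for r in records if date1_from_008(r.get("008", "")) is None]
    dated.sort(key=lambda r: date1_from_008(r["008"]), reverse=newest_first)
    return dated + undated
```

Keeping undated records in a separate tail avoids having to invent a fake year for them.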
SPELLCHECKING
Does Zebra have an integrated spellcheck feature? NPL was hoping
that Zebra would handle variant spellings and misspellings (via
Soundex or something similar) to pull up results even when the
record's form differs from the search term. If not, we should extend
the spellchecking feature LibLime wrote to make it more complete.
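If Zebra turns out not to offer this, one common technique for matching variant spellings is American Soundex, which reduces a word to its first letter plus three digits so that similar-sounding words collide. Here is a sketch of the classic algorithm. It is a stand-in technique for discussion, not something Zebra or the LibLime spellchecker is known to use.

```python
# Letter-to-digit groups of American Soundex; vowels, h, w, and y have no code.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code, e.g. 'Joyce' -> 'J200'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        # h and w do not separate letters with equal codes; vowels (and y) do
        if c not in "hw":
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")
```

For example, "Joyce" and "Joice" both reduce to J200, so indexing a Soundex key alongside each name would let either spelling find the other.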
TITLE SEARCHING
The consensus seems to be that the default relevance ranking should
be based not on how many times the search terms appear in the record,
but on whether the title starts with or contains the search terms.
For instance, a title search on "the civil war" should pull up all
records with exactly that title first, followed by records that
contain that phrase in the title (even if the title doesn't start
with it), followed by records where the terms appear in some title
field somewhere in the record. Contrast that with the current
behavior, which puts "Voices from the Civil War" first (I'm assuming
this is because the terms "the", "civil", and "war" appear more times
in that record than in any other, or because they appear the same
number of times but that record was indexed more recently).
One-word titles like "Tango" should come up first with a title search
for "Tango".
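The tiered ranking NPL described could be expressed as a sort key. A minimal sketch with hypothetical helper names: it scores a single title string, so the last tier ("terms appear somewhere in a title field") is approximated by "all terms present in this title". How Zebra itself would be configured to rank this way is still an open question.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_tier(title: str, query: str) -> int:
    """Rank tier for a title search (lower ranks higher).

    0 = exact title, 1 = title starts with the phrase,
    2 = phrase appears later in the title, 3 = all terms present
    but not as a phrase, 4 = anything else.
    """
    t, q = tokens(title), tokens(query)
    n = len(q)
    if t == q:
        return 0
    if t[:n] == q:
        return 1
    if any(t[i:i + n] == q for i in range(len(t) - n + 1)):
        return 2
    if set(q) <= set(t):
        return 3
    return 4
```

Sorting with `key=lambda t: (title_tier(t, "the civil war"), t)` puts "The Civil War" ahead of "The Civil War in America", which in turn comes ahead of "Voices from the Civil War"; a search for "Tango" puts the one-word title "Tango" in tier 0.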
AUTHOR SEARCHING
Again, the current relevance ranking doesn't quite cut it. A good
example is a relevance-ranked author search on "James Joyce". Some
records sneak into the top results because they have multiple authors
with names like "James Henry" and "Paul Joyce" (take "Bob the Builder"
in the NPL database as an example). The relevance ranking should
treat proximity as its highest-ranking consideration, so that a search
on "James Joyce" returns all the books by "James Joyce" first. They
also asked that the default ranking secondarily sort items by date,
because they are often asked to find the latest book by so-and-so. We
concluded that the copyright date stored in the 008 is probably the
only date normalized enough to use for sorting, though I'm not sure
whether Zebra can use it for sorting.
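To make the proximity-then-date idea concrete, here is a minimal sketch with hypothetical record dicts (`authors` as a list of author-field strings, `year` as an int, which in practice might come from 008 Date 1). Adjacency is accepted in either order, since MARC author headings are usually inverted ("Joyce, James"). This is only an illustration of the ranking rule, not how Zebra computes relevance.

```python
import re
from typing import Optional


def words(text: str) -> list[str]:
    """Lowercase alphabetic words; dates such as 1882-1941 are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def author_tier(author_fields: list[str], query: str) -> int:
    """Proximity tier for an author search (lower ranks higher).

    0 = query words adjacent in one author field, in either order
    1 = all query words in one field, but not adjacent
    2 = query words spread across different author fields
    3 = some query word missing entirely
    """
    q = words(query)
    n = len(q)
    best = 3
    for field in author_fields:
        w = words(field)
        if not set(q) <= set(w):
            continue
        if any(sorted(w[i:i + n]) == sorted(q) for i in range(len(w) - n + 1)):
            return 0
        best = 1
    every_word = {x for field in author_fields for x in words(field)}
    if best == 3 and set(q) <= every_word:
        best = 2
    return best


def author_sort_key(record: dict, query: str) -> tuple:
    """Primary key: proximity tier. Secondary: newest first; no year sorts last."""
    year: Optional[int] = record.get("year")
    return (author_tier(record["authors"], query), -(year or 0))
```

With invented sample data, a record whose only authors are "Henry, James" and "Joyce, Paul" lands in tier 2, behind every record with "Joyce, James" in a single field, and within tier 0 the most recent title comes first.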
SUBJECT SEARCHING
They seemed pleased with the way subject searching works: it
correctly finds things like "horses--psychology", where the first
term is in 650$a and the second in $x. However, it does not seem to
rank results by proximity within a tag, meaning that a search on
horses--psychology will pull up records containing:
650$a horses
$x pets
650$a humans
$x psychology
and records with the actual 'horses--psychology' (650$a$x) subject
heading aren't given any favor in the ranking (I misplaced my actual
example, so the one above is invented).
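The within-tag proximity check could look roughly like this. A minimal sketch, assuming each 650 field is available as a list of (subfield code, value) pairs (a hypothetical representation); it favors records where the heading parts occur consecutively in one field over records where they are split across fields, as in the invented example above.

```python
def _norm(value: str) -> str:
    """Case-fold and drop the trailing period MARC headings usually carry."""
    return value.strip().rstrip(".").lower()


def subject_tier(fields_650: list[list[tuple[str, str]]], heading: str) -> int:
    """Rank tier for a subject search like 'horses--psychology' (lower is better).

    0 = heading parts appear consecutively in one 650 field
    1 = all parts in the same 650 field, but not consecutively
    2 = parts spread across different 650 fields
    3 = some part missing
    """
    parts = [_norm(p) for p in heading.split("--")]
    n = len(parts)
    best = 3
    seen = set()
    for field in fields_650:
        values = [_norm(v) for _code, v in field]
        seen.update(values)
        if any(values[i:i + n] == parts for i in range(len(values) - n + 1)):
            return 0
        if set(parts) <= set(values):
            best = 1
    if best == 3 and set(parts) <= seen:
        best = 2
    return best
```

A record with 650 $a Horses $x Psychology scores 0, while the "horses / pets, humans / psychology" record scores 2 and would sort below it.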
SUBJECT HEADING SEARCH
NPL would like to see a demonstration of a 'Subject Heading' search
that uses authorities generated from the data to compile a list of
authoritative headings (each compiled from multiple subfields within
a given subject tag, such as 650$a$v$x). So I think to do this right
we'd need to look at putting our authority records in Zebra as well.
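Compiling candidate headings from the bibliographic data could start with something like the sketch below. It assumes the same hypothetical record shape as the other examples here (each 650 as a list of (subfield code, value) pairs) and joins the heading subfields $a, $v, $x, $y, and $z with "--". It only builds a counted list of distinct headings; loading real authority records into Zebra would be a separate step.

```python
from collections import Counter

# 650 subfields that make up the heading: topical term plus form,
# general, chronological, and geographic subdivisions.
HEADING_SUBFIELDS = {"a", "v", "x", "y", "z"}


def heading_string(field: list[tuple[str, str]]) -> str:
    """Join the heading subfields of one 650 with '--' (e.g. Horses--Psychology)."""
    return "--".join(
        value.strip().rstrip(".") for code, value in field if code in HEADING_SUBFIELDS
    )


def compile_headings(records: list[dict]) -> list[tuple[str, int]]:
    """Return (heading, number of fields using it) pairs, sorted by heading."""
    counts: Counter = Counter()
    for record in records:
        for field in record.get("650", []):
            heading = heading_string(field)
            if heading:
                counts[heading] += 1
    return sorted(counts.items())
```

The counts could be shown next to each heading in a browse list, much like many OPAC subject browse displays.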
SERIES TITLE SEARCHING
Series title searching should pull from series title fields, not just
general title fields. We need to compile a list of these fields and
create an index for series titles only.
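As a starting point for that list, here is a sketch of extracting series titles for a dedicated index. The field list is an assumption to check against NPL's data: 440$a and 490$a (series statements), 830$a (uniform-title series added entry), and the $t title portion of 800/810/811 name/title series added entries. The record shape (tag mapped to a list of fields of (code, value) pairs) is hypothetical.

```python
# Assumed series-title sources: tag -> subfield holding the title.
SERIES_SUBFIELDS = {"440": "a", "490": "a", "830": "a", "800": "t", "810": "t", "811": "t"}


def series_titles(record: dict) -> list[str]:
    """Collect distinct series titles, stripping trailing ISBD punctuation."""
    titles = []
    for tag, wanted in SERIES_SUBFIELDS.items():
        for field in record.get(tag, []):
            for code, value in field:
                if code == wanted:
                    titles.append(value.strip().rstrip(" ;.,"))
    # Preserve first-seen order while dropping duplicates (e.g. 490 and 830).
    return list(dict.fromkeys(titles))
```

Note that the 245 title is deliberately ignored, so a series index built this way would not match ordinary title words.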
KEYWORD SEARCHING
I saved this one for last on purpose. We're not really sure what
keyword searching is supposed to be :-). We suspect that patrons
expect the keyword search to be equivalent to a web search engine's
search box, like Google's. Given the limitations of the MARC format,
what are our ranking options? Should titles be given priority?
Authors? Subjects? We're not really sure. However, NPL did like the
idea of offering a two-column results page for keyword searching:
library items on one side and results from the web (using Google's
API or something similar) on the other.
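One way to frame the "should titles be given priority?" question is as per-field weights. The sketch below scores a record by how many query terms each field contains, multiplied by a weight. The field names and weight values are purely hypothetical placeholders for discussion, not a recommendation.

```python
import re

# Hypothetical weights: one possible answer to "titles, authors, or subjects first?"
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "subject": 2.0, "notes": 1.0}


def _terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(record: dict, query: str) -> float:
    """Sum, over fields, of weight * number of distinct query terms found there."""
    query_terms = _terms(query)
    return sum(
        weight * len(query_terms & _terms(record.get(field, "")))
        for field, weight in FIELD_WEIGHTS.items()
    )
```

Tuning then becomes a matter of adjusting the weight table and comparing result lists with NPL staff.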
Well, that's it for now. NPL staff are going to continue looking at
the demo and providing me with feedback, so I'll pass it along as I
receive it. Please feel free to add your own comments on the demo as
well as on NPL's ideas.
Cheers,
--
Joshua Ferraro, President, LibLime
VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
Technology migration, training, maintenance, support
Featuring Koha Open-Source ILS
address@hidden | Full Demos at http://liblime.com/koha | 1(888)KohaILS