Re: [Koha-devel] marc_word and searching


From: paul POULAIN
Subject: Re: [Koha-devel] marc_word and searching
Date: Fri May 28 00:28:00 2004
User-agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.6) Gecko/20040115

Joshua Ferraro wrote:

Paul wrote:

NPL had a tech meeting today focusing on OPAC searching, and we have
reached some tentative conclusions about how to proceed.  Running some
test searches using marc_subfield_table, we realized that a search model
based on that table is inadequate for our needs.  For example, a search on
'patrick o'brian' using the 'like' syntax produces no results if the
database entry is stored as 'o'brian, patrick' (which is the format when the
author is stored in the 100a).  On the other hand, a search using the current
marc_word model fails for reasons we have already talked about (marc_word
does not keep track of single characters, etc.).  But if the marc_word table
did index single characters, a search model based on marc_word would work
very well.  For example, searches on 'o'brian, patrick' and 'patrick o'brian'
would both return the correct records.  So our idea is to re-create our
marc_word table so that it indexes all characters from the tags and subfields
that we want to use for searches (we don't need all of them, as you pointed
out; for instance, we will never use 300 for a search).  So we have three
basic tasks:
Another idea, which might be better:
replace ' with _.
Then a search on o'brian looks for o_brian, which is what will be stored in the DB.
The only limitation is that a search on brian alone won't succeed. Tell me if that's a problem.

Otherwise, we could add an 'also index 1-letter words' option, but, imho, ONLY together with the 'do not index this subfield' feature.

Everybody can give their opinion here. Both solutions are easy to code.
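
To make the trade-off between the two options concrete, here is a minimal sketch in Python (not Koha's actual Perl code). The function names and the rule that words shorter than two characters are dropped in the first option are assumptions made for illustration only.

```python
import re

def tokens_underscore(text):
    """Option 1: replace apostrophes with '_' so o'brian stays one word."""
    text = text.lower().replace("'", "_")
    # Assumption: one-character words are still skipped, as in the current index.
    return {w for w in re.findall(r"\w+", text) if len(w) > 1}

def tokens_single_letters(text):
    """Option 2: split on punctuation and keep one-letter words too."""
    return set(re.findall(r"\w+", text.lower()))

def matches(query, stored, tokenize):
    """A record matches when every query word is among its indexed words."""
    return tokenize(query) <= tokenize(stored)

stored = "O'Brian, Patrick"
print(matches("patrick o'brian", stored, tokens_underscore))   # word order does not matter
print(matches("brian", stored, tokens_underscore))             # fails: only o_brian is indexed
print(matches("brian", stored, tokens_single_letters))         # works: o and brian are separate words
```

With the first option the index stays small but 'brian' alone finds nothing; with the second, 'o' becomes an indexed word, which is why it probably only makes sense if some subfields can be excluded from indexing.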

1.) write a script to re-create marc_word using the parameters we choose
for searching and including all characters.

2.) fix Biblio.pm so that it will include all characters when it adds records
to marc_word (currently we add to our holdings using a modified version of
bulkmarcimport.pl that relies on Biblio.pm)

3.) write a clean-up script to delete all the tags and subfields from marc_word
that we will never use (like 300)
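
As a rough illustration of tasks 1 and 3, the sketch below rebuilds the word index from the subfield table while skipping subfields that are never searched. It is Python against an in-memory SQLite database, not a Koha script; the simplified table layouts, the column names, and the helper names (make_demo_db, rebuild_marc_word) are assumptions, and the real Koha schema has more columns.

```python
import re
import sqlite3

def make_demo_db():
    """Build a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE marc_subfield_table
            (bibid INTEGER, tag TEXT, subfieldcode TEXT, subfieldvalue TEXT);
        CREATE TABLE marc_word (bibid INTEGER, tagsubfield TEXT, word TEXT);
        INSERT INTO marc_subfield_table VALUES (1, '100', 'a', 'O''Brian, Patrick');
        INSERT INTO marc_subfield_table VALUES (1, '300', 'a', '412 p.');
    """)
    return conn

def rebuild_marc_word(conn, searchable):
    """Recreate marc_word for the chosen tag+subfield codes, keeping all words."""
    cur = conn.cursor()
    cur.execute("DELETE FROM marc_word")
    rows = cur.execute(
        "SELECT bibid, tag, subfieldcode, subfieldvalue FROM marc_subfield_table"
    ).fetchall()
    for bibid, tag, code, value in rows:
        tagsubfield = tag + code
        if tagsubfield not in searchable:
            continue  # e.g. 300 is never searched, so it is not indexed
        # Keep every word, including one-letter words such as the 'o' in o'brian.
        for word in sorted(set(re.findall(r"\w+", value.lower()))):
            cur.execute(
                "INSERT INTO marc_word (bibid, tagsubfield, word) VALUES (?, ?, ?)",
                (bibid, tagsubfield, word),
            )
    conn.commit()

conn = make_demo_db()
rebuild_marc_word(conn, {"100a"})
print(conn.execute("SELECT tagsubfield, word FROM marc_word ORDER BY word").fetchall())
```

Rebuilding only from the searchable subfields means the separate clean-up pass in task 3 is only needed for rows that were indexed before the rebuild.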

Does that sound like a sound plan to you, Paul?  Do you have any scripts that
will speed up the process of re-building our marc_word table?  If not, we will
write one ourselves.  Can you make the changes to Biblio.pm that will force
it to index single characters?
yep, if we decide to do it.
I have no fast script to rebuild the marc_word table :-(

One final point about search results.  Currently the MARC searching does
not pass all the variables to the template so that we can choose what
values to display (for example, Lord of the Rings: The Two Towers currently
displays as 'Lord of the Rings:' without the subtitle).  I suggest that
we set up a method of easily making MARC fields available to the template
so that each library can decide exactly what MARC fields they want to
display for the initial search results.
Already planned. I'll try to commit some code to CVS ASAP.
"MARC view" is ready (in the OPAC).
We plan to add a system preference called 'ISBD' where the library can define its own biblio presentation.
Something like:
[200a;][200b/][(100c)]

The ; means a ; is added AFTER the 200a; the ( means a ( is added BEFORE the 100c.
Not exactly an ISBD view, but not too far off either.
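
One way such a template could be interpreted is sketched below in Python. This is only a guess at the planned behaviour, not the code that will be committed: the render function, the regular expression, skipping a block when its subfield is missing, and the sample values are all assumptions.

```python
import re

# One block: [ optional text before, tag+subfield code such as 200a, optional text after ]
BLOCK = re.compile(r"\[([^\[\]\d]*)(\d{3}\w)([^\[\]]*)\]")

def render(template, record):
    """Expand each [prefix CODE suffix] block using the record's subfield values."""
    out = []
    for prefix, code, suffix in BLOCK.findall(template):
        value = record.get(code)
        if value:  # assumption: a block is dropped entirely when the subfield is absent
            out.append(prefix + value + suffix)
    return "".join(out)

# Made-up sample values, keyed by tag+subfield.
record = {"200a": "first value", "200b": "second value", "100c": "third value"}
print(render("[200a;][200b/][(100c)]", record))
```

Dropping the whole block when a subfield is empty avoids leaving stray punctuation such as "()" in the display, which seems to be the point of tying the punctuation to each field.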

Comments? Suggestions?
you're right stephen...
I have another idea that could be coded quickly: in the MARC framework, we could add a checkbox called "do NOT index this subfield". If checked, the subfield wouldn't be stored in marc_word (but would still be stored in marc_subfield_table).
(This needs a script to clean the DB too, which should be quite easy:
foreach subfield in marc_subfield_structure {
  if checkbox checked {
     delete from marc_word where subfield= this one
  }
}
...)
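
The pseudocode above could look roughly like this as a runnable sketch. It is Python with SQLite rather than Koha's Perl and MySQL, and the noindex column, the simplified table layouts, and the purge_unindexed name are hypothetical stand-ins for whatever the checkbox ends up being stored as.

```python
import sqlite3

def purge_unindexed(conn):
    """Delete marc_word rows for every subfield flagged as 'do NOT index'."""
    cur = conn.cursor()
    # Hypothetical column: noindex = 1 when the checkbox is ticked.
    flagged = cur.execute(
        "SELECT tagfield, tagsubfield FROM marc_subfield_structure WHERE noindex = 1"
    ).fetchall()
    deleted = 0
    for tag, code in flagged:
        cur.execute("DELETE FROM marc_word WHERE tagsubfield = ?", (tag + code,))
        deleted += cur.rowcount  # rows removed by this DELETE
    conn.commit()
    return deleted
```

Only marc_word is touched, so the values remain available in marc_subfield_table for display.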
--
Paul POULAIN
Independent free-software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)



