Re: [Koha-devel] marc_word and searching

From:	paul POULAIN
Subject:	Re: [Koha-devel] marc_word and searching
Date:	Fri May 28 00:28:00 2004
User-agent:	Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.6) Gecko/20040115

Joshua Ferraro wrote:

Paul wrote:

NPL had a tech meeting today focusing on the opac searching and we have
reached some tentative conclusions about how to proceed.  Running some
test searches using marc_subfield_table we realized that a search model
based on that table is inadequate for our needs. For example, a search
on 'patrick o'brian' using the 'like' syntax produces no results if the
database entry is stored as 'o'brian, patrick' (when author is stored in
the 100a that is the format).  On the other hand, a search using the current
marc_word model fails for reasons we have already talked about (marc_word
does not keep track of single characters, &c.).  But if the marc_word table
did index single characters, a search model based on marc_word would work
very well. For example, a search on 'o'brian, patrick' or 'patrick o'brian'
would both return the correct records.  So our idea is to re-create our
marc_word table so that it indexes all characters from the tags and subfields
that we want to use for searches (we don't need all of them as you pointed
out; for instance, we will never use 300 for a search).  So we have three
basic tasks:

Another idea, which may be better:
replace ' by _.
Thus, a search on o'brian looks for o_brian, which is what will be stored
in the DB.

The only limitation is that a search on brian won't be successful. Tell me
if it's a problem.

Otherwise, we could add an 'also index 1-letter words' option but, imho,
ONLY together with the 'do not index this subfield' feature.

Everybody can give their opinion here. Both solutions are easy to code.
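
To make the trade-off between the two options concrete, here is a minimal sketch in Python (not Koha's actual Perl code). The function names and the rule "a record matches when every query word is among the stored words" are assumptions made for illustration only.

```python
import re

def words_underscore(text):
    """Option A: replace ' by _ before splitting, so o'brian becomes o_brian.

    1-character words are dropped, as the thread says marc_word does today.
    """
    text = text.lower().replace("'", "_")
    return [w for w in re.findall(r"\w+", text) if len(w) > 1]

def words_single_chars(text):
    """Option B: split on every non-word character and keep 1-letter words."""
    return re.findall(r"\w+", text.lower())

def matches(query, stored, tokenize):
    """Assumed matching rule: every query word must be among the stored words."""
    return set(tokenize(query)) <= set(tokenize(stored))

stored = "O'Brian, Patrick"

# Word order does not matter with either option.
both_orders_a = matches("patrick o'brian", stored, words_underscore)
both_orders_b = matches("patrick o'brian", stored, words_single_chars)

# The limitation mentioned above: 'brian' alone only matches with option B.
brian_a = matches("brian", stored, words_underscore)
brian_b = matches("brian", stored, words_single_chars)
```

With option A the stored words are o_brian and patrick, so a query for brian alone finds nothing. With option B the stored words are o, brian and patrick, at the cost of indexing every single letter.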

1.) write a script to re-create marc_word using the parameters we choose
for searching and including all characters.

2.) fix Biblio.pm so that it will include all characters when it adds records
to marc_word (currently we add to our holdings using a modified version of
bulkmarcimport.pl that relies on Biblio.pm)

3.) write a clean-up script to delete all the tags and subfields from marc_word
that we will never use (like 300)

Does that sound like a sound plan to you Paul?  Do you have any scripts that
will speed up the process of re-building our marc_word table--if not we will
write one ourselves. Can you make the changes to Biblio.pm that will force
it to index single characters?

yep, if we decide to do it.
I've no speedy script to rebuild marc_word table :-(
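
For what it's worth, the rebuild described in tasks 1 and 3 could be sketched roughly like this, in Python rather than Perl. The row layout (bibid, tag, subfield code, value), the SEARCHABLE set and the sample record are all hypothetical; the real marc_subfield_table and marc_word columns should be checked before writing the actual script.

```python
import re

# Hypothetical choice of searchable tag+subfield codes; 300 is left out on purpose.
SEARCHABLE = {"100a", "245a", "245b"}

def rebuild_word_index(subfield_rows, searchable=SEARCHABLE):
    """Return (bibid, tagsubfield, word) rows, keeping 1-character words."""
    index = []
    for bibid, tag, code, value in subfield_rows:
        tagsubfield = tag + code
        if tagsubfield not in searchable:
            continue  # skip tags/subfields that will never be searched
        for word in re.findall(r"\w+", value.lower()):
            index.append((bibid, tagsubfield, word))
    return index

# Illustrative rows only, not real catalogue data.
rows = [
    (1, "100", "a", "O'Brian, Patrick"),
    (1, "245", "a", "Master and commander"),
    (1, "300", "a", "412 p."),
]
index = rebuild_word_index(rows)
```

Filtering while rebuilding means the separate clean-up pass of task 3 is only needed for rows that are already in marc_word.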

One final point about search results.  Currently the marc searching does
not pass all the variables to the template so that we can choose what
values to display (for example, Lord of the Rings: The Two Towers currently
displays as 'Lord of the Rings:' without the subtitle).  I suggest that
we set up a method of easily making marc fields available to the template
so that each library can decide exactly what marc fields they want to
display for the initial search results.

already planned. I'll try to commit some code on CVS ASAP.
"MARC view" is ready (in OPAC).

We plan to add a system preference called 'ISBD' where the library could
define its own biblio presentation.

Something like:
[200a;][200b/][(100c)]

The ; means a ; is added AFTER the 200a, the ( means a ( is added BEFORE
the 100c.

Not exactly an ISBD view, but not too far either.
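
A rough sketch of how such a template string could be interpreted, in Python for illustration. The parsing rules are guesses from the single example above: each [...] block holds optional text, a 3-digit tag plus a subfield code, then optional text. Dropping the whole block when the subfield is missing, and joining blocks with a space, are assumptions too. The record is a plain dict with placeholder values.

```python
import re

# One block: [ text-before  tag+subfield  text-after ]
BLOCK = re.compile(r"\[([^\[\]\d]*)(\d{3}[a-z0-9])([^\[\]]*)\]")

def render(template, record):
    """Build a display string from a template like [200a;][200b/][(100c)]."""
    parts = []
    for before, field, after in BLOCK.findall(template):
        value = record.get(field)
        if value:  # assumption: an absent subfield drops its punctuation too
            parts.append(before + value + after)
    return " ".join(parts)  # assumption: blocks are separated by one space

template = "[200a;][200b/][(100c)]"
full = render(template, {"200a": "Title", "200b": "Part", "100c": "Note"})
partial = render(template, {"200a": "Title", "100c": "Note"})
```

Here full is "Title; Part/ (Note)" and partial is "Title; (Note)".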

Comments? Suggestions?
you're right stephen...
I have another idea that could be coded quickly: in the MARC framework,
we could add a checkbox called "do NOT index this subfield". If checked,
the subfield wouldn't be stored in marc_word (but would still be stored in
marc_subfield_table).
(Needs a script to clean the DB too, should be quite easy:
foreach subfield in marc_subfield_structure {
  if checkbox checked {
     delete from marc_word where subfield= this one
  }
}
...)
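
The clean-up loop above can probably be done in a single DELETE. Below is a minimal sketch using Python's sqlite3 module against a toy schema. The no_index column and the idea that marc_word.tagsubfield holds tag and subfield code joined together (e.g. '300a') are assumptions made for the example, not the confirmed Koha schema.

```python
import sqlite3

def purge_unindexed(conn):
    """Delete marc_word rows for subfields flagged 'do NOT index'.

    Returns the number of rows removed.
    """
    cur = conn.execute(
        """
        DELETE FROM marc_word
        WHERE tagsubfield IN (
            SELECT tagfield || tagsubfield
            FROM marc_subfield_structure
            WHERE no_index = 1
        )
        """
    )
    conn.commit()
    return cur.rowcount

# Toy in-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marc_subfield_structure (tagfield TEXT, tagsubfield TEXT, no_index INTEGER);
    CREATE TABLE marc_word (bibid INTEGER, tagsubfield TEXT, word TEXT);
    INSERT INTO marc_subfield_structure VALUES ('100', 'a', 0), ('300', 'a', 1);
    INSERT INTO marc_word VALUES (1, '100a', 'patrick'), (1, '300a', '412'), (2, '300a', 'p');
""")
removed = purge_unindexed(conn)
remaining = conn.execute("SELECT bibid, tagsubfield, word FROM marc_word").fetchall()
```

marc_subfield_table is not touched, so the data stays available for display even though it is no longer searchable.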

--
Paul POULAIN
Independent free software consultant
French-language lead for koha (free ILS http://www.koha-fr.org)
