[Koha-devel] inverted list Proof of Concept [quest for search]

From:	Paul POULAIN
Subject:	[Koha-devel] inverted list Proof of Concept [quest for search]
Date:	Fri May 27 03:01:35 2005
User-agent:	Mozilla Thunderbird 1.0 (X11/20041206)

Hi,

After some chat with Kados, I've spent some time writing a proof of concept for my inverted list idea. It's not perfect, but it works quite well. I committed it to CVS a few minutes ago (misc/build_marc_Tword.pl and C4::SearchMarcTest.pm files).

NPL should run it on their DB, which is really huge, and report how it performs.


How it works :
===============
* create the table marc_Tword with the following structure :
CREATE TABLE `marc_Tword` (
  `word` varchar(80) NOT NULL default '',
  `usedin` text NOT NULL,
  `tagsubfield` varchar(4) NOT NULL default '',
  PRIMARY KEY  (`word`,`tagsubfield`)
) TYPE=MyISAM;
* open a console & type export PERL5LIB & export KOHA_CONF as usual.

* fill this table with misc/build_marc_Tword.pl. Warning, this script uses a very memory-consuming but very fast method to fill the table : it does everything in memory, then writes everything. Another method is provided (& commented out), but it's 100 times slower (really !)

* open opac-search.pl and replace
        use C4::SearchMarc;
by
        use C4::SearchMarcTest;
As the API hasn't changed, it will work immediately.

* go to opac-search (advanced search) & search for whatever you want. It should work fine (except for "Any word", which is not implemented for the moment)
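
To give an idea of what the "everything in memory, then write everything" method amounts to, here is a minimal sketch, written in Python rather than the Perl of the actual script. The input row layout (bibid, tagsubfield, value), the comma-separated encoding of `usedin`, the tokenizer and the contents of the ignore list are all assumptions made for illustration; none of them is taken from build_marc_Tword.pl.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the real %ignore_list is not shown in this mail.
IGNORE_LIST = {"the", "a", "of", "and"}

def build_inverted_list(rows):
    """Group biblio numbers by (word, tagsubfield), entirely in memory.

    rows: iterable of (bibid, tagsubfield, value) tuples, as they might be
    read from marc_subfield_table in a single pass (assumed layout).
    """
    index = defaultdict(set)
    for bibid, tagsubfield, value in rows:
        for word in re.findall(r"\w+", value.lower()):
            if word not in IGNORE_LIST:
                index[(word, tagsubfield)].add(bibid)
    return index

def to_table_rows(index):
    """Flatten the index into (word, usedin, tagsubfield) rows,
    one per primary key, ready to be written in one batch."""
    return [
        (word, ",".join(str(b) for b in sorted(bibids)), tagsubfield)
        for (word, tagsubfield), bibids in sorted(index.items())
    ]

rows = [
    (1, "200a", "Social change"),
    (2, "200a", "The change of seasons"),
    (3, "200a", "Social history"),
]
table_rows = to_table_rows(build_inverted_list(rows))
```

The memory cost comes from holding the whole index before the first write; the gain is that each (word, tagsubfield) pair is written exactly once instead of being updated once per occurrence.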


LIMITS :
==========

* build_marc_Tword has problems with extended chars (accented ones mainly). So don't be afraid if you get sql errors. They are not a problem for a POC.

* search results are always ordered by title, whatever sort order you choose.

* search only handles WORDA and WORDB, not yet WORDA or WORDB, nor WORDA except WORDB. Due to the structure, those searches should cost exactly the same as the others.

You can test WORDA or WORDB very easily. In SearchMarcTest, at line 240,
replace
        @result = keys %intersect;
by
        @result = keys %union;
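
The reason AND, OR and EXCEPT should all cost about the same is that each one is a single set operation on the lists of biblio numbers already fetched for each word. A minimal sketch, in Python rather than Perl, with a hypothetical `combine` helper and made-up biblio numbers (this is not the code of SearchMarcTest):

```python
def combine(postings, mode="and"):
    """Combine per-word sets of biblio numbers.

    postings: list of sets, one per search word (hypothetical input).
    mode: "and" (intersection), "or" (union) or "except" (difference).
    """
    if not postings:
        return set()
    first, rest = postings[0], postings[1:]
    if mode == "and":
        return first.intersection(*rest)
    if mode == "or":
        return first.union(*rest)
    if mode == "except":
        return first.difference(*rest)
    raise ValueError(f"unknown mode: {mode}")

word_a = {1, 3, 5, 8}   # biblios containing WORDA
word_b = {3, 4, 8, 9}   # biblios containing WORDB

both = combine([word_a, word_b], "and")
either = combine([word_a, word_b], "or")
a_not_b = combine([word_a, word_b], "except")
```

Whatever the operator, the expensive part (reading the lists for each word) is identical; only the final in-memory operation changes.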

Some infos on perfs/size :
============================

On my largest DB (1 900 000 lines in marc_subfield_table), the build_marc_Tword.pl script requires 60s for the 1st SQL read, then 120s for reading the whole table, then another 100s to write the table. It takes up to 350-400 MB of RAM, and at the end, the marc_Tword table has 245 000 lines, for 100MB of disk space. Note we could probably lower this number by checking %ignore_list more carefully.


Tests :
========

A search on a title word (socia*, with a *) that gets 4500 results returns in less than 2 seconds (mysql cache being empty).

A search on another word (chan*) that gets 590 results takes the same time.

A search on both words (socia* chan*) gives 446 results and takes the same time.

Doing the same test on a 2.2 install is VERY different : my SCSI hard disk makes a lot of noise & the result for socia* chan* requires almost 20 seconds to appear.

Testing with an OR (chan* OR socia*) gives 4930 results, in probably less than 2 seconds.
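
For illustration, a truncated search like socia* can be pictured as a prefix match on the `word` column, followed by a merge of the `usedin` lists of every matching row. The sketch below uses Python with an in-memory SQLite table instead of MySQL; the use of LIKE, the comma-separated `usedin` format, the `search_prefix` helper and the sample rows are all assumptions, not a description of what SearchMarcTest actually executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns as marc_Tword, with SQLite types (assumed equivalent).
conn.execute(
    "CREATE TABLE marc_Tword ("
    " word TEXT NOT NULL, usedin TEXT NOT NULL, tagsubfield TEXT NOT NULL,"
    " PRIMARY KEY (word, tagsubfield))"
)
conn.executemany(
    "INSERT INTO marc_Tword (word, usedin, tagsubfield) VALUES (?, ?, ?)",
    [
        ("social", "1,3", "200a"),
        ("socialism", "4", "200a"),
        ("change", "1,2", "200a"),
        ("changing", "5", "200a"),
    ],
)

def search_prefix(conn, prefix, tagsubfield):
    """Return the set of biblio numbers for all words starting with prefix."""
    cur = conn.execute(
        "SELECT usedin FROM marc_Tword WHERE word LIKE ? AND tagsubfield = ?",
        (prefix + "%", tagsubfield),
    )
    bibids = set()
    for (usedin,) in cur:
        # usedin is assumed to be a comma-separated list of biblio numbers.
        bibids.update(int(b) for b in usedin.split(","))
    return bibids

socia = search_prefix(conn, "socia", "200a")
chan = search_prefix(conn, "chan", "200a")
both = socia & chan     # socia* AND chan*
either = socia | chan   # socia* OR chan*
```

Each search word costs one query on the inverted list, however many biblios it matches, which would be consistent with the timings above being nearly identical.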

(Those results could probably be improved a little, as there are a lot of SQL queries to get item number & status & biblio subtitle.)

Note that in the logs, you'll get something for every sql->execute (with the sql executed). You can see that most queries are not about the search itself, but about the item number & status.


--
Paul POULAIN
Consultant indépendant en logiciels libres
responsable francophone de koha (SIGB libre http://www.koha-fr.org)
