bug-findutils
Re: [Patch] Locate: Fold case once at most.


From: Bas van Gompel
Subject: Re: [Patch] Locate: Fold case once at most.
Date: Mon, 6 Jun 2005 16:47:43 +0200 (MET DST)
User-agent: slrn/0.9.8.1 (Win32) Hamster/2.0.6.0 KorrNews/4.2

On Mon, 6 Jun 2005 06:03:10 +0100, James Youngman wrote
in <address@hidden>:
:  On Mon, Jun 06, 2005 at 05:00:15AM +0200, Bas van Gompel wrote:
[...]
: > :   * locate/locate.c: Fold case once, only when needed.
: >
: > Could you please confirm receiving the above mail, and the patch
: > it refers to?
:
:  Yes, I've received the patch.   Thanks for your help.

OK.

: > Any comments?
:
:  When we discussed this functionality earlier, I mentioned that it used
:  to be the case that findutils would fold the case of each character
:  once (or never if not required).  It used to cope intelligently with
:  case-folding when dealing with LOCATE02 databases, too (that is, it
:  didn't repeatedly case-fold repeated prefixes).  I indicated that I
:  had sacrificed this optimisation when refactoring the tests.

This is (as I mentioned before) not related to the patch I supplied.
It is another issue, and should be addressed separately.

The issue addressed by /this/ patch is that multiple copies of the
case-folded database entry are currently kept.
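
To illustrate the idea (this is not the patch itself, and none of these
names come from locate.c), here is a minimal sketch in C of folding an
entry's case lazily and keeping a single folded copy that every
case-insensitive test reuses. The struct, the function names and the
plain substring match are all assumptions made for the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-entry state; not the names used in locate.c. */
struct entry
{
  const char *path;   /* path as read from the database */
  char *folded;       /* lower-cased copy, created on first use */
  int fold_count;     /* how many times folding actually ran */
};

/* Return the case-folded path, computing it at most once per entry. */
static const char *
entry_folded (struct entry *e)
{
  if (e->folded == NULL)
    {
      size_t i, len = strlen (e->path);

      e->folded = malloc (len + 1);
      if (e->folded == NULL)
        abort ();
      for (i = 0; i < len; i++)
        e->folded[i] = (char) tolower ((unsigned char) e->path[i]);
      e->folded[len] = '\0';
      e->fold_count++;
    }
  return e->folded;
}

/* Drop the cached copy before moving on to the next entry. */
static void
entry_release (struct entry *e)
{
  free (e->folded);
  e->folded = NULL;
}

/* Substring test.  When ignore_case is set, the pattern is assumed
   to have been lower-cased already by the caller. */
static int
entry_matches (struct entry *e, const char *pattern, int ignore_case)
{
  const char *subject = ignore_case ? entry_folded (e) : e->path;

  return strstr (subject, pattern) != NULL;
}
```

With this arrangement, a case-sensitive test never triggers folding, and
any number of case-insensitive tests on the same entry share one folded
copy.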

:  I've since been thinking that the bits of code that read from the
:  database should be more decoupled.  That would also allow us to tell
:  the database backend whether we need it to do case folding.  That would
:  bring us back to a situation where we can regain the optimisation, and
:  may also allow us to support additional database formats more neatly.
:  One that comes to mind is gzip, which should produce a locate database
:  less than half the size:-
:
:  $ /usr/local/bin/locate -d  /usr/local/var/locatedb  '?*'  | gzip -9 > filenames.gz
:  address@hidden:~$ ls -l filenames.gz /usr/local/var/locatedb
:  -rw-r--r--  1 james users 2188446 2005-06-06 05:54 filenames.gz
:  -rw-r--r--  1 root  staff 4792566 2005-06-05 06:28 /usr/local/var/locatedb

But gzip is not very fast to decode. bzip2 might be a better option...

:  So, I'm thinking about reorganising "locate" to decouple the database
:  reading code a bit more, which is why I haven't applied the patch
:  (yet).

My local version of locate has already moved reading from the databases
into visitors, which should be quite easy to extend. If you like, I'll
submit that part ASAP (after breaking it out).
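
For readers unfamiliar with the term, a visitor arrangement could look
roughly like the sketch below. It is an illustration only and not the
code from the local tree: the type names, the function names and the
in-memory "database" are all hypothetical. The point is that a database
backend only has to produce paths, while the tests applied to each path
live in a separate, ordered list of visitors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical visitor interface; all names are illustrative only. */
enum visit_result { VISIT_CONTINUE, VISIT_REJECT };

typedef enum visit_result (*visitor_fn) (const char *path, void *context);

struct visitor
{
  visitor_fn fn;
  void *context;
};

/* Run each visitor in order on one entry and stop at the first
   rejection.  Returns 1 if every visitor accepted the entry. */
static int
visit_entry (const struct visitor *visitors, size_t n, const char *path)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (visitors[i].fn (path, visitors[i].context) == VISIT_REJECT)
      return 0;
  return 1;
}

/* Stand-in backend: the "database" is just an array of paths.  A
   backend for another format would decode its entries and call
   visit_entry in the same way.  Returns the number of accepted paths. */
static size_t
read_array_db (const char *const *paths, size_t npaths,
               const struct visitor *visitors, size_t n)
{
  size_t i, accepted = 0;

  for (i = 0; i < npaths; i++)
    accepted += (size_t) visit_entry (visitors, n, paths[i]);
  return accepted;
}

/* Example visitor: accept only paths containing the context string. */
static enum visit_result
visit_substring (const char *path, void *context)
{
  return strstr (path, (const char *) context) != NULL
    ? VISIT_CONTINUE : VISIT_REJECT;
}

/* Example visitor: count every path that reaches it. */
static enum visit_result
visit_count (const char *path, void *context)
{
  (void) path;
  ++*(size_t *) context;
  return VISIT_CONTINUE;
}
```

Because a rejecting visitor stops the chain, later visitors such as the
counter only see entries that passed every earlier test.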

(Please apply the patch; I never intended the current CVS state to exist
for any extended period. It is ugly!)


L8r,

Buzz.
-- 
  ) |  | ---/ ---/  Yes, this | This message consists of true | I do not
--  |  |   /    /   really is |   and false bits entirely.    | mail for
  ) |  |  /    /    a 72 by 4 +-------------------------------+ any1 but
--  \--| /--- /---  .sigfile. |   |perl -pe "s.u(z)\1.as."    | me. 4^re



