Re: Improving ‘guix search’ scoring
From: zimoun
Subject: Re: Improving ‘guix search’ scoring
Date: Thu, 18 Jul 2019 13:11:32 +0200
Hi,
On Wed, 17 Jul 2019 at 23:27, Ludovic Courtès <address@hidden> wrote:
> I guess computing the TF-IDF could perhaps improve the results compared
> to the current scoring mechanism. It would be worth trying to implement
> it.
>
> The bottom line though, as you wrote, is that this all depends on the
> quality of synopses and descriptions, and there’s only so much we can
> draw from 5-line descriptions.
In my opinion, since the description is only about 5 lines plus the
synopsis, one first needs to analyse the "quality" of the available
information (words + dependencies) before implementing anything. I
mean doing some "data science" (buzz buzz! :-)) with R or Python.
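One way to start such an analysis, and to try the TF-IDF idea quoted above, is to compute TF-IDF weights over the synopsis and description text and see which words actually discriminate between packages. Below is a minimal Python sketch under stated assumptions: the package names and texts in PACKAGES are invented toy data (not real Guix metadata), tokenization is a naive lower-case word split, and the weighting is raw term frequency times log(N / df), which is only one of several common TF-IDF variants.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: package name -> synopsis + description.
# Invented for illustration only; not real Guix package metadata.
PACKAGES = {
    "foo-editor": "Text editor. A small text editor with syntax highlighting for source code.",
    "bar-viewer": "Image viewer. Display images in many formats.",
    "baz-tools": "Text processing tools. Utilities to search and edit text files.",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words (naive)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(corpus):
    """Return {package: {term: weight}} with weight = tf * log(N / df)."""
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(docs)
    # Document frequency: in how many packages each term appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    return {
        name: {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for name, counts in docs.items()
    }


if __name__ == "__main__":
    for name, terms in tf_idf(PACKAGES).items():
        # Show the three most discriminating words of each package.
        top = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(name, [term for term, _ in top])
```

A word shared by every package gets weight zero, so this kind of table quickly shows whether short descriptions carry enough distinctive vocabulary to rank on.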
Also, I do not know the state of the art of recommender systems, nor
of their application to package retrieval. I have never read anything
about that for other distributions (Debian, Gentoo, etc.). Does
anyone know of something? Any pointers?
For example, the current scoring looks like a poor man's version of the
Boolean model of Information Retrieval [1]. What about the Okapi BM25
model [2]? And so on.
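To make the difference concrete, here is a minimal Python sketch contrasting the two models on a toy corpus. Assumptions: PACKAGES is invented example data, the Boolean model is the strict AND variant (a package matches only if it contains every query term, with no ranking among matches), and the BM25 function uses the usual defaults k1 = 1.2 and b = 0.75 with the non-negative IDF log(1 + (N - df + 0.5) / (df + 0.5)). This is not how `guix search` is implemented; it only illustrates the two models cited.

```python
import math
import re
from collections import Counter

# Hypothetical toy package texts, invented for illustration only.
PACKAGES = {
    "foo-editor": "Text editor. A small text editor with syntax highlighting for source code.",
    "bar-viewer": "Image viewer. Display images in many formats.",
    "baz-tools": "Text processing tools. Utilities to search and edit text files.",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words (naive)."""
    return re.findall(r"[a-z]+", text.lower())


def boolean_match(query, corpus):
    """Boolean AND model: match iff every query term occurs; no ranking."""
    terms = set(tokenize(query))
    return sorted(name for name, text in corpus.items()
                  if terms <= set(tokenize(text)))


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of each package for the query."""
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n = len(docs)
    avgdl = sum(sum(c.values()) for c in docs.values()) / n
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    scores = {}
    for name, counts in docs.items():
        dl = sum(counts.values())  # document length in words
        score = 0.0
        for term in set(tokenize(query)):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        scores[name] = score
    return scores


if __name__ == "__main__":
    print(boolean_match("text", PACKAGES))
    print(bm25_scores("text", PACKAGES))
```

With the query "text", the Boolean model returns two equally relevant packages, whereas BM25 ranks the shorter description higher for the same term frequency; that kind of graded ranking is what the Boolean model lacks.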
Well, in case a student is reading this thread and looking for a project. ;-)
I will also try to take a look after my summer holidays.
Please share your opinion or experience.
All the best,
simon
[1] https://en.wikipedia.org/wiki/Boolean_model_of_information_retrieval
[2] https://en.wikipedia.org/wiki/Okapi_BM25