From: Dmitry Gutov
Subject: Re: Subprojects in project.el (Was: Eglot, project.el, and python virtual environments)
Date: Fri, 25 Nov 2022 01:38:08 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2
On 25/11/22 00:46, Tim Cross wrote:
> João Távora <joaotavora@gmail.com> writes:
>
>> On Thu, Nov 24, 2022 at 3:01 AM Dmitry Gutov <dgutov@yandex.ru> wrote:
>>
>>> I'm imagining that traversing a directory tree with an arbitrary
>>> predicate is going to be slow. If the predicate is limited somehow
>>> (e.g. to a list of "markers" as base file names, or at least
>>> wildcards), 'git ls-files' can probably handle this, with certain
>>> but bounded cost.
>
> I've seen references to the superior performance of git ls-files a
> couple of times in this thread, which has me a little confused. There
> has been a lot in other threads about the importance of not relying
> on, or basing development on, assumptions about the underlying VCS.
> For example, I would expect project.el to be completely neutral with
> respect to the VCS used in a project.
That's the case we can optimize: when the project is Git/Hg.
> So how is git ls-files at all relevant when discussing performance
> characteristics when identifying files in a project?
Not files, though. Subprojects. Meaning, listing all (direct and indirect) subdirectories which satisfy a particular predicate. If the predicate is simple (the presence of a particular project marker: a file name or a wildcard), the list can be fetched in one shell command, like:
  git ls-files -co -- "Makefile" "package.json"

(which will traverse the directory tree for you, but will also use
Git's cache).
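To make this concrete, here is a rough, untested Emacs Lisp sketch of
that approach (the function name and the marker-to-pathspec
translation are mine, not anything in project.el):

  ;; Hypothetical sketch, not existing project.el API: find candidate
  ;; subproject roots by asking Git for files whose base name matches
  ;; one of MARKERS, tracked or untracked (-co).
  (defun my-git-subproject-roots (root markers)
    "Return directories under ROOT that contain one of MARKERS."
    (let ((default-directory root))
      (delete-dups
       (mapcar (lambda (file)
                 ;; Top-level matches have no directory part.
                 (or (file-name-directory file) "./"))
               (apply #'process-lines "git" "ls-files" "-co" "--"
                      ;; A ":(glob)**/" pathspec matches the base name
                      ;; at any depth, including the top level.
                      (mapcar (lambda (m) (concat ":(glob)**/" m))
                              markers))))))

One shell command, one pass over Git's index, no Lisp-side tree walk.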
If the predicate is arbitrary (i.e. implemented in Lisp), the story would become harder.
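For contrast, a sketch of the fallback (again with made-up names): a
full Lisp-side walk that visits every directory under the root and
calls the predicate on each, with no help from Git's cache:

  ;; Made-up name; illustrates the slow path for an arbitrary Lisp
  ;; predicate: every directory under ROOT gets visited and tested.
  (require 'seq)
  (defun my-find-subproject-roots (root predicate)
    "Return all subdirectories of ROOT satisfying PREDICATE."
    (seq-filter predicate
                (seq-filter #'file-directory-p
                            (directory-files-recursively root "" t))))

On trees of the size mentioned below, that enumeration alone is where
the time would go.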
> I also wonder if some of the performance concerns may be premature.
> I've seen references to poor performance in projects with 400k or
> even 100k files. What is the expected/acceptable performance for
> projects of that size? How common are projects of that size? When
> considering performance, are we not better off focusing on the common
> case rather than extreme cases, leaving the extremes for once we have
> a known problem we can then focus in on?
OT1H, large projects are relatively rare. OT2H, having a need for subprojects seems to be correlated with working on large projects.
What is the common case, in your experience, and how is it better solved? Globally customizing a list of "markers", or customizing a list of subprojects for every "parent" project?