[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Outreachy] 'guix git log' slowness?
From: |
zimoun |
Subject: |
Re: [Outreachy] 'guix git log' slowness? |
Date: |
Fri, 29 Jan 2021 01:04:12 +0100 |
Hi Magali,
(It is a slightly edited version I sent you. Since the aim of Outreachy
is also to interact with Community, let enjoy the French proverb: «more
crazy people we are, more fun we have». :-))
On Thu, 28 Jan 2021 at 00:53, Magali <magalilemes00@gmail.com> wrote:
> Another thing is that the command is a bit slower than 'git log' itself.
> Thoughts on how that could be improved?
The command is “slow”. A first quick analysis about the meaning of “slow”.
Basically, I have run twice:
--8<---------------cut here---------------start------------->8---
for n in 60000 10000 5000 1000 500 100 50 10 5 1;
do
time ./pre-inst-env guix git log --oneline \
| head -n $n > /dev/null ;
done
--8<---------------cut here---------------end--------------->8---
to have kind of warm cache. And again for the equivalent Git command.
Then, bit of Emacs edit processing to transform the output in the buffer
of ’M-x shell’ to something in the Python file:
tguix = np.array([2.871, …
tgit = np.array([0.013, …
(Well, to be correct, it is not twice but a couple of times to have an
average.)
Let normalize by removing the additive constant and run a classic
linear regression:
t ~ B n ^ a => log(t) ~ a log(n) + b
where ’a’ and ’b’ have to be estimated.
Well, I am surprised by ’a ~ 0.5’ which means that “guix git log” runs
with a sublinear time complexity. However, “git log” is linear. Maybe
I am doing something wrong.
Initially I thought the tree was badly walked but a quick:
--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix git log --channel-cache-path
guix
/home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ git -C
/home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
\
log --oneline | wc -l
72791
$ ./pre-inst-env guix git log --oneline | wc -l
72791
--8<---------------cut here---------------end--------------->8---
shows it is correct. Hum?! Well, I suspect noise on the data and the
normalization is bad here. Running the experiment in a batch of 10
times then averaging them should give an analysis more meaningful. Hey,
that’s a quick one. :-)
The conclusion here, it scales well enough… for now.
Therefore, the real the question is about the «additive constant». On
my machine, it is ~2.9s and this is where there is room of improvement,
I guess.
I exported ’get-commits’ to have it in the REPL and I did:
--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix repl
GNU Guile 3.0.5
Copyright (C) 1995-2021 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guix-user)> ,use(guix scripts git log)
scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok))
scheme@(guix-user)> ,time (compute)
$1 = ok
;; 2.533936s real time, 3.099156s run time. 0.901027s spent in GC.
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---
And I let you run “,profile (compute)”. Well, this function should be
optimized. IMHO.
Initially, I thought about “stream” but I do no think it is the issue
here. Well, I think that the ’repo’ is open at each commit when
folding. Instead, it should be open before, keep alive and close at the
end of the ’fold’. Somehow. WDYT?
I will give a closer look to ’commit-closure’ because I am not convinced
it is useful here, neither.
Bonus, attached the Python script and the plot. :-)
Cheers,
simon
# coding: utf-8
#
import numpy as np
import matplotlib.pyplot as plt
n = np.array([1., 5., 10., 50., 100., 500., 1000., 5000., 10000., 60000.])
tguix = np.array([2.871, 3.006, 2.987, 2.892, 2.991, 3.038, 3.088, 3.315,
3.565, 5.634])
tgit = np.array([0.013, 0.006, 0.013, 0.016, 0.014, 0.029, 0.047, 0.12, 0.233,
1.283])
normalize = lambda x: x - x[0]
logify = lambda x: np.log10(x[3:])
x = logify(n)
y = logify(normalize(tguix))
z = logify(normalize(tgit))
X = np.concatenate((x.reshape((len(x), 1)), np.ones((len(x), 1))), axis=1)
guix, res, rank, s = np.linalg.lstsq(X, y, rcond=None)
git, res, rank, s = np.linalg.lstsq(X, z, rcond=None)
predict = lambda sol: sol[0] * x + sol[1]
plt.plot(x, y, 'bo', label='guix')
plt.plot(x, z, 'ro', label='git')
plt.plot(x, predict(guix), 'bx-')
plt.plot(x, predict(git), 'rx-')
plt.legend(loc='lower right')
plt.xlabel("log(N) where N in `| head -n N`")
plt.ylabel("log(real time)")
plt.show()
perf.png
Description: scale.png