bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24937: "deleting unused links" GC phase is too slow


From: Ludovic Courtès
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Sun, 11 Dec 2016 19:02:42 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

Mark H Weaver <address@hidden> skribis:

> address@hidden (Ludovic Courtès) writes:
>
>> Here’s a proposed patch that follows your suggestion, Mark, but places
>> an upper bound on the number of directory entries loaded in memory.
>>
>> On my laptop, which has ~500k entries in /gnu/store/.links, the result
>> is something like this (notice the inode numbers in ‘lstat’ calls):
>>
>> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
>> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 
>> 8
>> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, 
>> st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, 
>> st_size=60977152, st_atime=2016/12/11-12:01:59, 
>> st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
>> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
>> [...]
>> 13738 
>> lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc",
>>  {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, 
>> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2015/11/25-11:29:20}) = 0
>> 13738 
>> lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px",
>>  {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, 
>> st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2016/12/11-01:44:49}) = 0
>> 13738 
>> lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2",
>>  {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, 
>> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2015/11/25-11:29:20}) = 0
>> [...]
>> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
>> [...]
>> 13738 
>> lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m",
>>  {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, 
>> st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2016/12/07-00:05:19}) = 0
>> 13738 
>> lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb",
>>  {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, 
>> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2015/11/25-11:29:21}) = 0
>> 13738 
>> lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2",
>>  {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, 
>> st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2016/01/05-11:35:49}) = 0
>> [...]
>> 13738 getdents(8, {{d_ino=328672,
>> [...]
>> 13738 
>> lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz",
>>  {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, 
>> st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2016/12/11-01:44:47}) = 0
>> 13738 
>> lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg",
>>  {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, 
>> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2015/11/25-11:29:20}) = 0
>> 13738 
>> lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0",
>>  {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, 
>> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, 
>> st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, 
>> st_ctime=2016/11/07-00:01:57}) = 0
>>
>> I can’t tell exactly how this affects performance because my machine has
>> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
>> it has performance comparable to that of doing all the ‘readdir’ at once
>> followed by all the ‘lstat’.
>>
>> Mark, how does that sound?
>
> I think we should sort the entire directory using merge sort backed to
> disk files.  If we load chunks of the directory, sort them and process
> them individually, I expect that this will increase the amount of I/O
> required by a non-trivial factor.  In each pass, we would load blocks of
> inodes from disk, almost all of which are likely to be present in the
> store and thus linked from the directory, but in this scheme we will
> process only a small number of them and drop the rest on the floor to be
> read again in the next pass.  Given that even my fairly optimal
> implementation takes about 35 minutes to run on Hydra, I'd prefer to
> avoid multiplying that by a non-trivial factor.

Sure, though it’s not obvious to me how much of a difference it makes;
my guess is that processing in large chunks is already a win, but we’d
have to measure.

> Why not just use GNU sort?  It already exists, and does exactly what we
> need.

Does ‘sort’ manage to avoid reading whole files in memory?  If not, I
think it wouldn’t buy us anything to use it.

> If you object to using an external program for some reason, I would
> prefer to re-implement a similar algorithm in the daemon.

Yeah, I’d rather avoid serializing the list of file names/inode number
pairs just to invoke ‘sort’ on that.

Also, what algorithm are you referring to?

Thanks for the quick feedback!

Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]