Re: ‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure
From: Pádraig Brady
Subject: Re: ‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure
Date: Sun, 24 May 2015 12:57:56 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 24/05/15 12:33, Ludovic Courtès wrote:
> (Please keep address@hidden Cc'd.)
> (Gnulib: please scroll further down for the ‘unlinkat’ issue.)
>
> Andy Patterson <address@hidden> skribis:
>
>>> I suppose this is Guix 0.8.2 on top of another distribution, right? Did
>>> you install from source or from the binary tarball? Did you enable
>>> substitutes (info "(guix) Substitutes")?
>>
>> I was using the USB install medium in a live environment.
>
> So this is on GuixSD 0.8.2. ‘test-suite.log’ indeed mentions
> Linux-libre 4.0.2.
>
>> I had substitutes enabled (I'm pretty sure they're enabled by default
>> here, but I also enabled them manually just to be sure). I wasn't able
>> to install anything with substitutes enabled; it would always stall
>> while trying to update the substitutes list from hydra. When my
>> network went down briefly, it informed me that it was still at 0.0%
>> before exiting. I think that this is probably a separate issue, but
>> one which I was less concerned about since I didn't want to use
>> substitutes anyway.
>
> OK.
>
> hydra.gnu.org is unfortunately too often overloaded these days, so you
> probably arrived on a bad day. Nevertheless, the solution to this
> specific issue is for you to use substitutes to circumvent the bug
> described below.
>
>>> Does the build succeed if you run it another time with:
>>>
>>> guix build tar -K -c 1
>>
>> I tried this (with --no-substitutes), but I don't think the test suite
>> actually runs in parallel. I didn't notice any difference in that regard
>> when it was running; it seemed to take up the same amount of time with
>> or without -c 1. I had the same tests fail with the flag enabled.
>
> Oh you must be right. Looking at tests/Makefile.in, I see:
>
> --8<---------------cut here---------------start------------->8---
> check-local: atconfig atlocal $(TESTSUITE)
> $(SHELL) $(TESTSUITE) $(TESTSUITEFLAGS)
> --8<---------------cut here---------------end--------------->8---
>
> ... which shows that ./testsuite is not automatically passed -j,
> contrary to what I thought.
>
> <http://lists.gnu.org/archive/html/bug-tar/2014-08/msg00010.html>
> reports a similar issue but on a different OS.
>
> I just tried this in a GuixSD VM with Linux-libre 4.0.2:
>
> --8<---------------cut here---------------start------------->8---
> mkdir foo
> mkdir bar
> echo foo/foo_file > foo/foo_file
> echo bar/bar_file > bar/bar_file
> tar -cvf foo.tar --remove-files -C foo . -C ../bar .
> find .
> stat bar
> --8<---------------cut here---------------end--------------->8---
>
> And indeed, it fails (that is, ‘bar’ is left behind.) It works fine on
> 4.0.4-gnu though.
>
> On 4.0.2-gnu, I strace’d the ‘tar’ command above:
>
> --8<---------------cut here---------------start------------->8---
> openat(AT_FDCWD, "foo", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
>
> [...]
>
> openat(4, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 5
>
> [...]
>
> openat(5, "foo_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6
>
> [...]
>
> openat(4, "../bar", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> newfstatat(5, ".", {st_mode=S_IFDIR|0755, st_size=60, ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(5, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6
>
> [...]
>
> openat(6, "bar_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 7
> fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
> write(1, "./bar_file\n", 11) = 11
> read(7, "x\n", 2) = 2
> fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
> close(7) = 0
> fstat(6, {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
> brk(0x1a34000) = 0x1a34000
> close(6) = 0
> write(3, "./\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
> close(3) = 0
> unlinkat(4, "foo_file", 0) = 0
> unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
> unlinkat(5, "bar_file", 0) = 0
> unlinkat(4, "../bar", AT_REMOVEDIR) = -1 ENOENT (No such file or directory)
> --8<---------------cut here---------------end--------------->8---
>
> Contrast this with the same thing on 4.0.4-gnu:
>
> --8<---------------cut here---------------start------------->8---
> unlinkat(4, "foo_file", 0) = 0
> unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
> unlinkat(5, "bar_file", 0) = 0
> unlinkat(4, "../bar", AT_REMOVEDIR) = 0
> --8<---------------cut here---------------end--------------->8---
>
> So this looks like a 4.0.2 kernel bug that Gnulib’s unlinkat should
> perhaps work around.
>
> Thoughts?
Maybe. How widely deployed was 4.0.2? (It's not used in Red Hat land, for example.)
How many versions was the bug present for?
If it was just a fleeting issue, then there is less incentive to work around it.
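
One way to gather that data would be to rerun the quoted reproduction on each kernel of interest. The following is only a rough sketch: the function name check_remove_files and the wording of the messages are made up for illustration, and it assumes GNU tar (for --remove-files), mktemp and uname are available. It reports the symptom only; it does not show which syscall failed.

```shell
#!/bin/sh
# Sketch: rerun the reproduction from the quoted message in a scratch
# directory and report whether 'bar' is left behind on this kernel.
# Assumptions: GNU tar (for --remove-files), mktemp and uname exist.
# check_remove_files is a hypothetical helper name, not part of tar.

check_remove_files() {
    scratch=$(mktemp -d) || return 2
    (
        cd "$scratch" || exit 2
        mkdir foo bar
        echo foo/foo_file > foo/foo_file
        echo bar/bar_file > bar/bar_file
        # Same command line as in the quoted report, minus -v.
        tar -cf foo.tar --remove-files -C foo . -C ../bar . 2>/dev/null
        if [ -e foo ]; then
            # Nothing was removed at all, so tar itself did not behave
            # as expected (e.g. not GNU tar); draw no conclusion.
            echo "$(uname -r): inconclusive ('foo' still present)"
        elif [ -e bar ]; then
            echo "$(uname -r): affected ('bar' left behind)"
        else
            echo "$(uname -r): ok ('bar' removed)"
        fi
    )
    status=$?
    rm -rf "$scratch"
    return $status
}

check_remove_files
```

Running it under strace, as in the quoted message, would still be needed to confirm that a leftover 'bar' really comes from the failing unlinkat call.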
cheers,
Pádraig