Re: [Bug-tar] Multiple path headers mixing sparse and xattrs
From: Dominique Martinet
Date: Thu, 23 Jun 2016 15:35:51 +0200
User-agent: Mutt/1.5.23 (2014-03-12)
Hi,
Dominique Martinet wrote on Thu, Jun 09, 2016 at 01:22:53PM +0200:
> (For archive-digging purposes, this looks a lot like
> http://lists.gnu.org/archive/html/bug-tar/2010-11/msg00095.html, except
> that here the file name must contain a non-ASCII (e.g. UTF-8) component.)
>
> We've noticed that the extracted path for some files is wrong IF both
> --sparse and --xattrs are used AND the file is sparse and its path
> contains some "weird" (non-ASCII) characters.
>
> Here's a full reproducer, run on today's git master branch:
>
> $ cd $(mktemp -d)
> $ mkdir -p t
> $ dd if=/dev/urandom of=t/barbarbar bs=1M seek=50 count=1
> $ cp t/barbarbar t/mumuµmu
> $ tar --xattrs -S -c t | tar -t
> t/
> t/barbarbar
> t/GNUSparseFile.6221/mumuµmu
>
> I'm only listing here, but the file would be extracted under that path
> as well. Looking at the raw archive, the problem is that the path is
> recorded twice for mumuµmu:
> 30 GNU.sparse.name=t/mumuµmu
> ...
> 38 path=t/GNUSparseFile.6236/mumuµmu
>
> (while barbarbar only has GNU.sparse.name, and no path attribute)
>
>
> For now I've just applied a quick & dirty patch to the path_decode
> function in my own src/xheader.c so that it takes the first path,
> because it seems to work™ and we're in a bit of a hurry;
> another workaround, as given in the mail I quoted at the start, would be
> to use --sparse-version=0.
>
>
> I guess the main fix should be to only output the header once, though;
> looking at the code (src/create.c, write_header_name), it seems that we
> explicitly check !string_ascii_p (st->file_name) and write the extra
> header in that case.
> I'm not quite sure how to cleanly check at that point that we already
> wrote the file name in another attribute...
>
> (Thinking about it again, we might want to handle backward compatibility
> with archives made by existing tar versions rather than only changing
> the way we write output; so maybe always preferring GNU.sparse.name over
> path, without relying on record order, would be a better solution?)
Does anyone have an opinion on this issue?
Would you take a patch if I went to the trouble of implementing
either solution?
I don't really care which solution gets implemented, and both look
possible (either not writing the improper path into the output tar, or
ignoring path when GNU.sparse.name is set on extraction); but I'd rather
not pick one and be told "no, we prefer the other one" after not getting
any feedback... or just be plainly ignored.
Thank you,
--
Dominique Martinet