Subject: Re: [Bug-tar] Bug with files which have hardlinks
From: Tim Kientzle
Date: Sat, 9 Oct 2010 13:41:36 -0700
On Oct 9, 2010, at 12:19 AM, Bob Proulx wrote:
>
> $ tar cvvf x.tar afile afile afile
This is a somewhat strange request you're making.
I doubt many tar implementations have done anything to
optimize this specific case (de-duping the argument list
would be one strategy, although -C handling makes that
a bit more complex than it sounds).
> $ tar cvvf x.tar afile afile afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> $ tar cvvf x.tar afile afile afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> hrw-rw-r-- bob/bob 0 2010-10-09 01:07 afile link to afile
> hrw-rw-r-- bob/bob 0 2010-10-09 01:07 afile link to afile
These are both "reasonable" answers to your request.
Neither is really wrong, so there's not really a bug here.
The latter version is smaller (a link entry generally takes
less space than a full copy of the file), but the former is easier
to restore. Which one you see will depend heavily on
which tar implementation you're using (GNU tar is just one of
many) and how it optimizes detecting hard links.
The basic strategy used by tar implementations for archiving
hard links is to keep a table that maps dev/ino values to
file names and create a hard link entry in the archive when
the tar program sees something that's already in the table.
The most straightforward implementation of this strategy would
give you the output you listed second regardless of the number
of actual links on the file.
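As a rough illustration of that basic strategy (a sketch only, not the code of GNU tar or any other real implementation; the function name plan_entries and the tuple layout are made up for this example), the table can be a dictionary keyed on the dev/ino pair:

```python
import os

def plan_entries(paths):
    """Decide, for each path in argument order, whether it would be
    archived as a full file or as a hard link to an earlier entry.
    Hypothetical helper for illustration; it writes no archive."""
    seen = {}   # (st_dev, st_ino) -> first name archived for that inode
    plan = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Same inode seen before: emit a link entry pointing at it.
            plan.append(("link", path, seen[key]))
        else:
            # First sighting: remember it and archive the contents.
            seen[key] = path
            plan.append(("file", path, None))
    return plan
```

Called with the same name three times, this yields one "file" entry followed by two "link" entries, whatever the link count of the file is.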
But such tables can get very large if you're using tar to
back up a very large filesystem (a system with a billion
files could easily require hundreds of gigabytes to store
every filename). So tar implementations generally skip
adding something to this table when the link count reported
by the filesystem is 1.
In the cases above, this explains the difference you're
seeing. In the first case, nothing was entered into the
internal table, so the tar program saw each "afile" as a
separate file to be archived. Bumping the link count (by
creating another hard link to the file) caused an entry to be
added to the internal table, so the later occurrences of
"afile" were archived as links.
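The link-count shortcut can be sketched as follows. This is a minimal illustration under my own assumptions (the name plan_with_nlink_check is hypothetical, and no real tar is being quoted); the only change from the plain dev/ino table is that files reporting a link count of 1 never touch the table:

```python
import os

def plan_with_nlink_check(paths):
    """Like a plain dev/ino table, but files whose link count is 1
    are never looked up in or added to the table.
    Hypothetical helper for illustration; it writes no archive."""
    seen = {}   # (st_dev, st_ino) -> first name archived for that inode
    plan = []
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink <= 1:
            # No other names exist, so skip the table to save memory.
            plan.append(("file", path, None))
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            plan.append(("link", path, seen[key]))
        else:
            seen[key] = path
            plan.append(("file", path, None))
    return plan
```

With a link count of 1, naming the file three times gives three "file" entries; once a second hard link exists, the same argument list gives one "file" entry and two "link" entries.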
You might also have seen the second "afile" get recorded
as a link and the third be stored normally if the tar program
had taken the further optimization of removing the internal
table entry when the expected number of references had
been seen.
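That further optimization might look roughly like this. Again, this is only a sketch under stated assumptions: plan_with_pruning is a made-up name, and the idea of counting down from st_nlink - 1 is my reading of "the expected number of references", not something taken from a particular tar:

```python
import os

def plan_with_pruning(paths):
    """Track hard links, but drop a table entry once as many link
    entries have been emitted as the link count led us to expect.
    Hypothetical helper for illustration; it writes no archive."""
    pending = {}  # (st_dev, st_ino) -> [first name, links still expected]
    plan = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in pending:
            entry = pending[key]
            plan.append(("link", path, entry[0]))
            entry[1] -= 1
            if entry[1] == 0:
                # All expected names accounted for: free the entry.
                del pending[key]
        else:
            plan.append(("file", path, None))
            if st.st_nlink > 1:
                pending[key] = [path, st.st_nlink - 1]
    return plan
```

For a file with a link count of 2 named three times, the first occurrence is stored as a file, the second as a link (which empties the table entry), and the third is stored as a file again.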
It's interesting to compare this with the strategies required
when writing formats such as the newer cpio variants (which
effectively store hard link entries first, and the "real" file
data last).
As Joerg pointed out, the more interesting problem is
how this gets handled on extract. The second form is
tricky to restore correctly.
Cheers,
Tim