From: Timothy Beryl Grahek
Subject: Re: [Lzip-bug] Tarlz 0.4: Use of 'ustar' format instead of 'posix'; question about future of Tarlz utility
Date: Mon, 4 Jun 2018 19:41:10 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
Hi Antonio,
Thanks. I won't use the pax format then.
Upon doing some additional reading out of curiosity, I noticed the following important paragraphs regarding the 'pax' format:
"The concept of a global extended header (/typeflag/ *g*) was controversial. If this were applied to an archive being recorded on magnetic tape, a few unreadable blocks at the beginning of the tape could be a serious problem; a utility attempting to extract as many files as possible from a damaged archive could lose a large percentage of file header information in this case. However, if the archive were on a reliable medium, such as a CD-ROM, the global extended header offers considerable potential size reductions by eliminating redundant information. Thus, the text warns against using the global method for unreliable media and provides a method for implanting global information in the extended header for each file, rather than in the /typeflag/ *g* records.
"No facility for data translation or filtering on a per-file basis is included because the standard developers could not invent an interface that would allow this in an efficient manner. If a filter, such as encryption or compression, is to be applied to all the files, it is more efficient to apply the filter to the entire archive as a single file. The standard developers considered interfaces that would invoke a shell script for each file going into or out of the archive, but the system overhead in this approach was considered to be too high."
Certainly the pax format must be changed or abandoned.
Perhaps it is a good idea to contact the authors of the 'pax' format and propose that it is worth their time to put more emphasis on data preservation. What do you think? I am extremely willing to do this, if you think this is possible.
Nevertheless, I have discovered that the GNU Tar authors are interested in the 'pax' format, according to the following link: https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html They may have some way in mind to deal with the fact that extended records are not protected, but it seems unclear. That calls into question the legitimacy of the 'gnu' format for data protection, since they seem certain to abandon this format. Do you know how we will find out what they do with files larger than 8 GB and file names longer than 256 characters? Let us hope that they keep data preservation in mind.
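To make the distinction in the quoted rationale concrete, here is a minimal sketch of the two places where pax metadata can live: a global extended header (typeflag g) at the start of the archive, or an extended header (typeflag x) in front of each file. It uses Python's standard tarfile module purely for illustration; this is an assumption on my part and says nothing about how tarlz or GNU Tar behave. The helper names and the "comment" record are hypothetical.

```python
import io
import tarfile

BLOCK = 512  # tar archives are made of 512-byte blocks


def build_archive(per_file):
    """Build a tiny in-memory pax archive (hypothetical helper).

    per_file=False stores the metadata once, in a global header.
    per_file=True repeats it in the extended header of the file.
    """
    meta = {"comment": "example metadata"}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers=None if per_file else meta) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        if per_file:
            info.pax_headers = dict(meta)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def typeflags(archive):
    """Return the typeflag character of every header block, in order."""
    flags = []
    pos = 0
    while pos + BLOCK <= len(archive):
        header = archive[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end
            break
        flags.append(chr(header[156]))  # typeflag is at offset 156
        # The size field is 12 bytes at offset 124, stored in octal.
        size = int(header[124:136].rstrip(b"\0 ").decode("ascii") or "0", 8)
        # Skip the header plus the data, rounded up to whole blocks.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return flags


print(typeflags(build_archive(per_file=False)))
print(typeflags(build_archive(per_file=True)))
```

With the global method, the metadata exists in exactly one place, so a damaged first block loses it for every file; with the per-file method, damage is confined to the files whose own headers are hit.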
In the event that the GNU format also seems unreliable, it may be wise to stick with 'ustar' only. The restrictions of the format aren't unreasonable, given that it is a safe archiving format: a limit of 8 GB for a single file and 256 characters for a file name can be reasonably accommodated. Perhaps there is another archive format that we are overlooking; otherwise, I wonder if a new archive format must be invented that does not commit the oversight that the 'pax' format seems to have committed. Whatever keeps data preservation in mind and happens to be humanly possible is best. I am certainly eager to know what the future holds, regardless of the outcome.
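As a rough illustration of the 'ustar' limits mentioned above, here is a small Python sketch that checks whether a file would fit. It assumes the usual ustar header layout: a 100-byte name field, a 155-byte prefix field joined to the name by an implied '/', and a size field holding 11 octal digits. The function names are hypothetical, and the check is simplified (for example, it does not handle directory names ending in '/').

```python
USTAR_NAME_MAX = 100        # bytes in the 'name' header field
USTAR_PREFIX_MAX = 155      # bytes in the 'prefix' header field
USTAR_SIZE_MAX = 8**11 - 1  # largest value that fits in 11 octal digits


def split_ustar_name(path):
    """Split a path into (prefix, name) as ustar requires, or return None.

    The split must fall on a '/', which is not stored in either field.
    """
    raw = path.encode("utf-8")  # limits are in bytes, not characters
    if 0 < len(raw) <= USTAR_NAME_MAX:
        return b"", raw
    for i, byte in enumerate(raw):
        if byte == 0x2F:  # ASCII '/'
            prefix, name = raw[:i], raw[i + 1:]
            if (0 < len(prefix) <= USTAR_PREFIX_MAX
                    and 0 < len(name) <= USTAR_NAME_MAX):
                return prefix, name
    return None


def fits_ustar(path, size):
    """True if both the path and the file size fit in a ustar header."""
    return split_ustar_name(path) is not None and 0 <= size <= USTAR_SIZE_MAX


print(fits_ustar("docs/readme.txt", 1024))
print(fits_ustar("x" * 101, 0))  # too long, and no '/' to split on
```

Note that the 256-character figure is reachable only when a '/' falls in a position where both halves fit their fields; a long name with no suitable '/' is limited to 100 bytes.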
Thank you for your time regarding this matter. Please feel free to take your time; I know you are very busy.
> Agreed. Any tar format used by tarlz must be safe by itself. Remember that tarlz can also create uncompressed archives.
Yes, safe uncompressed archives are extremely important; otherwise, it is best not to archive at all.
Best regards, Timothy