Re: [Pan-users] multi-part yenc
From: Duncan
Subject: Re: [Pan-users] multi-part yenc
Date: Mon, 10 Sep 2012 13:29:26 +0000 (UTC)
User-agent: Pan/0.139 (Sexual Chocolate; GIT 4162e82 /usr/src/portage/src/egit-src/pan2)
Thufir posted on Mon, 10 Sep 2012 04:52:28 -0700 as excerpted:
> Is there a tutorial for yenc? I've read the wikipedia entry and am as
> clueless as when I started.
>
> I did find:
>
> http://packages.ubuntu.com/precise/news/python-yenc
>
> but it has no manpage entry :(
>
>
> What do you do with the multipart attachments after you've downloaded
> them?
>
> From going through the list archives on gmane, it looks like pan is
> supposed to magically handle multi-part yenc attachments spread across
> multiple messages?
>
> However, what about videos? Or, x type file?
There are two kinds of splitting. With pre-splitting, the file is split
before posting, so there are actually several files posted, each a piece
of the larger file. With attachment splitting, a single file is split
across multiple individual messages, each carrying a part of the
attachment, and the attachment isn't complete without all parts.
Because it's the posting app that does attachment splitting, on the fly,
the original poster often doesn't have the individual parts. If someone
ended up with all the parts but one or two and asked for a repost of just
those parts, and nobody else had received them correctly and could post
them, the original poster would have to repost the entire set again,
that being the only way to get the same split. (This was a much bigger
problem back before PAR files became common. The error correction and
data redundancy they provide make it possible to recover from individual
corrupted or missing messages these days, as long as not too many are
missing and the PARs were posted.)
For that reason, the biggest files would often be pre-split into smaller
individual files before posting. These smaller files would in turn be
attachment split as described above.
On the download end, the news client generally handles attachment-split
reassembly automatically... as long as all the parts are there. This is
the bit that pan does transparently. As long as all the parts are there,
you don't even see that it's a bunch of individual messages combined to
allow recreation of the attachment; pan lists only a single entry for
the whole set of posts required to reassemble that file. (If parts
are missing, pan still lists it as a single entry, but with an x/y
indicator as part of the subject line so you know how many parts are
available vs missing. You can still force pan to reassemble what it has,
using the forced "read message" function, but the attachment will be
corrupt due to missing data.)
Pan, like yenc itself, knows nothing about pre-splitting. Each of the
pre-split pieces was posted as a separate file, and that's what pan
downloads and saves: the smaller individual pre-split files, which the
user must manually reassemble after saving, just as the poster split
them before posting. (Tho some of the power-posting apps handle
pre-splitting automatically, based on pre-configured parameters. But the
pieces are still posted as smaller individual files that must be
reassembled into a whole.)
How you do that reassembly depends on how the original larger file was
pre-split. In the simplest case, it was simply split into equal size
chunks, nothing added, nothing subtracted, with each chunk numbered
appropriately. Ideally a series of 10 chunks will be numbered *.01 thru
*.10, not *.1 thru *.10, and similarly a series of 100 chunks will be
numbered *.001 thru *.100, not *.1 thru *.100, thus preserving file
listing order.
In this case, a simple redirected cat (short for conCATenate) command
suffices for recombining:

    cat file.mpg.* > file.mpg

(That's Unix/Linux. On MS, back in the DOS days anyway, it was similar,
but using the copy command in binary mode, copy /b. Alternatively, the
poster would often include a file.mpg.bat batchfile script for
reassembly. Downloaders could use the script or just type the command
themselves, but the existence of a *.bat file was a sure sign that this
type of splitting had been used, so it was nice to see it even on Linux,
since it meant the simple cat command method would work. I'd assume it
remains about the same today.)
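
To see the whole round trip in one place, here is a minimal sketch that
plays both poster and downloader. It assumes GNU split (the -d option
for numeric suffixes is a GNU extension); original.mpg is a made-up
stand-in file, not anything pan produces.

```shell
# Work in a scratch directory so the wildcard can only match our pieces.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the large original: 1000 bytes of random data.
head -c 1000 /dev/urandom > original.mpg

# Poster's side: split into 100-byte chunks with 2-digit numeric
# suffixes (file.mpg.00, file.mpg.01, ...).
split -b 100 -d -a 2 original.mpg file.mpg.

# Downloader's side: the shell expands the wildcard in sorted order,
# which is the right order because the suffixes are zero-padded.
cat file.mpg.* > file.mpg

# cmp returns 0 only when the two files are byte-identical.
cmp original.mpg file.mpg && echo "reassembled OK"
```

The comparison against the original is only possible here because the
sketch created it; a downloader would check the result by playing the
file or against a posted checksum instead.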
If the numeric file suffixes weren't 0-prefixed appropriately, it's a bit
more difficult, but still reasonably simple. Taking the *.1 thru *.100
example (make sure you have wrapping turned off to view this; the below
assumes you know the wildcards aren't going to match unintended files,
presumably because you're working in a subdir that only contains the
pieces you want to assemble into one):

    # first combine the single-digit suffix files, creating a
    # double-digit-suffix file that orders before any of the existing ones
    cat file.mpg.? > file.mpg.01
    # next combine the double-digit suffix files
    # (including the one created above)
    cat file.mpg.?? > file.mpg.001
    # OK, now the triple-digit suffixes
    cat file.mpg.??? > file.mpg
    # Now test, and once you're sure it works, delete all the parts
    # (the wildcard requires a suffix, so file.mpg itself is kept)
    rm file.mpg.*
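
An alternative to the staged wildcards is to count through the suffixes
explicitly, which works for any number of pieces. This is only a sketch:
the 12 tiny text pieces are made up so the example can run on its own,
and it assumes the suffixes run from 1 to n with no gaps.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build 12 stand-in pieces named file.mpg.1 .. file.mpg.12 (no padding).
n=12
i=1
while [ "$i" -le "$n" ]; do
    printf 'piece %s\n' "$i" > "file.mpg.$i"
    i=$((i + 1))
done

# A plain wildcard would sort as text (1, 10, 11, 12, 2, ...), so append
# the pieces one by one in numeric order instead.
: > file.mpg
i=1
while [ "$i" -le "$n" ]; do
    cat "file.mpg.$i" >> file.mpg
    i=$((i + 1))
done
```

Because cat is given one explicit name per step, a missing piece shows
up as an error message naming that file rather than as a silently short
result.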
Another common type of pre-split file that uses sequential numeric
extensions is the RAR archive format. This archive format seems to be
most common in East Asia and thus from East Asian posters. Like zip or
the combined tar.* formats, it both compresses and archives, and thus
can contain whole directories. But unlike with zip and tar.*, the
archiver's ability to split the archive at preset sizes is often used as
well, with these parts then posted.
For these files you simply use unrar (freely available, but it only
unarchives; rar itself is proprietary) or some other (un)archiver that
handles rar files, since the split is a native part of the format and
the unarchiving process thus knows how to reassemble before unarchiving.
IIRC, this format can normally be identified by the fact that in
addition to the numbered files, one file (IIRC the first part, but it's
been years...) is simply *.rar.
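
As a rough sketch of what that looks like on the command line, assuming
the unrar tool is installed: extract_rar_set is a made-up helper name,
and its argument is whichever file is the first volume of the set.

```shell
# Hypothetical helper: test, then extract, a multi-volume RAR set.
# $1 is the first volume; unrar picks up the remaining volumes from the
# same directory by itself.
extract_rar_set() {
    first_volume=$1
    if ! command -v unrar >/dev/null 2>&1; then
        echo "unrar is not installed" >&2
        return 127
    fi
    # "t" tests archive integrity, "x" extracts with full paths.
    unrar t "$first_volume" && unrar x "$first_volume"
}
```

Usage would be along the lines of: extract_rar_set file.rar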
Then there are the various proprietary splitters, some of which append
or prepend various metadata to each chunk, thus requiring the same
software, often MS-only, for reassembly. Fortunately, because these DO
require specific software for reassembly, they don't tend to be very
popular.
Finally, there are the PAR formats. I actually only worked with PAR-1
files back in the day (I know PAR-2 was more common later, and for all I
know there's PAR-3+ now...), and don't remember the specifics too well,
but the general idea is that these provide additional redundant data for
error correction. Typically 10-30% of the parts may be missing and the
data may still be recoverable, provided you have enough PAR files to
cover what's missing. IIRC PAR-1 sets use a *.par index file plus
numbered *.pNN recovery files, while PAR-2 sets use *.par2. You don't
need these unless the main post is corrupted or partially missing, as
they're only used for recovery of missing/corrupted data, but if you DO
need to recover missing/corrupted data, definitely get help from someone
with more current experience with these than I have.
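
For PAR-2 sets, the usual command-line route is the par2 tool from
par2cmdline. The following is a sketch only, assuming par2 is installed;
par_check is a made-up helper name, and its argument is the set's main
*.par2 file.

```shell
# Hypothetical helper: verify the downloaded files against a PAR-2 set
# and attempt a repair only if verification reports a problem.
par_check() {
    index_file=$1
    if ! command -v par2 >/dev/null 2>&1; then
        echo "par2 is not installed" >&2
        return 127
    fi
    # "verify" exits non-zero when files are damaged or missing;
    # "repair" then tries to rebuild them from the recovery data.
    par2 verify "$index_file" || par2 repair "$index_file"
}
```

Usage would be along the lines of: par_check file.mpg.par2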
That should be a reasonable general overview, anyway... With any luck,
you'll only have to do the redirected-cat style assembly. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman