
Re: [Pan-users] multi-part yenc


From: Duncan
Subject: Re: [Pan-users] multi-part yenc
Date: Mon, 10 Sep 2012 13:29:26 +0000 (UTC)
User-agent: Pan/0.139 (Sexual Chocolate; GIT 4162e82 /usr/src/portage/src/egit-src/pan2)

Thufir posted on Mon, 10 Sep 2012 04:52:28 -0700 as excerpted:

> Is there a tutorial for yenc?  I've read the wikipedia entry and am as
> clueless as when I started.
> 
> I did find:
> 
> http://packages.ubuntu.com/precise/news/python-yenc
> 
> but it has no manpage entry :(
> 
> 
> What do you do with the multipart attachments after you've downloaded
> them?
> 
> From going through the list archives on gmane, it looks like pan is
> supposed to magically handle multi-part yenc attachments spread across
> multiple messages?
> 
> However, what about videos?  Or, x type file?

There are two kinds of splitting.  With pre-splitting, the file is split 
before posting, so several files actually get posted, each a piece of the 
larger file.  With attachment splitting, the posting app splits a single 
file across multiple individual messages, each carrying part of the 
attachment, and the attachment isn't complete without all the parts.

Because it's the posting app that does attachment splitting on-the-fly, 
the original poster often doesn't have the individual parts on hand.  So 
if someone ended up with all the parts but one or two and asked for a 
repost of just those, and nobody else had a good copy to post either, the 
original poster would have to repost the entire set, that being the only 
way to get the same split.  (This 
was a much bigger problem back before PAR files became common.  The error 
correction and data redundancy they provide makes it possible to recover 
from individual corrupted or missing messages these days, as long as not 
too many are missing and the PARs were posted.)

For that reason, the biggest files would often be pre-split into smaller 
individual files before posting.  These smaller files would in turn be 
attachment split as described above.

On the download end, the news client generally handles attachment-split 
reassembly automatically... as long as all the parts are there.  This is 
the bit that pan does transparently.  As long as all the parts are there, 
you don't even see that it's a bunch of individual messages combined to 
recreate the attachment; pan lists only a single entry 
for the whole set of posts required to reassemble that file.  (If parts 
are missing, pan still lists it as a single entry, but with an x/y 
indicator as part of the subject line so you know how many parts are 
available vs missing.  You can still force pan to reassemble what it has, 
using the forced "read message" function, but the attachment will be 
corrupt due to missing data.)

Pan, like yenc itself, is entirely agnostic about pre-split files.  Each 
of the pre-split pieces was posted as a separate file, and that's what pan 
downloads and saves: the smaller individual pre-split files, which the 
user must manually reassemble after saving, reversing the split the poster 
did before posting.  (Tho some of the power-posting apps handle 
pre-splitting based on pre-configured parameters as well.  But the pieces 
are still posted as smaller individual files that must be reassembled 
into a whole.)

How you do that reassembly depends on how the original larger file was 
pre-split.  In the simplest case, it was simply split into equal size 
chunks, nothing added, nothing subtracted, with each chunk numbered 
appropriately.  Ideally a series of 10 chunks will be numbered *.01 thru 
*.10, not *.1 thru *.10, and similarly a series of 100 chunks will be 
numbered *.001 thru *.100, not *.1 thru *.100, thus preserving file 
listing order.

In this case, a simple redirected cat (short for conCATenate) command 
suffices for recombining:

cat file.mpg.* > file.mpg

(That's Unix/Linux.  On MS, back in the DOS days anyway, it was similar, 
but using the copy command in binary mode (copy /b).  Alternatively, the poster would often 
include a file.mpg.bat batchfile script for reassembly.  Downloaders 
could use the script or just type the command themselves, but the 
existence of a *.bat file was a sure sign that this type of splitting had 
been used, so it was nice to see it even on Linux, since it meant the 
simple cat command method would work.  I'd assume it remains about the 
same today.)
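
For anyone who wants to try the simple case before trusting it with a real download, here is a minimal sketch that does both halves in a scratch directory: it pre-splits a throwaway file into zero-padded chunks, then recombines them with the redirected cat.  The names (original.mpg, demo.mpg) and the 1000-byte chunk size are made up for illustration, and the -d (numeric suffix) option assumes GNU split; a poster's actual splitting tool may well differ.

```shell
# Work in a scratch directory so the wildcard can only match our own chunks.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Throwaway 10500-byte stand-in for a large file (hypothetical name).
head -c 10500 /dev/urandom > original.mpg

# Pre-split into 1000-byte chunks with 2-digit numeric suffixes:
# demo.mpg.00, demo.mpg.01, ... (-d is a GNU split option).
split -b 1000 -d -a 2 original.mpg demo.mpg.

# Recombine; the zero-padded suffixes make the wildcard expand in order.
cat demo.mpg.* > demo.mpg

# cmp -s is silent and succeeds only if the files are byte-identical.
if cmp -s original.mpg demo.mpg; then
    echo "reassembled file matches the original"
fi
```

The output name demo.mpg has no trailing dot-suffix, so it can't be matched by the demo.mpg.* wildcard and accidentally fed back into itself.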

If the numeric file suffixes weren't 0-prefixed appropriately, it's a bit 
more difficult, but still reasonably simple.  (Make sure you have wrapping 
turned off to view this.  The commands below assume the wildcards won't 
match unintended files, presumably because you're working in a subdir that 
contains only the pieces you want to assemble into one.)  Taking the *.1 
thru *.100 example:

# first combine the single-digit suffix files, creating a
# double-digit-suffix file that orders before any of the existing ones
cat file.mpg.? > file.mpg.01

# next combine the double-digit suffix files
# (including the one created above)
cat file.mpg.?? > file.mpg.001

# OK, now the triple-digit suffixes
cat file.mpg.??? > file.mpg

# Now test, and once you're sure it works, delete all the parts
rm file.mpg.*
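
An alternative to the staged approach, sketched here under the assumption that the parts are numbered consecutively from 1 with no padding, is to let the shell count instead of relying on wildcard order.  The scratch directory, the demo.mpg name and the twelve dummy parts are invented purely so the example is runnable.

```shell
workdir2=$(mktemp -d)
cd "$workdir2" || exit 1

# Twelve dummy parts with unpadded suffixes: demo.mpg.1 ... demo.mpg.12
i=1
while [ "$i" -le 12 ]; do
    printf 'part %s\n' "$i" > "demo.mpg.$i"
    i=$((i + 1))
done

# Start with an empty output file, then append the parts in numeric
# order until the next number in the sequence doesn't exist.
: > demo.mpg
n=1
while [ -f "demo.mpg.$n" ]; do
    cat "demo.mpg.$n" >> demo.mpg
    n=$((n + 1))
done

echo "combined $((n - 1)) parts"
```

Because the loop stops at the first missing number, a lower count than expected is a hint that a part is missing from the set.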


Another common type of pre-split file that uses sequential numeric 
extensions is the RAR archive format.  This archive format seems to be 
most common in East Asia and thus from East Asian posters.  It's much 
like zip or the combined tar.* compression AND archive formats in that 
it's a compression as well as archive format, and thus can contain whole 
directories.  But unlike zip and tar.*, the archiver's ability to split 
the archive at preset sizes is often used as well, with these parts then 
posted.

For these files you simply use unrar (freely available but unarchives only; rar itself 
is proprietary) or some other (un)archiver that handles rar files, since 
the split is a native part of the format and the unarchiving process thus 
knows how to reassemble before unarchiving.  IIRC, this format can 
normally be identified by the fact that in addition to the numbered 
files, one file (IIRC the first part, but it's been years...) is simply 
*.rar.
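
As a rough sketch of what that looks like in practice, assuming the unrar command is installed and the set's *.rar file is called file.rar (a placeholder name): test the set first and extract only if the test passes.  The t (test) and x (extract with paths) commands are standard unrar commands; the status messages are just illustrative wording.

```shell
archive=file.rar   # placeholder name for the *.rar file of the set

if ! command -v unrar >/dev/null 2>&1; then
    rar_status="skipped: unrar is not installed"
elif [ ! -f "$archive" ]; then
    rar_status="skipped: $archive not found"
elif unrar t "$archive" >/dev/null; then
    # Integrity test passed, so extract with the stored directory layout.
    if unrar x "$archive"; then
        rar_status="extracted"
    else
        rar_status="extract failed"
    fi
else
    rar_status="test failed: a volume is probably missing or corrupt"
fi
echo "$rar_status"
```

All the downloaded pieces need to sit in the same directory as the *.rar file so the unarchiver can find them.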

Then there are the various proprietary splitters, some of which append or 
prepend various metadata to each chunk, thus requiring the same software, 
often MS-only, for reassembly.  Fortunately, because these DO require 
specific software for reassembly, they don't tend to be very popular.

Finally, there's the PAR formats.  I actually only worked with PAR-1 
files back in the day (I know PAR-2 was more common later, and for all I 
know there's PAR-3+ now...), and don't remember the specifics too well, 
but the general idea is that these provide additional redundant data for 
error correction.  Roughly 10-30% of the parts may be missing and the data 
may still be recoverable, provided you have enough PAR files.  IIRC PAR-1 
sets use a *.par index file plus numbered *.pNN recovery files (*.p01, 
*.p02, ...), while PAR-2 sets use *.par2 files.  You don't need these 
unless the main post is corrupted or 
partially missing as they're only used for recovery of missing/corrupted 
data, but if you DO need to recover missing/corrupted data, definitely 
get help from someone with more current experience with these than I have.
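
As a possible starting point for PAR-2 sets only, and hedged accordingly: a minimal sketch assuming the par2 command (from par2cmdline) is installed and the set's index file is called file.par2 (a placeholder name).  It verifies first and attempts a repair only if verification fails; the status messages are illustrative wording.

```shell
index=file.par2   # placeholder name for the PAR-2 index file

if ! command -v par2 >/dev/null 2>&1; then
    par_status="skipped: par2 is not installed"
elif [ ! -f "$index" ]; then
    par_status="skipped: $index not found"
elif par2 verify "$index" >/dev/null; then
    par_status="intact: no repair needed"
elif par2 repair "$index" >/dev/null; then
    par_status="repaired"
else
    par_status="not repairable: more recovery data is needed"
fi
echo "$par_status"
```

The data files and the PAR files should be in the same directory when this runs.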


That should be a reasonable general overview, anyway...  With any luck, 
you'll only have to do the redirected-cat style assembly. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



