
Re: [Mldonkey-users] similar chunks detection


From: Sergio Bayarri Gausi
Subject: Re: [Mldonkey-users] similar chunks detection
Date: Tue, 4 Mar 2003 00:21:28 +0100 (MET)

Hello,

> >IMHO that's a great idea - i'm looking forward to it.
>
> I can't see any files having the same ~10MB in common, to be honest.
> Has anyone actually worked out how often this occurs, if at all?

Quite frequently, I think.

Quite often, when I use jigle, I notice several files with the same size
(or slightly bigger or smaller) but with different md4s.

One explanation is that people downloaded the file, got it corrupted
somehow, and are now sharing that corrupted file instead of the "original"
one. If people then pick the "wrong" file ("hey, this one is bigger, it
must be the good one", "hey, this one is smaller, I'll get it in less
time"), you end up with two versions of the "same" file, each with a
different availability.

Of course this is very bad for archives (rar or zip files for example),
since they will be corrupted. But in large mpeg/avi/divx files, you might
not notice the difference.

Usually, both files differ very little, and they will share a lot of
common chunks.

If you tend to download both versions of a file (just in case), because you
don't really know which is the good one (although availability could help
to distinguish them), then you'll save a lot of time and bandwidth, because
many chunks only have to be downloaded once (they are simply copied across
to the other file).
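
To make that concrete, here is a minimal sketch of the idea (this is not
MLDonkey's code; the helper names are made up, and MD5 is used only as a
stand-in for ed2k's per-chunk MD4 hashes):

    import hashlib

    CHUNK_SIZE = 9728000  # ed2k chunk size (9500 KiB); each chunk has its own hash


    def chunk_hashes(path):
        """Per-chunk hashes of a local file (MD5 as a stand-in for MD4)."""
        hashes = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                hashes.append(hashlib.md5(chunk).hexdigest())
        return hashes


    def copy_shared_chunks(done_path, done_chunks, new_path, new_chunk_hashes):
        """Fill chunks of a second, similar download from chunks we already have.

        done_path:        first version, whose chunks in done_chunks are complete
        done_chunks:      set of chunk indices already downloaded in done_path
        new_path:         partially-downloaded second version (preallocated file)
        new_chunk_hashes: per-chunk hash list announced for the second version
        Returns the chunk indices of new_path that were filled locally.
        """
        done_hashes = chunk_hashes(done_path)
        filled = set()
        with open(done_path, "rb") as src, open(new_path, "r+b") as dst:
            for i, wanted in enumerate(new_chunk_hashes):
                for j in done_chunks:
                    if j < len(done_hashes) and done_hashes[j] == wanted:
                        # Identical chunk: copy it locally instead of fetching it.
                        src.seek(j * CHUNK_SIZE)
                        dst.seek(i * CHUNK_SIZE)
                        dst.write(src.read(CHUNK_SIZE))
                        filled.add(i)
                        break
        return filled

In a real client the per-chunk hashes of both versions come from the
network as part of each file's hashset, so the matching chunks can be
identified before anything is downloaded; only the chunks whose hashes
differ ever have to come over the wire.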

For example, if you search for a file in jigle and find 3 versions, all of
them exactly 390.18Mb but with different md4s and availability, and you
decide to download all of them (because you can't tell which is the good
one), you'll end up downloading perhaps a bit more than 390.18Mb (390.18Mb
for one file, plus the differences with the second file, plus the
differences with the third file), instead of downloading 390.18Mb three
times.
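
As a rough back-of-the-envelope illustration (the 10% figure below is
invented, not measured), the saving in that three-version case would look
something like this:

    # Hypothetical numbers: three 390.18 MB versions, where each of the two
    # extra versions differs from the first in only ~10% of its chunks.
    size_mb = 390.18
    differing = 0.10  # assumed fraction of differing chunks, for illustration

    without_reuse = 3 * size_mb                      # download every version in full
    with_reuse = size_mb + 2 * differing * size_mb   # one full copy + the differences

    print(f"without chunk reuse: {without_reuse:.2f} MB")  # 1170.54 MB
    print(f"with chunk reuse:    {with_reuse:.2f} MB")     # 468.22 MB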

Another scenario: you downloaded a 600Mb zip file (which had several
versions on the edonkey network) and (uh-oh) once it finished, you notice
that it's corrupted (sometimes a single bit makes the difference!). So you
decide to download another version (with another md4 but the same size)
from the edonkey network. Without "similar chunks detection" you'd have to
download 600Mb again. With it, the identical chunks are copied to the new
download, and you only have to download the parts that were bad in the
first download, saving a lot of time and bandwidth.

I hope this makes sense :)

Greetings,

Sergio




