bug-gnu-utils

Re: large file support in diff utils


From: Chuck Swiger
Subject: Re: large file support in diff utils
Date: Mon, 11 Apr 2005 13:07:48 -0400

On Apr 11, 2005, at 12:12 PM, jiesheng zhang wrote:
Does anyone know whether the GNU diff/patch utils support large files (files >2G) or not? Does the default "./configure, make" support it, or do I need a special compile procedure?

If the platform you are on supports large files (off_t is not a 32-bit value), in theory GNU diff and Larry Wall's patch will work. In practice, you might have some hope if you have a 64-bit platform to run diff on with several GB of RAM, otherwise no way. diff requires many times more memory than the size of the files it is working on, even when using the -H option.

There is a program I know of intended to handle large files more efficiently than GNU diff in some circumstances:

"This is a pair of tools for building (bsdiff) and applying (bspatch)
binary patches.  When applied to two versions of the same executable
the patches produced are significantly smaller than those generated
by other binary diff tools (eg, xdelta).

WWW: http://www.daemonology.net/bsdiff/

- Colin Percival
address@hidden"

--
-Chuck

PS: From that site:

"bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1) bytes of memory, where n is the size of the old file and m is the size of the new file. bspatch requires n+m+O(1) bytes.

bsdiff runs in O((n+m) log n) time; on a 200MHz Pentium Pro, building a binary patch for a 4MB file takes about 90 seconds. bspatch runs in O(n+m) time; on the same machine, applying that patch takes about two seconds.

Providing that off_t is defined properly, bsdiff and bspatch support files of up to 2^61-1 = 2Ei-1 bytes."

I mention this because I'd be curious to know what the memory usage and running time (big-O) of diff are. My experience suggests that you can't run diff against anything larger than a few tens of MB before the program's memory usage exceeds MAXDSIZE on a 32-bit platform.
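
To put the bsdiff figures quoted above in concrete terms for files in the >2G range, the formulas can be evaluated directly. This is only a rough sketch: the function names and the 2 GiB example sizes are my own illustrative assumptions, not anything shipped with bsdiff, and the O(1) term is ignored.

```python
# Rough memory estimates from the formulas quoted on the bsdiff site.
# Function names and example sizes are hypothetical, for illustration only.

GIB = 2**30  # bytes in one GiB


def bsdiff_memory_bytes(n, m):
    """Approximate bsdiff memory: max(17*n, 9*n+m), ignoring the O(1) term.

    n is the size of the old file and m the size of the new file, in bytes.
    """
    return max(17 * n, 9 * n + m)


def bspatch_memory_bytes(n, m):
    """Approximate bspatch memory: n+m, ignoring the O(1) term."""
    return n + m


if __name__ == "__main__":
    # Assumed example: old and new files of 2 GiB each.
    old_size = new_size = 2 * GIB
    print(bsdiff_memory_bytes(old_size, new_size) // GIB, "GiB for bsdiff")
    print(bspatch_memory_bytes(old_size, new_size) // GIB, "GiB for bspatch")
```

For a pair of 2 GiB files this works out to roughly 34 GiB for bsdiff and 4 GiB for bspatch, so a 64-bit platform with a lot of memory is needed there as well.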




