bug-diffutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-diffutils] --speed-large-files


From: Bruno Haible
Subject: Re: [bug-diffutils] --speed-large-files
Date: Sun, 08 Jan 2012 21:52:30 +0100
User-agent: KMail/4.7.4 (Linux/3.1.0-1.2-desktop; KDE/4.7.4; x86_64; ; )

Paul Eggert wrote:
> > --speed-large-files is essential for users who need to reduce
> > the running time from O(N²) to O(N).
> 
> The running time is O(ND) where D is the number of differences.
> One can come up with examples exhibiting O(N**2) behavior, but
> they're rare enough in practice that it doesn't seem to be a problem.

I encountered such cases a couple of years ago. IIRC, the files were
two mbox files that had diverged through different modifications on
different machines.

> As far as I know, --speed-large-files has never been implemented in
> GNU diff (as -h was in the original Unix diff); it has always been
> a no-op.

When you pass --speed-large-files, src/diff.c sets speed_large_files
to true. Then, src/analyze.c sets ctxt.heuristic to true. Then,
gnulib/lib/diffseq.h function diag() applies your heuristic.

It cannot be a no-op, as it has an effect on the run-time: In a contrived
example (two mbox files, each ca. 18 MB large),
  $ time diff -u mbox1 mbox2 > /dev/null
displays a run time of 3.81 seconds, whereas
  $ time diff -u --speed-large-files mbox1 mbox2 > /dev/null
displays a run time of 3.98 seconds.

Bruno




reply via email to

[Prev in Thread] Current Thread [Next in Thread]