--- Begin Message ---
Subject: diff shown is not (locally) minimum
Date: Sun, 02 Feb 2014 13:05:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131103 Icedove/17.0.10
Hi,
Diffing millions of lines each day, I stumbled upon something I
consider to be a bug. The attached example files are big, so I
illustrate the bug with a short example (which does not actually
trigger the bug in the real "diff" tool).
The generated diff looks like

     A
     B
    -C
    -D
    -E
    +C
    +D
     F
     G

instead of only

     C
     D
    -E
     F
     G

If we generate the reverse diff, the bug does not occur; we just get

     C
     D
    +E
     F
     G

Interestingly, if we remove a line that does not even show up in the
affected section, the problem disappears. (See the real example below.)
With diff -d (--minimal), the problem does not show up.
I know that the standard diff algorithm does not guarantee minimal
diffs, for example when sections of a file are moved around.
However, I expected it to guarantee some kind of "local minimum",
that is, *unchanged* lines should not be deleted and re-inserted
if no unchanged lines lie between the deletion and the insertion.
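The "local minimum" property can be made concrete with a small sketch. It assumes a change block is given as the list of deleted lines plus the list of inserted lines, with no unchanged lines between them, and uses a textbook longest-common-subsequence (LCS) table to decide which lines could stay as context. The function name `minimize_change_block` is hypothetical, and this only illustrates the property; it is not a description of what diff does internally.

```python
def minimize_change_block(deleted, inserted):
    """Rewrite one change block (deleted lines followed by inserted lines,
    with no context in between) so that lines common to both sides are
    kept as context instead of being deleted and re-inserted.

    Uses a textbook LCS table, so the result has the fewest '-'/'+'
    lines possible for this block alone.
    """
    n, m = len(deleted), len(inserted)
    # lcs[i][j] = length of the LCS of deleted[i:] and inserted[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if deleted[i] == inserted[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table front to back, emitting unified-diff style lines.
    out = []
    i = j = 0
    while i < n and j < m:
        if deleted[i] == inserted[j]:
            out.append(" " + deleted[i])  # common line: keep as context
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("-" + deleted[i])
            i += 1
        else:
            out.append("+" + inserted[j])
            j += 1
    out.extend("-" + line for line in deleted[i:])
    out.extend("+" + line for line in inserted[j:])
    return out


# The change block from the short example: -C -D -E +C +D
print(minimize_change_block(["C", "D", "E"], ["C", "D"]))
# prints [' C', ' D', '-E']
```

On the block from the short example, C and D become context and only the deletion of E remains, which is exactly the "instead of only" diff: 1 changed line instead of 5.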
This is probably going to be a WONTFIX, but I wanted to report the bug
nonetheless.
To see the bug in action, save the attached files and run:

    diff -u old.txt new.txt | grep -A 32 zzz     # bug shows up
    tail -n +2 old.txt > old2.txt                # old2.txt = old.txt without its 1st line
    diff -u old2.txt new.txt | grep -A 23 zzz    # diff as it should be

(The grep extracts the problem section.)
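To spot such blocks in a large diff without reading it by hand, a rough checker can scan unified diff output for runs of "-"/"+" lines in which the same line text appears on both sides; that is exactly the case where keeping the line as context would shorten the diff. This is a hypothetical helper (the name `non_minimal_blocks` is made up), and its parsing of the unified format is deliberately simplified.

```python
def non_minimal_blocks(diff_lines):
    """Yield (line_number, common_lines) for each change block of a unified
    diff in which some line text is both deleted and inserted.

    A change block is a maximal run of '-'/'+' lines with no unchanged
    (context) line in between. If a line occurs on both sides of such a
    block, keeping it as context would make the diff shorter, so the
    block is not locally minimal.
    """
    deleted, inserted, start = [], [], None
    # The trailing "" is a sentinel that closes the final block.
    for lineno, line in enumerate(list(diff_lines) + [""], 1):
        line = line.rstrip("\n")
        # Simplification: a changed line whose own text starts with "-- " or
        # "++ " is mistaken for a file header here.
        is_file_header = line.startswith(("--- ", "+++ "))
        if line[:1] in ("-", "+") and not is_file_header:
            if start is None:
                start = lineno  # 1-based line number where the block begins
            (deleted if line[0] == "-" else inserted).append(line[1:])
            continue
        # A context line, "@@" hunk header or file header ends the block.
        common = set(deleted) & set(inserted)
        if common:
            yield start, sorted(common)
        deleted, inserted, start = [], [], None


# The first diff from the short example, without file and hunk headers.
example = [" A", " B", "-C", "-D", "-E", "+C", "+D", " F", " G"]
for lineno, common in non_minimal_blocks(example):
    print(f"line {lineno}: deleted and re-inserted: {common}")
# prints: line 3: deleted and re-inserted: ['C', 'D']
```

To check real output, the same loop could iterate over `non_minimal_blocks(sys.stdin)` in a small script fed by `diff -u old.txt new.txt |`; the reverse diff or the old2.txt diff should then report nothing.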
Btw: for these files, the bug also occurs in other diff tools such as
"vimdiff" and "git diff" (with its default diff algorithm), but not,
for example, in KDE's "kompare".
I have also had related, bigger files where the bug occurs only with
"diff" and "vimdiff", but not with "git diff".
It seems to me that the bug is related to some kind of chunk size in the
diff algorithm. However, I did not investigate this further.
Stephan
--
With knowledge grows doubt. -- Goethe
Attachments:
  new.txt (text document)
  old.txt (text document)
--- End Message ---