[Diffutils-devel] Interest in a -B / --binary option to cmp
From: Richard Bass
Subject: [Diffutils-devel] Interest in a -B / --binary option to cmp
Date: Fri, 31 May 2019 10:23:36 -0700
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0
Hello! I am new to the diffutils development list. I am curious to
see whether there would be support for a new option to the cmp
command. Specifically, I am proposing a --binary / -B option which
would indicate that the user is only interested in a binary comparison,
and that cmp should NOT perform line analysis.
The problem, as I see it, is that if you want to compare a huge volume
of data to determine whether there has been any corruption, you are
only interested in whether the files are the same or different and not
the line number where they differ. You could just use cmp -s to do
this, but then you don't get the answer on stdout. With cmp -s, you
have to check the exit status and so wrap each of the cmp commands
with an extra test.
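For illustration, the kind of wrapper this implies might look like the
sketch below. It relies only on the documented cmp exit statuses (0 for
identical, 1 for different, greater than 1 for trouble); the function
name report_same and the message wording are hypothetical, not part of
diffutils.

```shell
# Hypothetical helper: run cmp -s and turn its exit status into a
# one-line answer on stdout.
report_same() {
    cmp -s -- "$1" "$2"
    status=$?
    case $status in
        0) printf '%s %s are identical\n' "$1" "$2" ;;
        1) printf '%s %s differ\n' "$1" "$2" ;;
        *) printf 'trouble comparing %s and %s\n' "$1" "$2" >&2 ;;
    esac
    # Pass the cmp exit status through so callers can still test it.
    return "$status"
}
```

Every comparison then has to go through such a wrapper, which is the
extra step a -B option would make unnecessary.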
I added a -B option to a local copy of cmp built from the latest
diffutils source. I called it rcmp. For two files that are 3.2 GB
each and that are identical, after making sure that both files were
cached, I got the following:
$ time rcmp f1 f2
real 0m4.686s
user 0m1.560s
sys 0m3.026s
$ time rcmp -B f1 f2
real 0m3.810s
user 0m0.701s
sys 0m3.057s
These numbers were consistent. In other words, the line number
analysis ended up costing an additional 23%. For a large volume of
data, this is considerable. Note that when files do differ, the output
merely omits the line number:
$ rcmp f3 f4
f3 f4 differ: byte 16, line 1
$ rcmp -B f3 f4
f3 f4 differ: byte 16
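As a check on the 23% figure quoted earlier, the overhead can be
recomputed from the two "real" times in the timing runs. This is a
small sketch; the function name overhead_pct is made up for the
example.

```shell
# Hypothetical helper: percentage by which the first time exceeds the second.
overhead_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", (a - b) / b * 100 }'
}

# Real times from the runs without and with -B.
overhead_pct 4.686 3.810    # prints 23.0%
```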
Of course, I could be satisfied with running my own version, but I
figured that others might be interested in such an option.
Thanks,
Richard <rwb>