[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] [PATCH] and RFC: binary files merging and hook
From: |
rghetta |
Subject: |
[Monotone-devel] [PATCH] and RFC: binary files merging and hook |
Date: |
Wed, 25 May 2005 00:33:04 +0200 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050322 |
Monotone doesn't distinguish beetween binary and text files (it hasn't
even the *notion* of binary and text).
Therefore its merging algorithm is applied even to files containing
binary data, like images or tarballs, leading to silent corruption if
the files contain strategically placed newlines (see test
t_merge_binary.at for an example).
The attached patch inhibits automatic merging on binary files and adds a
new hook "binary_file(filepath)" to help monotone categorize them.
The hook receives the filespec and returns true,false or nil as follows:
true means the file must be treated as binary
false means the file must be treated as text
nil means "guess from content".
If the hook returns nil, the file will be treated as binary if the
monotone function guess_binary() returns true, i.e. if the files
contains NUL bytes or a selection of other ASCII control chars (for
example, STX and ETX).
Right now "binary" means only "don't apply monotone automatic merging".
This means that a merge of two binary files will always call the
merge2() or merge3() hooks, invoking the external merge tool.
You could, for example, invoke gimp when merging images, and so on.
This patch intentionally doesn't try to update .mt-attrs or modify the
attr_functions/attr_init_functions lua tables. It's purpose is fixing a
bug and hopefully spawn some discussion about binary file support.
The current hook definition is:
function binary_file(name)
lowname=string.lower(name)
-- some known binaries, return true
if (string.find(lowname, "%.gif$")) then return true end
if (string.find(lowname, "%.jpe?g$")) then return true end
if (string.find(lowname, "%.png$")) then return true end
if (string.find(lowname, "%.bz2$")) then return true end
if (string.find(lowname, "%.gz$")) then return true end
if (string.find(lowname, "%.zip$")) then return true end
-- some known text, return false
if (string.find(lowname, "%.cc?$")) then return false end
if (string.find(lowname, "%.cxx$")) then return false end
if (string.find(lowname, "%.hh?$")) then return false end
if (string.find(lowname, "%.hxx$")) then return false end
if (string.find(lowname, "%.lua$")) then return false end
if (string.find(lowname, "%.texi$")) then return false end
if (string.find(lowname, "%.sql$")) then return false end
-- unknown - return nil
return nil;
end
Comments needed,
Riccardo
#
# patch "diff_patch.cc"
# from [209e60efcca30474071383769840ae18d3d1aeef]
# to [b0cc050e76c94436ad35ebc9a9b3474b9adc58c9]
#
# patch "diff_patch.hh"
# from [8045a7d688a9494585eab247815ef9573687039f]
# to [608d32049b8a153d6c1d1f8df6bc1648e3d96597]
#
# patch "file_io.cc"
# from [35b0d27bb9a0a1c72b651fd2081a0f9a6d6c58f1]
# to [87e9aee7193f2da70e9fac926bb9b83f652f9892]
#
# patch "file_io.hh"
# from [d2865f7695f667b872b858b276edcec5d7cf1605]
# to [ea8749455633457ef1843c2f63ef16980289efbf]
#
# patch "lua.cc"
# from [76cfc26a5a3d0c64a9f244d78ad51c8975aa8d52]
# to [8c1adf0d74734aaa182ea0436abf630530b293ab]
#
# patch "lua.hh"
# from [a1b3877535049b77b678d864354d030f8433854c]
# to [b13faf78c7f0704ec711d2f0945e86b49d5d1c10]
#
# patch "monotone.texi"
# from [ffbdfa684b8b1e995bfbfb51c1851408024e3647]
# to [1a0beb99ad5db9285034e5ffd8f9d729da32c3c8]
#
# patch "std_hooks.lua"
# from [05e5e9b2b7f37107a7f38161d258e582eb13d697]
# to [52f8fc63ce87bb92dc1050005c6557ab243b98b9]
#
# patch "tests/t_merge_binary.at"
# from [4cdc96fa85206b0b6d7488886045f1d541a55131]
# to [91f0212ff5acbaba6a6d9529b134e9dc63ed9bf7]
#
--- diff_patch.cc
+++ diff_patch.cc
@@ -21,18 +21,6 @@
using namespace std;
-bool guess_binary(string const & s)
-{
- // these do not occur in ASCII text files
- // FIXME: this heuristic is (a) crap and (b) hardcoded. fix both these.
- if (s.find_first_of('\x00') != string::npos ||
- s.find_first_of("\x01\x02\x03\x04\x05\x06\x0e\x0f"
- "\x10\x11\x12\x13\x14\x15\x16\x17\x18"
- "\x19\x1a\x1c\x1d\x1e\x1f") != string::npos)
- return true;
- return false;
-}
-
//
// a 3-way merge works like this:
//
@@ -537,45 +525,51 @@
file_data left_data, right_data, ancestor_data;
data left_unpacked, ancestor_unpacked, right_unpacked, merged_unpacked;
- string left_encoding, anc_encoding, right_encoding;
- vector<string> left_lines, ancestor_lines, right_lines, merged_lines;
this->get_version(left_path, left_id, left_data);
this->get_version(anc_path, ancestor_id, ancestor_data);
this->get_version(right_path, right_id, right_data);
- left_encoding = this->get_file_encoding(left_path, left_man);
- anc_encoding = this->get_file_encoding(anc_path, anc_man);
- right_encoding = this->get_file_encoding(right_path, right_man);
-
left_unpacked = left_data.inner();
ancestor_unpacked = ancestor_data.inner();
right_unpacked = right_data.inner();
- split_into_lines(left_unpacked(), left_encoding, left_lines);
- split_into_lines(ancestor_unpacked(), anc_encoding, ancestor_lines);
- split_into_lines(right_unpacked(), right_encoding, right_lines);
-
- if (merge3(ancestor_lines,
- left_lines,
- right_lines,
- merged_lines))
+ if (file_mergeable(left_path, left_unpacked, app.lua) &&
+ file_mergeable(right_path, right_unpacked, app.lua) &&
+ file_mergeable(anc_path, ancestor_unpacked, app.lua))
{
- hexenc<id> tmp_id;
- file_data merge_data;
- string tmp;
-
- L(F("internal 3-way merged ok\n"));
- join_lines(merged_lines, tmp);
- calculate_ident(data(tmp), tmp_id);
- file_id merged_fid(tmp_id);
- merge_data = file_data(tmp);
-
- merged_id = merged_fid;
- record_merge(left_id, right_id, merged_fid,
- left_data, merge_data);
-
- return true;
+ // all files mergeable by monotone internal algorithm, try to merge
+ string left_encoding, anc_encoding, right_encoding;
+ left_encoding = this->get_file_encoding(left_path, left_man);
+ anc_encoding = this->get_file_encoding(anc_path, anc_man);
+ right_encoding = this->get_file_encoding(right_path, right_man);
+
+ vector<string> left_lines, ancestor_lines, right_lines, merged_lines;
+ split_into_lines(left_unpacked(), left_encoding, left_lines);
+ split_into_lines(ancestor_unpacked(), anc_encoding, ancestor_lines);
+ split_into_lines(right_unpacked(), right_encoding, right_lines);
+
+ if (merge3(ancestor_lines,
+ left_lines,
+ right_lines,
+ merged_lines))
+ {
+ hexenc<id> tmp_id;
+ file_data merge_data;
+ string tmp;
+
+ L(F("internal 3-way merged ok\n"));
+ join_lines(merged_lines, tmp);
+ calculate_ident(data(tmp), tmp_id);
+ file_id merged_fid(tmp_id);
+ merge_data = file_data(tmp);
+
+ merged_id = merged_fid;
+ record_merge(left_id, right_id, merged_fid,
+ left_data, merge_data);
+
+ return true;
+ }
}
P(F("help required for 3-way merge\n"));
@@ -845,7 +839,7 @@
if (b_len == 0)
ost << " +0,0";
else
- {
+ {
ost << " +" << b_begin+1;
if (b_len > 1)
ost << "," << b_len;
--- diff_patch.hh
+++ diff_patch.hh
@@ -19,7 +19,6 @@
// this file is to contain some stripped down, in-process implementations
// of GNU-diffutils-like things (diff, diff3, maybe patch..)
-bool guess_binary(std::string const & s);
enum diff_type
{
--- file_io.cc
+++ file_io.cc
@@ -254,6 +254,32 @@
return fs::exists(localized(p));
}
+bool guess_binary(string const & s)
+{
+ // these do not occur in ASCII text files
+ // FIXME: this heuristic is (a) crap and (b) hardcoded. fix both these.
+ if (s.find_first_of('\x00') != string::npos ||
+ s.find_first_of("\x01\x02\x03\x04\x05\x06\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17\x18"
+ "\x19\x1a\x1c\x1d\x1e\x1f") != string::npos)
+ return true;
+ return false;
+}
+
+// returns true if the file can be merged by monotone built in merger
+bool file_mergeable(file_path const & path, data const &dt, lua_hooks & lua)
+{
+ bool is_binary;
+ if (lua.hook_binary_file(path, is_binary))
+ return !is_binary; // the hook returned a valid response
+ else
+ {
+ // the hook returned "unknown". If the guess_binary() returns true,
+ // the file is assumed to be binary, thus not mergeable
+ return !guess_binary(dt());
+ }
+}
+
void
delete_file(local_path const & p)
{
--- file_io.hh
+++ file_io.hh
@@ -59,6 +59,12 @@
bool file_exists(local_path const & path);
bool file_exists(file_path const & path);
+// returns true if the string content is binary according to monotone euristic
+bool guess_binary(std::string const & s);
+
+// returns true if the file can be merged by monotone built in merger
+bool file_mergeable(file_path const & path, data const &dt, lua_hooks & lua);
+
void mkdir_p(local_path const & path);
void mkdir_p(file_path const & path);
void make_dir_for(file_path const & p);
--- lua.cc
+++ lua.cc
@@ -745,7 +745,22 @@
return exec_ok && ignore_it;
}
+// returns false if the hook returned nil or something not boolean
+// otherwise returns true and is_binary contains the return value from the hook
bool
+lua_hooks::hook_binary_file(file_path const & p, bool &is_binary)
+{
+ is_binary = false;
+ bool exec_ok = Lua(st)
+ .func("binary_file")
+ .push_str(p())
+ .call(1,1)
+ .extract_bool(is_binary)
+ .ok();
+ return exec_ok;
+}
+
+bool
lua_hooks::hook_non_blocking_rng_ok()
{
bool ok = false;
--- lua.hh
+++ lua.hh
@@ -71,6 +71,13 @@
// local repo hooks
bool hook_ignore_file(file_path const & p);
bool hook_ignore_branch(std::string const & branch);
+
+ // if the file type is known, the return value is true and the is_binary
+ // parameter will contain true if the file is binary, false if is text.
+ // if the hook value is false, then the file type is not known and must
+ // be resolved otherwise
+ bool hook_binary_file(file_path const & p, bool &is_binary);
+
bool hook_merge2(file_path const & left_path,
file_path const & right_path,
file_path const & merged_path,
--- monotone.texi
+++ monotone.texi
@@ -5404,6 +5404,38 @@
branches. Otherwise returns @code{false}. This hook has no default
definition, therefore the default behavior is to list all branches.
address@hidden binary_file (@var{filename})
+
+Returns @code{true} if @var{filename} is binary, @code{false} is the
address@hidden is certainly text and @code{nil} if the file type is
+unknown and should be @emph{guessed} by monotone.
+The default definition of this hook is:
+
address@hidden
address@hidden
+function binary_file(name)
+ lowname=string.lower(name)
+ -- some known binaries, return true
+ if (string.find(lowname, "%.gif$")) then return true end
+ if (string.find(lowname, "%.jpe?g$")) then return true end
+ if (string.find(lowname, "%.gif$")) then return true end
+ if (string.find(lowname, "%.bz2$")) then return true end
+ if (string.find(lowname, "%.gz$")) then return true end
+ if (string.find(lowname, "%.zip$")) then return true end
+ -- some known text, return false
+ if (string.find(lowname, "%.cc?$")) then return false end
+ if (string.find(lowname, "%.cxx$")) then return false end
+ if (string.find(lowname, "%.hh?$")) then return false end
+ if (string.find(lowname, "%.hxx$")) then return false end
+ if (string.find(lowname, "%.lua$")) then return false end
+ if (string.find(lowname, "%.texi$")) then return false end
+ if (string.find(lowname, "%.sql$")) then return false end
+ -- unknown - return nil
+ return nil;
+end
address@hidden group
address@hidden smallexample
+
@item get_revision_cert_trust (@var{signers}, @var{id}, @var{name}, @var{val})
Returns whether or not you @emph{trust} the assertion
--- std_hooks.lua
+++ std_hooks.lua
@@ -91,6 +91,28 @@
return false;
end
+-- return true means "binary", false means "text",
+-- nil means "unknown, try to guess"
+function binary_file(name)
+ lowname=string.lower(name)
+ -- some known binaries, return true
+ if (string.find(lowname, "%.gif$")) then return true end
+ if (string.find(lowname, "%.jpe?g$")) then return true end
+ if (string.find(lowname, "%.png$")) then return true end
+ if (string.find(lowname, "%.bz2$")) then return true end
+ if (string.find(lowname, "%.gz$")) then return true end
+ if (string.find(lowname, "%.zip$")) then return true end
+ -- some known text, return false
+ if (string.find(lowname, "%.cc?$")) then return false end
+ if (string.find(lowname, "%.cxx$")) then return false end
+ if (string.find(lowname, "%.hh?$")) then return false end
+ if (string.find(lowname, "%.hxx$")) then return false end
+ if (string.find(lowname, "%.lua$")) then return false end
+ if (string.find(lowname, "%.texi$")) then return false end
+ if (string.find(lowname, "%.sql$")) then return false end
+ -- unknown - return nil
+ return nil
+end
function edit_comment(basetext, user_log_message)
local exe = "vi"
--- tests/t_merge_binary.at
+++ tests/t_merge_binary.at
@@ -1,11 +1,11 @@
AT_SETUP([merge binary file])
MONOTONE_SETUP
NEED_UNB64
-# This is a real merge error. A binary file is happily merged by monotone
+# This was a real merge error. A binary file happily merged by monotone
# just because contains some strategically placed line feeds
-AT_XFAIL_IF(true)
+# now is a test for the new hook binary_file() and its effect on merging
AT_DATA(parent.bmp.b64,
[Qk1mdQAAAAAAADYAAAAoAAAAZAAAAGQAAAABABgAAAAAADB1AADrCgAA6woAAAAAAAAAAAAAQJ9C
QJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9C
@@ -221,6 +221,18 @@
UNB64(left.bmp.b64, left.bmp)
UNB64(right.bmp.b64, right.bmp)
+# hook forces all files binary
+AT_DATA(binary.lua, [function binary_file(name) return true end
+])
+
+# hook forces all files text
+AT_DATA(text.lua, [function binary_file(name) return false end
+])
+
+# hook make all files guess
+AT_DATA(guess.lua, [function binary_file(name) return nil end
+])
+
AT_CHECK(cp -f parent.bmp testfile.bmp)
AT_CHECK(MONOTONE add testfile.bmp, [], [ignore], [ignore])
COMMIT(testbranch)
@@ -234,7 +246,13 @@
AT_CHECK(cp -f right.bmp testfile.bmp)
COMMIT(testbranch)
-# merge should fail!
-AT_CHECK(MONOTONE --branch=testbranch merge, [1], [ignore], [ignore])
+# file marked binary: merge should fail
+AT_CHECK(MONOTONE --rcfile=binary.lua --branch=testbranch merge, [1],
[ignore], [ignore])
+# file marked "guess": merge should fail
+AT_CHECK(MONOTONE --rcfile=guess.lua --branch=testbranch merge, [1], [ignore],
[ignore])
+
+# finally, file marked text: merge should work!
+AT_CHECK(MONOTONE --rcfile=text.lua --branch=testbranch merge, [0], [ignore],
[ignore])
+
AT_CLEANUP
- [Monotone-devel] [PATCH] and RFC: binary files merging and hook,
rghetta <=