[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: git-merge-changelog question
From: |
Bruno Haible |
Subject: |
Re: git-merge-changelog question |
Date: |
Thu, 2 Jul 2009 23:43:36 +0200 |
User-agent: |
KMail/1.9.9 |
Hi Eric,
> Here's a sequence of steps, comparable to what I tried to use to generate
> 14b8a31ef on the master branch (cherry-picked from branch-1.6). The
> example below uses hashes instead of symbolic names, in case you want to
> reproduce this even when the branches have advanced. Notice that it
> appears to get stuck in an infinite loop, consuming 100% CPU with no
> apparent progress after a while.
>
> $ git clone git://git.sv.gnu.org/m4.git
> $ cd m4
> $ git config
> $ git reset --hard 6033d89
> $ git config merge.merge-changelog.driver
> git-merge-changelog --split-merged-entry %O %A %B
> $ timeout 600 git cherry-pick cd172d9
Indeed, with these instructions, I reproduce.
The guessed "direction", that is, the meaning of the three files, is
right. The two major problems in your case are:
- There are small differences between ChangeLog entries on the
branches. For example, on one branch you have a ChangeLog entry
that starts with
Thu Nov 5 12:37:13 1992 Francois Pinard (pinard at icule)
and on the other branch it starts with
1992-11-05 François Pinard <address@hidden>
These small differences cause the fuzzy string matching to be
activated, and it takes a lot of time. Two exactly equal entries
could be detected by a quick hash table lookup.
- The branches are very far away:
> $ ls -l ChangeLog .merge_file_*
> -rw------- 1 eblake None 172482 Jun 30 21:24 .merge_file_X15QyE
> -rw------- 1 eblake None 172697 Jun 30 21:24 .merge_file_aJFAU3
> -rw------- 1 eblake None 466111 Jun 30 21:24 .merge_file_acCLpE
> -rw-r--r-- 1 eblake None 466111 Jun 30 19:15 ChangeLog
You are merging from a branch that has only 1/3 of the entire development
history. git-merge-changelog spends time looking for each of the other 2/3
of the ChangeLog file whether it can find some match.
The attached change applies Ralf's improvement (originally for gettext's
msgmerge program). With this, the time spent for the "git cherry-pick" command
goes down:
516 sec -> 81 sec
Since this is obviously still not the speed that you desire, I continue to
look...
2009-07-02 Bruno Haible <address@hidden>
Speed up approximate search for matching ChangeLog entries.
* lib/git-merge-changelog.c (entry_fstrcmp): Add a lower_bound
argument. Call fstrcmp_bounded instead of fstrcmp.
(compute_mapping, try_split_merged_entry, main): Update callers.
--- lib/git-merge-changelog.c.orig 2009-07-02 23:32:51.000000000 +0200
+++ lib/git-merge-changelog.c 2009-07-02 22:48:51.000000000 +0200
@@ -211,9 +211,12 @@
/* Perform a fuzzy comparison of two ChangeLog entries.
Return a similarity measure of the two entries, a value between 0 and 1.
- 0 stands for very distinct, 1 for identical. */
+ 0 stands for very distinct, 1 for identical.
+ If the result is < LOWER_BOUND, an arbitrary other value < LOWER_BOUND can
+ be returned. */
static double
-entry_fstrcmp (const struct entry *entry1, const struct entry *entry2)
+entry_fstrcmp (const struct entry *entry1, const struct entry *entry2,
+ double lower_bound)
{
/* fstrcmp works only on NUL terminated strings. */
char *memory;
@@ -233,7 +236,8 @@
p += entry2->length;
*p++ = '\0';
}
- similarity = fstrcmp (memory, memory + entry1->length + 1);
+ similarity =
+ fstrcmp_bounded (memory, memory + entry1->length + 1, lower_bound);
freea (memory);
return similarity;
}
@@ -410,7 +414,8 @@
for (j = n2 - 1; j >= 0; j--)
if (index_mapping_reverse[j] < 0)
{
- double similarity = entry_fstrcmp (entry_i, file2->entries[j]);
+ double similarity =
+ entry_fstrcmp (entry_i, file2->entries[j], best_j_similarity);
if (similarity > best_j_similarity)
{
best_j = j;
@@ -429,7 +434,8 @@
if (index_mapping[ii] < 0)
{
double similarity =
- entry_fstrcmp (file1->entries[ii], entry_j);
+ entry_fstrcmp (file1->entries[ii], entry_j,
+ best_i_similarity);
if (similarity > best_i_similarity)
{
best_i = i;
@@ -729,7 +735,8 @@
new_body.string = new_entry->string + split_offset;
new_body.length = new_entry->length - split_offset;
- similarity = entry_fstrcmp (&old_body, &new_body);
+ similarity =
+ entry_fstrcmp (&old_body, &new_body, best_similarity);
if (similarity > best_similarity)
{
best_split_offset = split_offset;
@@ -1219,7 +1226,8 @@
size_t i;
for (i = edit->i1 + 1; i <= edit->i2; i++)
if (entry_fstrcmp (ancestor_file.entries[i],
- modified_file.entries[i +
edit->j2 - edit->i2])
+ modified_file.entries[i +
edit->j2 - edit->i2],
+ FSTRCMP_THRESHOLD)
< FSTRCMP_THRESHOLD)
{
simple_merged = false;
@@ -1300,7 +1308,8 @@
simple = true;
for (i = edit->i1; i <= edit->i2; i++)
if (entry_fstrcmp (ancestor_file.entries[i],
- modified_file.entries[i + edit->j2
- edit->i2])
+ modified_file.entries[i + edit->j2
- edit->i2],
+ FSTRCMP_THRESHOLD)
< FSTRCMP_THRESHOLD)
{
simple = false;
- Re: git-merge-changelog question,
Bruno Haible <=