gnu-arch-users

Re: [Gnu-arch-users] Re: arch performance with large trees


From: Miles Bader
Subject: Re: [Gnu-arch-users] Re: arch performance with large trees
Date: Sat, 5 Feb 2005 20:52:01 +0900

On Sat, 05 Feb 2005 09:06:10 +0000, Catalin Marinas
<address@hidden> wrote:
> Ah, I thought I would see something like old_file -> new_file in the
> log. The patch available through BKCVS (and
> http://linux.bkbits.net:8080/linux-2.5/user=miles/address@hidden)
> does a full delete/add of these files so the information would need to
> be retrieved from bkbits.net. Larry McVoy stated that it is OK to
> retrieve some meta information but not the patch itself. You would
> actually need to access the file diff from bkbits.net to get this
> information

Hmmm.... Looking at that file, there seem to be enough hints in the
header comment to accurately guess rename info without looking at
bkbits:

Truly deleted files look like this:

   # BitKeeper/deleted/.del-rte_ma1_cb-ksram.ld~f045845fc65842ff
   #   2002/12/20 22:18:52-08:00 address@hidden +0 -0
   #   Delete: arch/v850/rte_ma1_cb-ksram.ld
   # 
   ...
   diff -Nru a/arch/v850/rte_ma1_cb-ksram.ld b/arch/v850/rte_ma1_cb-ksram.ld
   --- a/arch/v850/rte_ma1_cb-ksram.ld  2005-02-05 03:33:05 -08:00
   +++ /dev/null        Wed Dec 31 16:00:00 196900
   @@ -1,157 +0,0 @@
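
A minimal Python sketch of the deleted-file check follows. It assumes the patch layout is exactly as in the excerpts here: a "--- a/<path>" line, then "+++ /dev/null", then a single "@@ -1,N +0,0 @@" hunk, plus a "Delete: <path>" line in the header comment. The function names are hypothetical and do not belong to any BitKeeper or arch tool.

```python
import re

# Hypothetical helpers; assumes the BKCVS patch layout seen in the excerpts.
# Whole-file hunk header, e.g. "@@ -1,157 +0,0 @@" (old length is group 1).
HUNK_RE = re.compile(r"^@@ -\d+,(\d+) \+\d+,(\d+) @@")
# "Delete: <path>" comment line inside a BitKeeper/deleted/ header entry.
DELETE_RE = re.compile(r"^#\s+Delete:\s+(\S+)\s*$")


def removed_files(patch_lines):
    """Map each path whose diff goes from a/<path> to /dev/null to the old
    line count from its hunk header (None if no hunk header follows)."""
    removed = {}
    for i, line in enumerate(patch_lines):
        if i == 0 or not line.startswith("+++ /dev/null"):
            continue
        prev = patch_lines[i - 1]
        if not prev.startswith("--- a/"):
            continue
        path = prev[len("--- a/"):].split()[0]
        nxt = patch_lines[i + 1] if i + 1 < len(patch_lines) else ""
        m = HUNK_RE.match(nxt)
        removed[path] = int(m.group(1)) if m else None
    return removed


def truly_deleted(patch_lines):
    """Removed files that are also named in a 'Delete:' header line.
    Removed files NOT named there are candidate old names of renames."""
    named = {m.group(1) for line in patch_lines if (m := DELETE_RE.match(line))}
    return {p for p in removed_files(patch_lines) if p in named}
```

Any path that removed_files() reports but truly_deleted() does not (such as nb85e_uart.c in the rename example) is left over as a possible old name.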

Truly added files look like this:

   # include/asm-v850/v850e_uarta.h
   #   2003/07/18 10:10:42-07:00 address@hidden +0 -0
   #   BitKeeper file /home/torvalds/v2.5/linux/include/asm-v850/v850e_uarta.h
   # 
   # include/asm-v850/v850e_uarta.h
   #   2003/07/18 10:10:42-07:00 address@hidden +278 -0
   ...
   diff -Nru a/include/asm-v850/v850e_uarta.h b/include/asm-v850/v850e_uarta.h
   --- /dev/null        Wed Dec 31 16:00:00 196900
   +++ b/include/asm-v850/v850e_uarta.h 2005-02-05 01:13:48 -08:00
   @@ -0,0 +1,278 @@

Note there are two header comment entries for the same file, and the
line count in the diff's single hunk matches the count in one of them
(the other is zero).
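
The "two entries, one of them zero" test for added files could look like the sketch below. It assumes each header entry is a "# <path>" line immediately followed by a "#   <date> <time> <user> +N -M" line, as in the excerpts. The regex and function names are hypothetical.

```python
import re

# Hypothetical helper; second line of a header entry is assumed to be
#   "#   <date> <time+tz> <user> +<added> -<removed>"
STATS_RE = re.compile(r"^#\s+\S+\s+\S+\s+\S+\s+\+(\d+)\s+-(\d+)\s*$")


def header_counts(header_lines, path):
    """Return the (added, removed) pair of every header entry for `path`."""
    counts = []
    for i, line in enumerate(header_lines[:-1]):
        if line.rstrip() == "# " + path:
            m = STATS_RE.match(header_lines[i + 1])
            if m:
                counts.append((int(m.group(1)), int(m.group(2))))
    return counts


def looks_truly_added(header_lines, path, hunk_new_len):
    """Heuristic: exactly two entries for the file, one "+0 -0" and one
    whose "+N -0" matches the length of the diff's single hunk."""
    return sorted(header_counts(header_lines, path)) == sorted(
        [(0, 0), (hunk_new_len, 0)]
    )
```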

Renamed files look like this:

   # drivers/serial/v850e_uart.c
   #   2003/07/16 19:04:06-07:00 address@hidden +215 -276
   #   Refactor v850 UART driver
   ...
   diff -Nru a/drivers/serial/v850e_uart.c b/drivers/serial/v850e_uart.c
   --- /dev/null        Wed Dec 31 16:00:00 196900
   +++ b/drivers/serial/v850e_uart.c    2005-02-05 01:13:48 -08:00
   @@ -0,0 +1,549 @@
   ...
   diff -Nru a/drivers/serial/nb85e_uart.c b/drivers/serial/nb85e_uart.c
   --- a/drivers/serial/nb85e_uart.c    2005-02-05 01:13:48 -08:00
   +++ /dev/null        Wed Dec 31 16:00:00 196900
   @@ -1,610 +0,0 @@

Note that you don't get the old name of the file, but the line-count
difference (549 - 610 = -61) is the same as the sum of the +/- numbers
in the new file's header comment entry (+215 + -276 = -61).
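
The rename test is plain arithmetic. If a file was renamed, its old length should be new_len - (added - removed). For the example, 549 - (215 - 276) = 610, which matches nb85e_uart.c. The hypothetical sketch below assumes the leftover removed files (those not truly deleted) have already been collected into a path -> line-count map.

```python
def expected_old_len(new_len, added, removed):
    """Length the file must have had before a rename, assuming the header's
    +added/-removed counts account for the whole length change."""
    return new_len - (added - removed)


def old_name_candidates(new_len, added, removed, removed_files):
    """Removed files (path -> old line count) whose length fits.

    Hypothetical helper: `removed_files` is assumed to hold only files that
    were NOT truly deleted, i.e. the possible old names of renames.
    """
    want = expected_old_len(new_len, added, removed)
    return sorted(path for path, n in removed_files.items() if n == want)
```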

So you can at least distinguish renamed files from added/deleted
files, and know the new name of renamed files; you can dramatically
narrow the set of candidates for the old name of renamed files by
looking at the line counts.  I suspect in practice the only time
you're going to get multiple candidates for the old name of a
particular renamed file is when multiple identical files (e.g.,
boilerplate) are renamed in the same patch; in that case a simple
heuristic like "closest path prefix" will probably do a good job of
finding the right one (e.g., it will correctly handle the case where the
same file in multiple architectures is being renamed identically).
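
If more than one removed file has the right length, the "closest path prefix" tiebreak could be sketched as follows. This is only one possible reading of the heuristic: it counts shared leading path components and assigns matches greedily. It is not an established algorithm, and the paths in the test are made up.

```python
def common_prefix_len(a, b):
    """Number of leading path components shared by paths a and b."""
    n = 0
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        n += 1
    return n


def pick_old_name(new_path, candidates):
    """Choose the candidate sharing the longest leading directory prefix
    with new_path; ties go to the lexicographically first candidate."""
    return max(sorted(candidates), key=lambda old: common_prefix_len(new_path, old))


def assign_renames(new_paths, candidates):
    """Hypothetical greedy matcher: give each new path an unused old path."""
    pool = set(candidates)
    result = {}
    for new in sorted(new_paths):
        if not pool:
            break
        old = pick_old_name(new, pool)
        result[new] = old
        pool.remove(old)
    return result
```

For two identical headers renamed in two architecture directories, each new path pairs with the old path in its own directory. A smarter version could maximize the total shared prefix instead of matching greedily.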

-Miles
-- 
Do not taunt Happy Fun Ball.



