[wdiff-dev] [patch #7121] New, per-character diff, mode


From: Georgios Zarkadas
Subject: [wdiff-dev] [patch #7121] New, per-character diff, mode
Date: Mon, 29 Mar 2010 23:04:41 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; el-GR; rv:1.9.1.8) Gecko/20100214 Ubuntu/9.10 (karmic) Firefox/3.5.8

Follow-up Comment #3, patch #7121 (project wdiff):

Hi,
My answers to the remarks follow, in the same order.

-1- Yes, to both questions.

-2- Ok, it fits with (-3-)'s time frame; we could also keep just the long
option to avoid any ambiguity.

-3- Yes, it is; it was a quick hack in order to drive development of a tool
for the trans-coord project, which uses wdiff for showing changes in fuzzy
translations (see
http://lists.gnu.org/archive/html/trans-coord-devel/2010-03/msg00014.html).
   However, it is fast, resets to normal after an error, and preserves the
stream. If, as I chose, one wants to keep all bytes of the stream, it will
inevitably spit out something non-printable on an input error; but it is
not wdiff's responsibility to validate the stream IMHO.
   This is not my final word, however; in order to arrive at a more general
solution I have started studying other encodings, such as UTF-16, and the
Unicode routines provided by glibc, and I have also had a quick look at the
Coreutils sources.
   I believe that the following apply:
   --- Calling `setlocale(LC_CTYPE, "")' at program startup to pick up the
default encoding from the environment (plus a command-line option to supply
it explicitly) and branching to either single-byte or the appropriate
multi-byte mode will be easy.
   --- Splitting of words/chars will need some thought in order not to lose
too much speed and to handle errors gracefully, but it is rather
straightforward; you just follow the selected encoding's rules. Most probably
it will require changing all getc/putc calls to the appropriate multi-byte
versions.
   --- Doing more elaborate things, such as truly supporting the -i,
--ignore-case option in all encodings, will most probably require quite a lot
of code (actually, I do not know yet how much).
   Thus, a pass-through implementation (just to break words/chars correctly
in any encoding) is feasible IMO in a few months; I will try it. If you are
aware of other GNU software that handles Unicode, point me to it; there may be
suitable ready-made code to use for this purpose.
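
   To make the first two points concrete, here is a rough C sketch of what
the mode selection and a pass-through character splitter might look like. It
is only an illustration under assumptions, not code from the patch: the helper
names `use_multibyte_mode' and `next_unit' are hypothetical, and it relies
only on the standard setlocale, MB_CUR_MAX and mbrtowc facilities.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: adopt the LC_CTYPE locale from the environment and
   report whether its encoding can use more than one byte per character.
   If setlocale fails, the "C" locale stays active and this returns 0.  */
static int
use_multibyte_mode (void)
{
  setlocale (LC_CTYPE, "");
  return MB_CUR_MAX > 1;
}

/* Hypothetical helper: return the length in bytes of the next character
   unit in S, of which N bytes (N > 0) are available.  An invalid or
   truncated sequence yields 1, so that the offending byte can be passed
   through unchanged, and the conversion state is reset to normal.  */
static size_t
next_unit (const char *s, size_t n, mbstate_t *st)
{
  size_t len = mbrtowc (NULL, s, n, st);

  if (len == (size_t) -1 || len == (size_t) -2)
    {
      memset (st, 0, sizeof *st);   /* back to the initial state */
      return 1;                     /* keep the raw byte in the stream */
    }
  return len == 0 ? 1 : len;        /* a NUL character is one byte long */
}
```

   A splitting loop would call `next_unit' repeatedly, advance by the
returned length and write those bytes out unchanged, so no byte of the
stream is lost even when the input is malformed.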

-4- It is better to postpone any such activity for now (cf. -3-).

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?7121>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




