help-gnu-emacs

Re: how to find encoding violations in Emacs buffer?


From: Peter Dyballa
Subject: Re: how to find encoding violations in Emacs buffer?
Date: Wed, 13 Dec 2006 11:45:21 +0100


On 13.12.2006 at 09:39, riccardo.murri@gmail.com wrote:

> Yes, but it may be hard to spot one single problematic character in a
> large buffer.  In the case at hand, I had one Latin-1 "ù" in a 20k
> UTF-8 text,

This character is a perfectly valid UTF-8 character:

        [ù]  00F9  LATIN SMALL LETTER U WITH GRAVE

It cannot be the cause: in UTF-8 it is encoded as C3 B9. Both Kermit and the Unicode branch of Emacs 23 have a file UnicodeData.txt that describes many Unicode characters. Kermit's utf8.txt, from which the excerpt above is taken, is somewhat more complete.
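To make the C3 B9 claim concrete, here is a small sketch in Python (not Emacs Lisp) of the two-byte UTF-8 rule for code points U+0080..U+07FF. The helper name utf8_encode_two_byte is hypothetical. For comparison, it also shows the Latin-1 encoding of the same character, the single byte F9; a lone F9 byte never occurs in well-formed UTF-8.

```python
def utf8_encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a two-byte UTF-8 sequence.
    (Hypothetical helper, for illustration only.)"""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point needs a different sequence length")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx: top 5 of the 11 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low 6 bits
    return bytes([lead, trail])


u_grave = 0x00F9  # LATIN SMALL LETTER U WITH GRAVE
print(utf8_encode_two_byte(u_grave).hex(" "))  # c3 b9
print(chr(u_grave).encode("latin-1").hex())    # f9
```

Python's built-in codec gives the same two bytes, so `chr(0xF9).encode("utf-8")` can serve as a cross-check.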


> Isn't there a way to implement a "goto-next-problematic-char" elisp
> function?  UTF-8 has a rather simple algorithm to detect encoding
> violations, which can point at the precise point where a byte sequence
> violates UTF-8 rules, but I wondered if Emacs had a more general
> interface: if it knows where in the buffer the encoding violations
> are located, one would assume that this information would be available
> at elisp level.
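The byte-level UTF-8 check mentioned in the question can be sketched as follows. This is a minimal Python illustration, not an Emacs interface: it assumes the file contents are available as raw bytes and that the scan starts on a character boundary, and the function name next_utf8_violation is hypothetical. The ranges follow the table of well-formed UTF-8 byte sequences in chapter 3 of the Unicode Standard, so overlong forms, surrogates and code points above U+10FFFF are all reported.

```python
from typing import Optional


def next_utf8_violation(data: bytes, start: int = 0) -> Optional[int]:
    """Return the byte offset of the first ill-formed UTF-8 sequence at or
    after `start`, or None if data[start:] is well-formed.
    Assumes `start` lies on a character boundary. (Hypothetical helper.)"""
    i, n = start, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII byte
            i += 1
            continue
        # Pick sequence length and the allowed range of the 2nd byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects UTF-16 surrogates
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points > U+10FFFF
        else:
            return i                      # 80..C1, F5..FF never start a sequence
        if i + length > n:
            return i                      # sequence truncated at end of data
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):        # remaining bytes: plain 10xxxxxx
            if not 0x80 <= data[i + k] <= 0xBF:
                return i
        i += length
    return None


# A UTF-8 "più" followed by a Latin-1 "più" (whose ù is the lone byte F9).
data = "più".encode("utf-8") + b" pi\xf9"
print(next_utf8_violation(data))  # 7
```

Calling it again with `start` set just past a reported offset would give the "goto next" behaviour. As a cross-check, Python's own strict decoder raises `UnicodeDecodeError` whose `start` attribute marks the same offending byte in this example.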

Something like this is already implemented in PostScript printing: when the buffer contains characters outside a given ISO Latin encoding, up to a dozen of them are listed in a warning buffer.
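The idea behind that warning can be sketched in a few lines. This is not the ps-print code; it is a hypothetical Python helper, unencodable_chars, that assumes the text is already decoded and simply collects the first few characters a chosen ISO Latin charset cannot represent. The default limit of 12 only mirrors the "up to a dozen" behaviour described above.

```python
from typing import List, Tuple


def unencodable_chars(text: str, encoding: str = "iso-8859-1",
                      limit: int = 12) -> List[Tuple[int, str]]:
    """Return (position, character) pairs for characters in `text` that
    `encoding` cannot represent, stopping after `limit` hits.
    (Hypothetical helper, for illustration only.)"""
    hits: List[Tuple[int, str]] = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            hits.append((pos, ch))
            if len(hits) == limit:
                break
    return hits


print(unencodable_chars("più € and ő"))  # [(4, '€'), (10, 'ő')]
```

With `encoding="iso-8859-15"` the euro sign becomes representable and only the ő at position 10 would be reported.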

--
Greetings

  Pete
              <\
                \__     O                       __O
                | O\   _\\/\-%                _`\<,
                '()-'-(_)--(_)               (_)/(_)





