emacs-devel

Re: decode-coding-string gone awry?


From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Kenichi Handa <address@hidden> writes:

> In article <address@hidden>, David Kastrup <address@hidden> writes:
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters.  This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match 
>> "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
>>                       string)
>>       (setq output
>>          (concat output
>>                  (regexp-quote (substring string
>>                                           0
>>                                           (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string.  Thus, the above substring also returns a
> multibyte string.
>
>>                    (char-to-string
>>                     (string-to-number (match-string 1 string) 16))))
>
> But this char-to-string produces a unibyte string.  So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.
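
A small sketch of why that promotion hurts, for a GNU Emacs recent enough to
have `unibyte-string`.  The name `my-decode-demo` is made up for illustration,
and the explicit Latin-1 decoding only stands in for the automatic conversion
described above, whose exact result may depend on the Emacs version and
language environment.

```elisp
;; Illustration only; `my-decode-demo' is a hypothetical name.
(defun my-decode-demo ()
  "Return sizes for the bytes C3 B6 under two interpretations."
  (let* ((bytes (unibyte-string #xc3 #xb6)) ; UTF-8 encoding of o-umlaut
         ;; Assumption: explicit Latin-1 decoding stands in for the
         ;; automatic unibyte->multibyte conversion.
         (promoted (decode-coding-string bytes 'latin-1))
         ;; Decoding the untouched unibyte bytes as UTF-8.
         (decoded (decode-coding-string bytes 'utf-8)))
    (list (length promoted)     ; characters after Latin-1 promotion
          (length decoded)      ; characters after UTF-8 decoding
          ;; Bytes needed to write the promoted string as UTF-8.
          (length (encode-coding-string promoted 'utf-8)))))
```

Evaluating (my-decode-demo) should give (2 1 4): once the two bytes have been
promoted to two Latin-1 characters, the original two-byte sequence is no
longer there for a UTF-8 decoder to find.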

Oh.  Latin-1 chars.  Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as process-coding system)
appears to produce?

>>   (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case.  The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> This will also return an incorrect string.
>
> (preview-error-quote
>   (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> at the head of preview-error-quote.
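
A sketch of the same idea (work on bytes first, decode last) without
string-as-unibyte.  It is not the preview-latex code: the helper name
`my-caret-hex-to-text` is hypothetical, the regexp handles only the two-digit
hex form of the ^^ escapes, no regexp quoting is done, and it assumes a GNU
Emacs that provides `unibyte-string`.  It has not been tried on XEmacs.

```elisp
;; Hypothetical helper, not part of preview-latex.
(defun my-caret-hex-to-text (string coding-system)
  "Decode ^^xx byte escapes in STRING using CODING-SYSTEM.
Only lowercase two-digit hex escapes for bytes 80..ff are handled."
  (let ((case-fold-search nil)
        ;; Encode first, so that everything below is unibyte.
        (bytes (encode-coding-string string coding-system))
        (pos 0)
        (chunks nil))
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" bytes pos)
      ;; Literal bytes before the escape.
      (push (substring bytes pos (match-beginning 0)) chunks)
      ;; The escaped byte itself, kept as a one-byte unibyte string.
      (push (unibyte-string (string-to-number (match-string 1 bytes) 16))
            chunks)
      (setq pos (match-end 0)))
    ;; Remaining bytes after the last escape.
    (push (substring bytes pos) chunks)
    ;; All chunks are unibyte, so decoding sees the raw byte sequence.
    (decode-coding-string (apply #'concat (nreverse chunks))
                          coding-system)))
```

With this, (my-caret-hex-to-text (string-to-multibyte "erh^^c3^^b6ht") 'utf-8)
should return the same text whether the argument is unibyte or multibyte,
since both cases go through the same encoding step before any bytes are
spliced in.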

Sigh.  XEmacs-21.4-mule does not seem to have string-as-unibyte.  I'll
have to see whether it happens to work without it on XEmacs.  If not,
I'll have to come up with something else.

Thanks for the analysis!

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
