Re: decode-coding-string gone awry?
From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Kenichi Handa <address@hidden> writes:
> In article <address@hidden>, David Kastrup <address@hidden> writes:
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters. This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match
>>             "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
>>             string)
>>       (setq output
>>             (concat output
>>                     (regexp-quote (substring string
>>                                              0
>>                                              (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string. Thus, the above substring also returns a
> multibyte string.
>
>>                     (char-to-string
>>                      (string-to-number (match-string 1 string) 16))))
>
> But this char-to-string produces a unibyte string. So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.

Oh. Latin-1 chars. Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as process coding system)
appears to produce?

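One way to sidestep the question is to keep everything as raw bytes until a single final decode. The following is only a minimal sketch under stated assumptions: `my-caret-hex-decode` is a hypothetical helper, not part of preview-latex; it assumes an Emacs that provides `unibyte-string` (Emacs 23 or later, probably not XEmacs); and it handles only the two-hex-digit `^^xx` form, without the regexp quoting or the extra-caret handling of preview-error-quote.

```elisp
;; Hypothetical helper for illustration; not the preview-latex code.
;; Assumes `unibyte-string' exists (Emacs 23 or later).
(defun my-caret-hex-decode (string coding)
  "Replace ^^xx byte escapes in STRING by the bytes they name.
The result is decoded once, at the end, using CODING."
  (let ((case-fold-search nil)
        (start 0)
        pieces)
    (while (string-match "\\^\\^\\([8-9a-f][0-9a-f]\\)" string start)
      ;; Capture the match data before calling anything that might change it.
      (let ((beg (match-beginning 0))
            (end (match-end 0))
            (byte (string-to-number (match-string 1 string) 16)))
        ;; Text before the escape: encode it so it becomes unibyte.
        (push (encode-coding-string (substring string start beg) coding)
              pieces)
        ;; The escape itself: one raw byte in a unibyte string.
        (push (unibyte-string byte) pieces)
        (setq start end)))
    ;; Remaining text after the last escape.
    (push (encode-coding-string (substring string start) coding) pieces)
    ;; All pieces are unibyte, so concat does no implicit conversion.
    (decode-coding-string (apply #'concat (nreverse pieces)) coding)))

(my-caret-hex-decode "erh^^c3^^b6ht" 'utf-8)
```

Because every piece handed to concat is unibyte, no implicit unibyte-to-multibyte conversion should take place before the final decode-coding-string call.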
>> (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
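The difference between raw bytes and bytes promoted to Latin-1 characters can be inspected directly. This is a hedged sketch, not taken from the thread: the `my-` names are hypothetical, `unibyte-string` is assumed to be available (Emacs 23 or later), and decoding as latin-1 is used here only to imitate the promotion described above.

```elisp
;; Hypothetical names (my-...) for illustration only.
;; Assumes `unibyte-string' exists (Emacs 23 or later).
(defvar my-raw (unibyte-string #xc3 #xb6)
  "The two UTF-8 bytes of o-umlaut, held in a unibyte string.")

(defvar my-decoded (decode-coding-string my-raw 'utf-8)
  "Decoding the raw bytes as UTF-8 gives one character, U+00F6.")

(defvar my-promoted (decode-coding-string my-raw 'latin-1)
  "Imitates the promotion: each byte becomes one Latin-1 character.")

(list (multibyte-string-p my-raw)   ; nil: still raw bytes
      (length my-decoded)           ; 1: a single character
      (length my-promoted)          ; 2: two separate characters
      (aref my-promoted 0))         ; #xc3, now a character rather than a byte
```

Once the bytes have become two Latin-1 characters, the original UTF-8 byte pair is no longer what a later decode step operates on, which is the situation described in the analysis.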
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case. The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> This will also return an incorrect string.
>
> (preview-error-quote
> (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
> (setq string (string-as-unibyte string))
> at the head of preview-error-quote.
Sigh. XEmacs-21.4-mule does not seem to have string-as-unibyte. I'll
have to see whether it happens to work without it on XEmacs. If not,
I'll have to come up with something else.
Thanks for the analysis!
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum