Re: decode-coding-string gone awry?


From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 14:50:09 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>>     (while (string-match
>>             "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
>>             string)
>>       (setq output
>>           (concat output
>>                   (regexp-quote (substring string
>>                                            0
>>                                            (- (match-beginning 1) 2)))
>>                   (if (match-beginning 2)
>>                       (concat
>>                        "\\(?:" (regexp-quote
>>                                 (substring string
>>                                            (- (match-beginning 1) 2)
>>                                            (match-end 0)))
>>                        "\\|"
>>                        (char-to-string
>>                         (logxor (aref string (match-beginning 2)) 64))
>>                        "\\)")
>>                     (char-to-string
>>                      (string-to-number (match-string 1 string) 16))))
>>           string (substring string (match-end 0))))
>>     (setq output (concat output (regexp-quote string)))
>>     (if (featurep 'mule)
>>       (prog2
>>           (message "%S %S " output buffer-file-coding-system)
>>           (setq output (decode-coding-string
>>                         output buffer-file-coding-system))
>>         (message "%S\n" output))
>>       output)))
>
> The problem is that by passing `output' to decode-coding-string you
> clearly consider `output' to be a sequence of bytes.  But to
> construct `output' you use pieces of `string' so you have to make
> sure that `string' is also a sequence of bytes.  Assuming `string'
> comes from the TeX process, you can do that by making sure that that
> process's output coding system is `binary' (or `raw-text' if you
> want EOL-conversion).

I already mentioned that this _is_ exactly what we do: the problem is
that some TeX systems are set up to quote _some_ bytes of a utf-8
sequence in the ^^xx hexadecimal notation and let other bytes through
unchanged.  It is completely braindead.  The funny thing is that with
the _mixed_ representation, the hard case, this code worked, but with
the _complete_ ASCII transcription it doesn't.  I will have to
experiment a bit with string-as-multibyte and similar functions to
find out which combination works all of the time.
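
A minimal sketch of the "bytes first, decode last" idea discussed above,
not the code under discussion: it turns every ^^xx escape back into the
byte it stands for, leaves bytes that TeX passed through untouched, and
decodes only once at the end.  The names `my-tex-unquote-bytes' and
`my-tex-decode' are hypothetical, the input is assumed to have been read
from the process with a `binary' coding system, and `unibyte-string'
assumes an Emacs newer than the 22.0.50 mentioned in this thread.

```elisp
;; Hypothetical helpers for illustration; the names are not from any package.

(defun my-tex-unquote-bytes (string)
  "Return a unibyte string with every ^^xx escape in STRING unquoted.
Only escapes for bytes 0x80..0xff are handled.  STRING is assumed to
hold raw bytes, e.g. process output read as `binary'."
  (let ((case-fold-search nil)            ; TeX writes lowercase hex digits
        (string (string-to-unibyte string)) ; signals an error on real non-ASCII chars
        (start 0)
        (parts '()))
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" string start)
      ;; Bytes before the escape are copied unchanged.
      (push (substring string start (match-beginning 0)) parts)
      ;; The escape itself becomes the single byte it denotes.
      (push (unibyte-string (string-to-number (match-string 1 string) 16))
            parts)
      (setq start (match-end 0)))
    (push (substring string start) parts)
    ;; All pieces are unibyte, so the concatenation is unibyte as well.
    (apply #'concat (nreverse parts))))

(defun my-tex-decode (string coding-system)
  "Unquote ^^xx escapes in STRING, then decode the bytes as CODING-SYSTEM."
  (decode-coding-string (my-tex-unquote-bytes string) coding-system))
```

With this ordering the fully quoted form "B^^c3^^a4r" and the mixed form
(byte 0xc3 passed through, followed by "^^a4") go through the same path,
because both are plain byte sequences by the time
`decode-coding-string' sees them.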

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum



