emacs-devel

Re: converting between charsets


From: Alexander Kotelnikov
Subject: Re: converting between charsets
Date: Tue, 16 May 2006 00:30:44 +0400
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

>>>>> On Mon, 15 May 2006 10:11:48 -0400
>>>>> "SM" == Stefan Monnier <address@hidden> wrote:
SM> 
SM> Then show us how and when you call encode-coding-region.
SM> I.e. repeat the above but in elisp rather than English.
SM> Assume you're explaining it to a complete idiot.
SM> 
SM> Please take seriously the bit about the idiot.
SM> 
>>>> For example.
>>>> 1. (find-file "/tmp/test.txt")
SM> 
SM> How did you start Emacs?
SM> 
>> For example 'emacs -q'. Even if I do not suppress reading my ~/.emacs,
>> the result is the same.
SM> 
SM> Under X or under a tty?

X

>>>> 2. enter some text in Russian (after I toggled xkb layout)
>>>> 3. M-: (encode-coding-region (point-min) (point-max) 'koi8-r) and
>>>> Russian characters become '?'.
SM> What did you expect instead?
>> I expect that the Cyrillic characters will be encoded to their koi8-r values.
SM> 
SM> If you put the cursor on the Russian chars before calling
SM> encode-coding-region and hit C-u C-x = what does it say?

  character: Т (01212102, 332866, 0x51442)
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 66
     syntax: word
   category: y:Cyrillic  
buffer code: 0x9C 0xF4 0xA8 0xC2
  file code: 0xD0 0xA2 (encoded by coding system utf-8)
       font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1
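
For comparison, the koi8-r value expected for this character can be computed directly. The following is only a minimal sketch, and it assumes a more recent Emacs than the 21.4 used in this report: one that accepts `\u` escapes in strings and whose koi8-r coding system can encode the character from its internal representation. `my-encoded-bytes` is a hypothetical helper name, not an Emacs function.

```elisp
;; `my-encoded-bytes' is a hypothetical helper, not part of Emacs.
(defun my-encoded-bytes (text coding)
  "Return the byte values TEXT encodes to under CODING, as a list of integers."
  ;; `encode-coding-string' returns a unibyte string; `append' with a
  ;; trailing nil converts that string into a list of its byte values.
  (append (encode-coding-string text coding) nil))

;; "\u0422" is CYRILLIC CAPITAL LETTER TE, the character inspected above.
(my-encoded-bytes "\u0422" 'utf-8)   ; => (208 162), i.e. 0xD0 0xA2
(my-encoded-bytes "\u0422" 'koi8-r)  ; => (244), i.e. 0xF4
```

The utf-8 result matches the "file code" line reported above. The koi8-r result is the single byte that the encoding step was expected to produce instead of `?`.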

SM> If you put the cursor on the `?' that replaced that char and hit C-u C-x =
SM> what does it say?

  character: ? (077, 63, 0x3f)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 63
     syntax: punctuation
   category: a:ASCII   l:Latin  
buffer code: 0x3F
  file code: 0x3F (encoded by coding system utf-8)
       font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-adobe-standard

And for français, before encoding, I get:
  character: ç (04347, 2279, 0x8e7)
    charset: latin-iso8859-1
             (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
 code point: 103
     syntax: word
   category: l:Latin  
buffer code: 0x81 0xE7
  file code: 0xC3 0xA7 (encoded by coding system utf-8)
       font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso8859-1

and after encoding (the representation is \347):
  character: ç (0347, 231, 0xe7)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 231
     syntax: whitespace
   category:
buffer code: 0xE7
  file code: 0xE7 (encoded by coding system utf-8)
       font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-adobe-standard
-- 
Alexander Kotelnikov
Saint-Petersburg, Russia




