Re: TUTORIAL.bg and windows-1251
From: Kenichi Handa
Subject: Re: TUTORIAL.bg and windows-1251
Date: Tue, 25 Nov 2003 08:55:52 +0900 (JST)
Sorry for the late responses on this thread. I'm currently
involved in more threads than I can keep up with.
In article <address@hidden>, Ognyan Kulev <address@hidden> writes:
> Kenichi Handa wrote:
>> I think the default handling of cyrillic characters must be
>> most convenient for native users. But, there are many
>> languages that use cyrillic and their requests may conflict.
>> So I think we must start from adjusting each language
>> environment. Once we find that most language environments
>> require the same setting, we can make it the default.
> Can X encoding be adjusted? Aren't there only two choices for cyrillic:
> iso10646-1 and iso8859-5?
It seems that the bg_BG locale of glibc, gtk, or XFree86 (I
don't know which one is responsible) encodes cyrillic
characters in selections using an extended segment with the
charset name "microsoft-cp1251".
Please try the attached file. It overrides the ctext
encoder/decoder so that microsoft-cp1251 is used on decoding
in the Bulgarian language environment.
[...]
> The downside of the Debian packages is that each of the four encodings
> mentioned above has its own package. So people sometimes install only the
> microsoft-cp1251 and iso10646-1 fonts, without the koi8-r and iso8859-5 ones.
> Another problem with cronyx-courier is that it doesn't work when it's
> set in Default in the Basic Faces customize group. I've just posted a
> question to comp.emacs.
> What about the following: when mule-unicode-0100-24ff is used and the
> iso10646-1 font in use doesn't contain the wanted character (e.g. a
> cyrillic one), then another font that contains the character is searched
> for. I think this will often end up in cronyx-courier. Is this hard to
> implement?
I've implemented it in the emacs-unicode version. But that
change depends on various parts of the emacs-unicode
infrastructure, so it's very difficult to backport it to HEAD.
Anyway, the attached ctext.el also contains a short piece of
code that lets Emacs display windows-1251 characters with a
microsoft-cp1251 font. Please try calling
(use-microsoft-cp1251-font).
---
Ken'ichi HANDA
address@hidden
--- ctext.el ---
(defvar ctext-non-standard-encodings-database
  '(("big5-0" big5 2 (chinese-big5-1 chinese-big5-2)))
  "Alist of non-standard character set encodings for CTEXT's extended segments.

Each element has the form (ENCODING-NAME CODING-SYSTEM N-OCTET CHARSET)
and provides information about how to use \"extended segments\"
with the encoding name ENCODING-NAME.

CODING-SYSTEM is the coding-system to encode the characters into
an extended segment.

N-OCTET is the number of octets (bytes) that encodes a character
in the segment. It can be 0 (meaning the number of octets per
character is variable), 1, 2, 3, or 4.

CHARSET is a character set containing characters that are encoded
as ENCODING-NAME. It may be a list of character sets. It may
also be a char-table, in which case characters that have non-nil
value in the char-table are the target.

On decoding CTEXT, all encoding names listed here are recognized.

On encoding CTEXT, encoding names in the variable
`ctext-non-standard-encodings-list' and in
`ctext-non-standard-encodings' property of the current language
environment are used.")

(defun ctext-post-read-conversion (len)
  "Decode LEN characters encoded as Compound Text with Extended Segments."
  (save-match-data
    (save-restriction
      (let ((case-fold-search nil)
            (in-workbuf (string= (buffer-name) " *code-converting-work*"))
            last-coding-system-used
            pos bytes)
        (or in-workbuf
            (narrow-to-region (point) (+ (point) len)))
        (decode-coding-region (point-min) (point-max) 'ctext)
        (if in-workbuf
            (set-buffer-multibyte t))
        (while (re-search-forward ctext-non-standard-encodings-regexp
                                  nil 'move)
          (setq pos (match-beginning 0))
          (if (match-beginning 1)
              ;; ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
              (let* ((M (char-after (+ pos 4)))
                     (L (char-after (+ pos 5)))
                     (encoding (match-string 2))
                     (encoding-info (assoc-ignore-case
                                     encoding
                                     ctext-non-standard-encodings-database))
                     (coding (if encoding-info
                                 (nth 1 encoding-info)
                               (setq encoding (intern (downcase encoding)))
                               (and (coding-system-p encoding)
                                    encoding))))
                (setq bytes (- (+ (* (- M 128) 128) (- L 128))
                               (- (point) (+ pos 6))))
                (when coding
                  (delete-region pos (point))
                  (forward-char bytes)
                  (decode-coding-region (- (point) bytes) (point) coding)))
            ;; ESC % G --UTF-8-BYTES-- ESC % @
            (setq bytes (- (point) pos))
            (decode-coding-region (- (point) bytes) (point) 'utf-8))))
      (goto-char (point-min))
      (- (point-max) (point)))))

(defvar ctext-non-standard-encodings-list
  '("big5-0")
  "List of non-standard character set encoding names used in CTEXT.")

(defun ctext-non-standard-encodings-table ()
  (let ((table (make-char-table 'translation-table)))
    (dolist (encoding (reverse
                       (append
                        (get-language-info current-language-environment
                                           'ctext-non-standard-encodings)
                        ctext-non-standard-encodings-list)))
      (let* ((slot (assoc encoding ctext-non-standard-encodings-database))
             (charset (nth 3 slot)))
        (if charset
            (cond ((charsetp charset)
                   (aset table (make-char charset) slot))
                  ((listp charset)
                   (dolist (elt charset)
                     (aset table (make-char elt) slot)))
                  ((char-table-p charset)
                   (map-char-table #'(lambda (k v)
                                       (if (and v (> k 128)) (aset table k slot)))
                                   charset))))))
    table))

(defun ctext-pre-write-conversion (from to)
  "Encode characters between FROM and TO as Compound Text w/Extended Segments.

If FROM is a string, or if the current buffer is not the one set up for us
by encode-coding-string, generate a new temp buffer, insert the
text, and convert it in the temporary buffer. Otherwise, convert in-place."
  (save-match-data
    ;; Setup a working buffer if necessary.
    (cond ((stringp from)
           (let ((buf (current-buffer)))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte (multibyte-string-p from))
             (insert from)))
          ((not (string= (buffer-name) " *code-converting-work*"))
           (let ((buf (current-buffer))
                 (multibyte enable-multibyte-characters))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte multibyte)
             (insert-buffer-substring buf from to))))

    ;; Now we can encode the whole buffer.
    (let ((encoding-table (ctext-non-standard-encodings-table))
          last-coding-system-used
          last-pos last-encoding-info
          pos encoding-info end-pos)
      (goto-char (setq last-pos (point-min)))
      (setq end-pos (point-marker))
      (while (re-search-forward "[^\000-\177]+" nil t)
        (setq last-pos (match-beginning 0)
              last-encoding-info (aref encoding-table (char-after last-pos)))
        (set-marker end-pos (match-end 0))
        (goto-char (1+ last-pos))
        (catch 'tag
          (while t
            (setq encoding-info
                  (if (< (point) end-pos)
                      (aref encoding-table (following-char))))
            (unless (eq last-encoding-info encoding-info)
              (if last-encoding-info
                  (let ((encoding-name (car last-encoding-info))
                        (coding-system (nth 1 last-encoding-info))
                        (noctets (nth 2 last-encoding-info))
                        len)
                    (encode-coding-region last-pos (point) coding-system)
                    ;; LEN counts the encoding name, the STX (\002)
                    ;; separator, and the encoded bytes.
                    (setq len (+ (length encoding-name) 1
                                 (- (point) last-pos)))
                    (save-excursion
                      (goto-char last-pos)
                      ;; The STX separator after the name appears to have
                      ;; been lost from the archived text; it is restored
                      ;; here to match LEN and the decoder's comment.
                      (insert (string-to-multibyte
                               (format "\e%%/%d%c%c%s\002"
                                       noctets
                                       (+ (/ len 128) 128)
                                       (+ (% len 128) 128)
                                       encoding-name)))))
                (encode-coding-region last-pos (point) 'ctext-no-compositions))
              (setq last-pos (point)
                    last-encoding-info encoding-info))
            (if (< (point) end-pos)
                (forward-char 1)
              (throw 'tag nil))))
        (if (< last-pos (point))
            (encode-coding-region last-pos (point) 'ctext-no-compositions)))
      (set-marker end-pos nil)
      (goto-char (point-min))))
  ;; Must return nil, as build_annotations_2 expects that.
  nil)

;; The following forms override the current settings.
(set-language-info "Bulgarian" 'ctext-non-standard-encodings
                   '("microsoft-cp1251"))

(let ((elt `("microsoft-cp1251" windows-1251 1
             ,(get 'encode-windows-1251 'translation-table)))
      (slot (assoc "microsoft-cp1251" ctext-non-standard-encodings-database)))
  (if slot
      (setcdr slot (cdr elt))
    (push elt ctext-non-standard-encodings-database)))

(define-ccl-program ccl-encode-windows-1251-font
  '(0
    ((r1 <<= 7)
     (r1 += r2)
     (translate-character encode-windows-1251 r0 r1)
     )))

(let ((slot (assoc "microsoft-cp1251" font-ccl-encoder-alist)))
  (if slot
      (setcdr slot ccl-encode-windows-1251-font)
    (push '("microsoft-cp1251" . ccl-encode-windows-1251-font)
          font-ccl-encoder-alist)))

(defun use-microsoft-cp1251-font ()
  "Make fontset-default display windows-1251 characters with a microsoft-cp1251 font."
  (let ((fontspec '(nil . "microsoft-cp1251")))
    (map-char-table
     #'(lambda (k v)
         (if (and v (> k 128))
             (set-fontset-font "fontset-default" k fontspec)))
     (get 'encode-windows-1251 'translation-table))))
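
--- note on the extended segment length bytes ---
The encoder and decoder in ctext.el both rely on the two length bytes, M and L, that follow "ESC % / N" in an extended segment. Each byte carries 7 bits of the length with the high bit set, and the length covers the encoding name, the STX separator, and the encoded text. The following is a minimal sketch of that arithmetic only. The helper names `my-ctext-length-bytes` and `my-ctext-length-from-bytes` are hypothetical and are not part of ctext.el or Emacs.

```elisp
;; Hypothetical helpers for illustration; not part of ctext.el or Emacs.
(defun my-ctext-length-bytes (len)
  "Return (M L), the two length bytes for an extended segment of LEN bytes.
LEN counts the encoding name, the STX separator, and the encoded text."
  (list (+ (/ len 128) 128)
        (+ (% len 128) 128)))

(defun my-ctext-length-from-bytes (m l)
  "Return the segment length encoded by the length bytes M and L."
  (+ (* (- m 128) 128) (- l 128)))

;; "microsoft-cp1251" is 16 characters; add 1 for STX and, as an
;; example, 5 bytes of encoded text.
(my-ctext-length-bytes (+ 16 1 5))     ; => (128 150)
(my-ctext-length-from-bytes 128 150)   ; => 22
```

Because each byte holds only 7 bits of the length, this scheme can describe segments of at most 16383 bytes.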