bug-gnu-emacs

bug#23814: 24.5; bug of hz coding-system


From: ynyaaa
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Fri, 29 Jul 2016 10:05:14 +0900

handa <handa@gnu.org> writes:

> In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
>> But I found other bugs in the decoding of the "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as below.
>>     "~~" -> "~~~~" -> "~~"
>>     "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In reality, they are encoded properly but decoded the wrong way.
>>     (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>>     => "~"
>>     (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>>     => #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs.  Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa@gnu.org

If the string contains unencodable characters, the encodable characters
after them may be corrupted. In this example, the second ?\x4E00
character disappears.
    (set-language-environment 'Chinese-GB)
    (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
    => "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"

There are several possible ways to avoid this behavior:
(a) While decoding, replace "~{...~}" with "\e$A...\e(B"
    and decode with iso-2022-7bit.
(b) As in (a), replace "~{...~}" with "\e$A...\e(B" while decoding,
    then insert "\e$)A" at the beginning of the temporary buffer
    and decode with iso-2022-8bit-ss2.
    (8-bit data are then decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit,
    and translate each run of consecutive 8-bit bytes into 7-bit
    bytes prefixed by "~{" and suffixed by "~}".
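Option (c) can be illustrated outside Emacs with a minimal Python sketch (not the Emacs implementation). It assumes Python's built-in "gb2312" codec as a stand-in for Emacs's euc-cn, and the function name encode_hz is hypothetical. Replacing unencodable characters with "?" is also an assumption of this sketch. The point it shows is that EUC-CN encodes each character independently, so a bad character cannot corrupt the characters after it.

```python
def encode_hz(text: str) -> bytes:
    """Hypothetical HZ encoder following option (c).  Minimal sketch."""
    # Step 1: escape "~" as "~~"; all other ASCII passes through unchanged.
    # Step 2: encode with EUC-CN (Python's "gb2312" codec).  Unencodable
    #         characters become "?" (an assumption of this sketch).
    euc = text.replace("~", "~~").encode("gb2312", errors="replace")
    out = bytearray()
    i = 0
    while i < len(euc):
        if euc[i] < 0x80:                 # ASCII byte: copy as-is
            out.append(euc[i])
            i += 1
            continue
        # Step 3: wrap each run of 8-bit bytes in "~{" ... "~}",
        #         clearing the high bit of every byte in the run.
        out += b"~{"
        while i < len(euc) and euc[i] >= 0x80:
            out.append(euc[i] & 0x7F)
            i += 1
        out += b"~}"
    return bytes(out)
```

For example, encode_hz("\u4e00\uac00\u4e00") yields b"~{R;~}?~{R;~}": the replacement "?" is plain ASCII, so it simply closes the first "~{...~}" run, and the second \x4E00 is emitted intact in its own run.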


By the way, RFC1843 describes:
    The escape sequence '~\n' is a line-continuation marker to be
    consumed with no output produced.

So this form should return "AB", but it does not:
    (decode-coding-string "A~\nB" 'hz)
    => "A\nB"
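The decoding rules discussed in this thread ("~~" -> "~", "~\n" -> no output, "~{...~}" -> GB2312) can be sketched as a small decoder. This is a hypothetical Python function decode_hz, not the Emacs code: it restores the high bit of each 7-bit byte pair and uses Python's "gb2312" codec, and it keeps any other "~x" sequence literally, which is an assumption of this sketch.

```python
def decode_hz(data: bytes) -> str:
    """Hypothetical HZ (RFC 1843) decoder.  Minimal sketch."""
    out = []
    i, n = 0, len(data)
    while i < n:
        if data[i] != 0x7E:                  # ASCII mode: copy the byte as-is
            out.append(chr(data[i]))
            i += 1
            continue
        nxt = data[i + 1:i + 2]              # byte after "~" (empty at end of data)
        if nxt == b"~":                      # "~~" -> "~"
            out.append("~")
            i += 2
        elif nxt == b"\n":                   # "~\n" -> line continuation, no output
            i += 2
        elif nxt == b"{":                    # "~{" -> GB2312 mode until "~}"
            i += 2
            gb = bytearray()
            # Consume 2-byte codes, restoring the high bit (7-bit -> EUC-CN).
            while i + 1 < n and data[i:i + 2] != b"~}":
                gb.append(data[i] | 0x80)
                gb.append(data[i + 1] | 0x80)
                i += 2
            if data[i:i + 2] == b"~}":
                i += 2                       # leave GB2312 mode
            else:
                i = n                        # unterminated: drop any odd trailing byte
            out.append(bytes(gb).decode("gb2312", errors="replace"))
        else:                                # any other "~x": keep "~" (assumption)
            out.append("~")
            i += 1
    return "".join(out)
```

With this sketch, decode_hz(b"A~\nB") gives "AB", decode_hz(b"~~~~") gives "~~", and decode_hz(b"~~{!!~~}") gives "~{!!~}", matching the expected results above. Only a real "~{" switches modes, so an escaped "~~{" is never mistaken for the start of GB2312 text.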

> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
>       (goto-char (point-min))
>       (while (search-forward "~" nil t)
>         (setq ch (following-char))
> -       (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> +          (if (= ch ?{)
> +              (search-forward "~}" nil 'move)
> +            (when (or (= ch ?\n) (= ch ?~))
> +              (delete-char -1)
> +              (put-text-property (point) (1+ (point)) 'hz-decoded t)
> +              (forward-char 1))))
>  
>       ;; "^zW...\n" -> Chinese GB2312
>       ;; "~{...~}"  -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
>       (while (re-search-forward hz/zw-start-gb nil t)
>         (setq pos (match-beginning 0)
>               ch (char-after pos))
> +          (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> +              (forward-char 1)
>         ;; Record the first position to start conversion.
>         (or beg (setq beg pos))
>         (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
>                                 t)
>                 (delete-char -2))
>             (setq end (point))
> -           (translate-region pos (point) hz-set-msb-table))))
> +           (translate-region pos (point) hz-set-msb-table)))))
>       (if beg
>           (decode-coding-region beg end 'euc-china)))
> +      (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
>        (- (point-max) (point-min)))))
>  
>  ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
>      (save-restriction
>        (narrow-to-region beg end)
>  
> +      (put-text-property beg end 'charset 'chinese-gb2312)
>        ;; "~" -> "~~"
>        (goto-char (point-min))
>        (while (search-forward "~" nil t)      (insert ?~))