bug-gnu-emacs
bug#24425: [PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings


From: Michal Nazarewicz
Subject: bug#24425: [PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Date: Tue, 13 Sep 2016 00:46:07 +0200

Currently, when operating on unibyte strings and buffers, if casing an
ASCII character results in a non-ASCII Unicode character, the result is
forcefully converted to 8-bit by masking off all but the eight least
significant bits.  This has awkward results such as:

        (let ((table (make-char-table 'case-table)))
          (set-char-table-parent table (current-case-table))
          (set-case-syntax-pair ?I ?ı table)
          (set-case-syntax-pair ?İ ?i table)
          (with-case-table table
            (concat (upcase "istanabul") " " (downcase "IRMA"))))
        => "0STANABUL 1rma"
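
The digits in that result fall straight out of the masking.  A minimal
sketch in C, assuming the old conversion reduces to ‘c & 0xFF’ for these
characters; mask_to_byte is a hypothetical stand-in for illustration, not
an Emacs function:

```c
#include <assert.h>

/* Hypothetical stand-in for the old behaviour described above:
   keep only the eight least significant bits of a character code.  */
static int
mask_to_byte (int c)
{
  assert (c >= 0);   /* character codes are non-negative */
  return c & 0xFF;
}
```

With this, mask_to_byte (0x130) (İ, U+0130) gives 0x30, the digit ‘0’,
and mask_to_byte (0x131) (ı, U+0131) gives 0x31, the digit ‘1’.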

Change the code so that ASCII characters whose cased form is a non-ASCII
Unicode character are left unchanged when operating on unibyte data.  In
other words, the aforementioned example will produce:

        => "iSTANABUL Irma"

Arguably this isn’t correct either, but it’s less wrong and there’s not
much we can do when the strings are unibyte.
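
The rule for unibyte data can be sketched as a standalone C function.
This is an illustration only: the helper names are hypothetical, and the
two constants are assumed to match the raw-byte character range in
Emacs’s character.h.

```c
#include <stdbool.h>

/* Assumed to mirror Emacs's character.h; treat both as assumptions.  */
enum { SKETCH_MAX_5_BYTE_CHAR = 0x3FFF7F, SKETCH_BYTE8_OFFSET = 0x3FFF00 };

/* Hypothetical counterparts of ASCII_CHAR_P and CHAR_BYTE8_P.  */
static bool ascii_char_p (int c) { return 0 <= c && c < 0x80; }
static bool char_byte8_p (int c) { return c > SKETCH_MAX_5_BYTE_CHAR; }

/* Return the byte to keep in unibyte data after casing ORIG to CASED.
   If CASED has no single-byte form, ORIG is left unchanged.  */
static int
unibyte_case_result (int orig, int cased)
{
  if (ascii_char_p (cased))
    return cased;                        /* fits in a byte as is */
  if (char_byte8_p (cased))
    return cased - SKETCH_BYTE8_OFFSET;  /* raw 8-bit character */
  return orig;                           /* Unicode result: don't store */
}
```

For instance, unibyte_case_result ('i', 0x130) returns 'i', which is why
the leading ‘i’ survives in the example.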

Note that casify_object had a ‘(c >= 0 && c < 256)’ condition, but since
CHAR_TO_BYTE8 (and thus MAKE_CHAR_UNIBYTE) happily casts Unicode
characters to 8-bit (i.e. c & 0xFF), the check never rejected anything in
the case discussed.
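
To see why the old guard was ineffective, here is a hedged model of the
old unibyte path in C.  old_guard_allows_store is a hypothetical name,
and the masking step is taken from the description above rather than
copied from the Emacs sources.

```c
#include <stdbool.h>

/* Hypothetical model of the old order of operations:
   mask first, then test the range.  */
static bool
old_guard_allows_store (int c)
{
  c &= 0xFF;                 /* what the conversion did to Unicode chars */
  return c >= 0 && c < 256;  /* holds for every masked value */
}
```

Because the range test ran after the masking, it accepted every
non-negative character code, including U+0130 and U+0131.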

* src/casefiddle.c (casify_object, casify_region): When dealing with
unibyte data, don’t attempt to store Unicode characters in the result.
---
 src/casefiddle.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

 Unless there are objections, I’ll commit it in a few days.

diff --git a/src/casefiddle.c b/src/casefiddle.c
index 2d32f49..247cc6f 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -71,8 +71,8 @@ casify_object (enum case_action flag, Lisp_Object obj)
        {
          if (! inword)
            c = upcase1 (c1);
-         if (! multibyte)
-           MAKE_CHAR_UNIBYTE (c);
+         if (! multibyte && CHAR_BYTE8_P (c))
+           c = CHAR_TO_BYTE8 (c);
          XSETFASTINT (obj, c | flags);
        }
       return obj;
@@ -93,18 +93,19 @@ casify_object (enum case_action flag, Lisp_Object obj)
          c1 = c;
          if (inword && flag != CASE_CAPITALIZE_UP)
            c = downcase (c);
-         else if (!uppercasep (c)
-                  && (!inword || flag != CASE_CAPITALIZE_UP))
-           c = upcase1 (c1);
+         else if (!inword || flag != CASE_CAPITALIZE_UP)
+           c = upcase (c1);
          if ((int) flag >= (int) CASE_CAPITALIZE)
            inword = (SYNTAX (c) == Sword);
          if (c != c1)
            {
-             MAKE_CHAR_UNIBYTE (c);
-             /* If the char can't be converted to a valid byte, just don't
-                change it.  */
-             if (c >= 0 && c < 256)
-               SSET (obj, i, c);
+             if (CHAR_BYTE8_P (c))
+               c = CHAR_TO_BYTE8 (c);
+             else if (!ASCII_CHAR_P (c))
+               /* If the char can't be converted to a valid byte, just don't
+                  change it.  */
+               continue;
+             SSET (obj, i, c);
            }
        }
       return obj;
@@ -250,8 +251,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
 
          if (! multibyte)
            {
-             MAKE_CHAR_UNIBYTE (c);
-             FETCH_BYTE (start_byte) = c;
+             /* If the char can't be converted to a valid byte, just don't
+                change it.  */
+             if (ASCII_CHAR_P (c) ||
+                 (CHAR_BYTE8_P (c) && ((c = CHAR_TO_BYTE8 (c)), true)))
+               FETCH_BYTE (start_byte) = c;
            }
          else if (ASCII_CHAR_P (c2) && ASCII_CHAR_P (c))
            FETCH_BYTE (start_byte) = c;
-- 
2.8.0.rc3.226.g39d4020





