bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24784: 26.0.50; JSON strings with utf-16 escape codes


From: Helmut Eller
Subject: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Wed, 26 Oct 2016 18:39:57 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

On Tue, Oct 25 2016, Dmitry Gutov wrote:

> On 24.10.2016 22:57, Philipp Stephani wrote:
>
>> +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.

I guess it doesn't hurt but I also doubt that it makes a measurable
difference as utf-16 surrogates are rarely needed.

>
>> +     ;; Special-case UTF-16 surrogate pairs,
>> +     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
>> +     ((looking-at
>> +       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
>> +           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
>> +      (json-advance 10)
>> +      (json--decode-utf-16-surrogates
>> +       (string-to-number (match-string 1) 16)
>> +       (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?

There's also an opportunity to detect unpaired surrogates, e.g.:

(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f]\\{4\\}")
      (let* ((code (string-to-number (match-string 0) 16)))
        (json-advance 4)
        (cond ((<= #xD800 code #xDBFF) ; UTF-16 high surrogate
               (cond ((looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                      (let ((low (string-to-number (match-string 1) 16)))
                        (json-advance 6)
                        (json--decode-utf-16-surrogates code low)))
                     (t
                      ;; Expected low surrogate missing
                      (signal 'json-string-escape (list (point))))))
              ((<= #xDC00 code #xDFFF)
               ;; Unexpected low surrogate
               (signal 'json-string-escape (list (point))))
              (t
               code))))
     (t
      (signal 'json-string-escape (list (point)))))))





reply via email to

[Prev in Thread] Current Thread [Next in Thread]