bug#24784: 26.0.50; JSON strings with utf-16 escape codes
From: Helmut Eller
Subject: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Wed, 26 Oct 2016 18:39:57 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
On Tue, Oct 25 2016, Dmitry Gutov wrote:
> On 24.10.2016 22:57, Philipp Stephani wrote:
>
>> +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.
I guess it doesn't hurt, but I doubt that it makes a measurable
difference, since UTF-16 surrogates are rarely needed.
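For anyone who wants to measure it, a rough sketch of such a benchmark
might look like the following. All names here (my-decode-fn,
my-decode-subst, my-hi, my-lo) are made up for illustration and are not
from the patch. The arithmetic is the standard UTF-16 surrogate formula,
which I assume is close to what the patch does. The arguments come from
special variables so the byte compiler cannot fold the call into a
constant.

```elisp
;; Hypothetical names throughout; none of these come from the patch.
(require 'benchmark)

(defvar my-hi #xD83D "Sample high surrogate (illustrative value).")
(defvar my-lo #xDE00 "Sample low surrogate (illustrative value).")

(defun my-decode-fn (high low)
  "Combine surrogates HIGH and LOW into a code point (ordinary function)."
  (+ (ash (- high #xD800) 10) (- low #xDC00) #x10000))

(defsubst my-decode-subst (high low)
  "Same computation as `my-decode-fn', defined as an inline function."
  (+ (ash (- high #xD800) 10) (- low #xDC00) #x10000))

;; Compile the ordinary function so that compiled code is compared
;; against compiled code.
(byte-compile 'my-decode-fn)

;; Each result is a list (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(defvar my-fn-timing
  (benchmark-run-compiled 1000000 (my-decode-fn my-hi my-lo)))
(defvar my-subst-timing
  (benchmark-run-compiled 1000000 (my-decode-subst my-hi my-lo)))
```

Comparing the first elements of my-fn-timing and my-subst-timing gives
the call overhead in isolation. It says nothing about how often real
JSON input reaches this code path.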
>
>> + ;; Special-case UTF-16 surrogate pairs,
>> + ;; cf. https://tools.ietf.org/html/rfc7159#section-7
>> + ((looking-at
>> + (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
>> + "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
>> + (json-advance 10)
>> + (json--decode-utf-16-surrogates
>> + (string-to-number (match-string 1) 16)
>> + (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?
There's also an opportunity to detect unpaired surrogates, e.g.:
(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f]\\{4\\}")
      (let* ((code (string-to-number (match-string 0) 16)))
        (json-advance 4)
        (cond ((<= #xD800 code #xDBFF)  ; UTF-16 high surrogate
               (cond ((looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                      (let ((low (string-to-number (match-string 1) 16)))
                        (json-advance 6)
                        (json--decode-utf-16-surrogates code low)))
                     (t
                      ;; Expected low surrogate missing
                      (signal 'json-string-escape (list (point))))))
              ((<= #xDC00 code #xDFFF)
               ;; Unexpected low surrogate
               (signal 'json-string-escape (list (point))))
              (t
               code))))
     (t
      (signal 'json-string-escape (list (point)))))))
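
For reference, here is the arithmetic for combining a valid surrogate
pair, as a minimal sketch. The name my-decode-utf-16-surrogates is
hypothetical. The body of the patch's json--decode-utf-16-surrogates is
not quoted in this message and may be written differently. The sketch
assumes the caller has already checked the ranges, as the function above
does.

```elisp
;; Hypothetical helper, not the patch's implementation.
(defun my-decode-utf-16-surrogates (high low)
  "Return the code point encoded by the surrogate pair HIGH and LOW.
HIGH must be in #xD800..#xDBFF and LOW in #xDC00..#xDFFF."
  ;; The high surrogate carries the upper 10 bits and the low surrogate
  ;; the lower 10 bits of (CODE-POINT - #x10000).
  (+ (ash (- high #xD800) 10)
     (- low #xDC00)
     #x10000))

;; The JSON escape "\ud83d\ude00" encodes U+1F600.
(my-decode-utf-16-surrogates #xD83D #xDE00) ; => #x1F600
```

The smallest pair, #xD800 #xDC00, maps to #x10000, and the largest,
#xDBFF #xDFFF, maps to #x10FFFF, so a checked pair always lands in the
supplementary planes.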