From: Philipp Stephani
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 10:35:42 +0000

Eli Zaretskii <address@hidden> wrote on Tue, 26 Dec 2017 at 05:46:

> From: Philipp Stephani <address@hidden>
> Date: Mon, 25 Dec 2017 22:01:15 +0100
> Cc: Philipp Stephani <address@hidden>
>
> +/* Return the Unicode code point for the given UTF-16 surrogates. */
> +
> +INLINE int
> +surrogates_to_codepoint (int low, int high)
> +{
> + eassert (char_low_surrogate_p (low));
> + eassert (char_high_surrogate_p (high));
> + return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
> +}
> +
> /* Data type for Unicode general category.

> Suggest to move surrogates_to_codepoint to coding.c, and then use the
> macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
> there.

Hmm, I'd rather go the other way round and remove these macros later. They are macros, thus worse than functions, and don't seem to be correct either (what about a value such as 0x11DC00?).
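
To illustrate the 0x11DC00 concern, here is a minimal sketch. It assumes the coding.c macros are mask-based tests that only look at bits 10 to 15 of the value; MASK_LOW_SURROGATE_P below is a hypothetical stand-in written under that assumption, and the real definitions may differ. range_low_surrogate_p is likewise a hypothetical name, not an existing Emacs function.

```c
#include <stdbool.h>

/* ASSUMPTION: stand-in for a mask-based surrogate test.  It inspects
   only bits 10-15, so any larger value with the same bit pattern in
   its low 16 bits also matches.  */
#define MASK_LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Hypothetical range-based test: true only for U+DC00..U+DFFF.  */
static inline bool
range_low_surrogate_p (int c)
{
  return 0xDC00 <= c && c <= 0xDFFF;
}
```

Under that assumption, MASK_LOW_SURROGATE_P (0x11DC00) is true even though 0x11DC00 is not a surrogate (it is not even a valid Unicode code point), while the range-based function rejects it.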

> Also, a single-liner sounds like too little to justify a
> function, so maybe make all of that macros in coding.h, and include
> the latter in nsterm.m.

No new macros please if we can avoid it. Functions are strictly better.

I don't care much whether they are in character.h or coding.h, but char_surrogate_p is already in character.h.
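
As a sanity check of the arithmetic in the quoted surrogates_to_codepoint, here is a standalone sketch. It uses plain assert in place of eassert, and the two range helpers are hypothetical stand-ins for char_low_surrogate_p and char_high_surrogate_p, whose actual definitions are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the character.h predicates.  */
static inline bool
high_surrogate_p (int c)
{
  return 0xD800 <= c && c <= 0xDBFF;
}

static inline bool
low_surrogate_p (int c)
{
  return 0xDC00 <= c && c <= 0xDFFF;
}

/* Same formula as in the quoted patch: the high surrogate carries the
   upper 10 bits, the low surrogate the lower 10 bits, and the result
   is offset by 0x10000.  Note the argument order (low, high).  */
static inline int
surrogates_to_codepoint (int low, int high)
{
  assert (low_surrogate_p (low));
  assert (high_surrogate_p (high));
  return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
}
```

For the pair D83D DE00 this gives 0x10000 + 0x200 + 0x3D * 0x400 = 0x1F600; the extremes D800 DC00 and DBFF DFFF map to 0x10000 and 0x10FFFF respectively.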

> + USE_SAFE_ALLOCA;
> + unichar *utf16_buffer;
> + SAFE_NALLOCA (utf16_buffer, 1, len);

> Maximum length of a UTF-16 sequence is known in advance, so why do you
> need SAFE_NALLOCA here? Couldn't you use a buffer of fixed length
> instead?

The text being inserted can be arbitrarily long. Even single characters (i.e. extended grapheme clusters) can be arbitrarily long.
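
A rough sketch of why a fixed-size buffer does not suffice: a single code point needs at most two UTF-16 code units, but the inserted text is a sequence of code points, so the buffer size depends on the input. The code below is an illustration only; utf16_length and encode_utf16 are hypothetical names, and it uses malloc rather than the SAFE_NALLOCA call from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of UTF-16 code units needed for N code points: two for
   anything outside the BMP, one otherwise.  */
static size_t
utf16_length (const int *cps, size_t n)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    len += cps[i] > 0xFFFF ? 2 : 1;
  return len;
}

/* Encode N code points into a freshly allocated UTF-16 buffer and
   store its length in *OUT_LEN.  Returns NULL if allocation fails;
   the caller frees the result.  Input is assumed to be valid.  */
static uint16_t *
encode_utf16 (const int *cps, size_t n, size_t *out_len)
{
  size_t len = utf16_length (cps, n);
  uint16_t *buf = malloc ((len ? len : 1) * sizeof *buf);
  if (!buf)
    return NULL;
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      int c = cps[i];
      if (c > 0xFFFF)
        {
          c -= 0x10000;
          buf[j++] = (uint16_t) (0xD800 + (c >> 10));   /* high */
          buf[j++] = (uint16_t) (0xDC00 + (c & 0x3FF)); /* low */
        }
      else
        buf[j++] = (uint16_t) c;
    }
  assert (j == len);
  *out_len = len;
  return buf;
}
```

As an example, the family emoji sequence U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 is one extended grapheme cluster but takes 11 UTF-16 code units (four surrogate pairs plus three zero-width joiners), and longer clusters are possible.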