Re: [PATCH] Allow inserting non-BMP characters

emacs-devel

From:	Eli Zaretskii
Subject:	Re: [PATCH] Allow inserting non-BMP characters
Date:	Tue, 26 Dec 2017 18:11:18 +0200

> From: Philipp Stephani <address@hidden>
> Date: Tue, 26 Dec 2017 10:35:42 +0000
> Cc: address@hidden, address@hidden
> 
>  I suggest moving surrogates_to_codepoint to coding.c, and then using
>  the macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
>  there.
> 
> Hmm, I'd rather go the other way round and remove these macros later.
> They are macros, thus worse than functions,

I don't think we have a policy to prefer inline functions to macros,
and I don't think we should have such a policy.  We use inline
functions when that's necessary, but we don't in general prefer them.
They have their own problems, see the comments in lisp.h for some of
that.

> and don't seem to be correct either (what about a value such as 0x11DC00?).

??? They are correct for UTF-16 sequences, which are 16-bit numbers.
If you need to augment them by testing the high-order bits to be zero
in your case, that's okay, but I don't see any need for introducing
similar but different functionality.
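
To make the 0x11DC00 case concrete, here is a minimal sketch.  It
assumes the surrogate tests mask with 0xFC00 and so look only at the
low 16 bits, which is what lets 0x11DC00 pass.  Every name below is
hypothetical and chosen for illustration; none is an actual Emacs
identifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked tests (assumed form): valid only when VAL is already a
   16-bit UTF-16 code unit, because higher bits are ignored.  */
#define HIGH_SURROGATE_P(val) (((val) & 0xFC00) == 0xD800)
#define LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Hypothetical wrappers for callers that hold a wider integer:
   require the bits above bit 15 to be zero, then reuse the macro.  */
static bool
wide_high_surrogate_p (uint32_t val)
{
  return (val >> 16) == 0 && HIGH_SURROGATE_P (val);
}

static bool
wide_low_surrogate_p (uint32_t val)
{
  return (val >> 16) == 0 && LOW_SURROGATE_P (val);
}

/* Combine an already validated surrogate pair into a code point.  */
static uint32_t
pair_to_codepoint (uint16_t high, uint16_t low)
{
  return 0x10000 + ((uint32_t) (high - 0xD800) << 10) + (low - 0xDC00);
}
```

With these definitions the bare macro accepts 0x11DC00 as a low
surrogate, while the wrapper rejects it and still accepts 0xDC00.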

> No new macros please if we can avoid it. Functions are strictly better.

Sorry, I disagree.  Each has its advantages, and on balance I find
macros to be slightly better, certainly not worse.  There's no need to
avoid them in C.

> I don't care much whether they are in character.h or coding.h, but 
> char_surrogate_p is already in character.h.

char_surrogate_p should have used the coding.h macros as well.

>  > +  USE_SAFE_ALLOCA;
>  > +  unichar *utf16_buffer;
>  > +  SAFE_NALLOCA (utf16_buffer, 1, len);
> 
>  Maximum length of a UTF-16 sequence is known in advance, so why do you
>  need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
>  instead?
> 
> The text being inserted can be arbitrarily long. Even single characters
> (i.e. extended grapheme clusters) can be arbitrarily long.

Yes, but why do you first copy the input into a separate buffer?  Why
not convert each UTF-16 sequence separately, as you go through the
loop?
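
A rough sketch of converting as you go, under stated assumptions: the
input is available as an array of 16-bit code units, and unpaired
surrogates are simply passed through unchanged.  The function name
and signature are hypothetical, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from UNITS starting at *POS and advance *POS
   past it.  The caller must ensure *POS < LEN.  An unpaired surrogate
   is returned as-is (an assumption; a real caller may prefer to
   signal an error instead).  */
static uint32_t
next_codepoint (const uint16_t *units, size_t len, size_t *pos)
{
  uint32_t c = units[(*pos)++];
  if ((c & 0xFC00) == 0xD800          /* high surrogate ...  */
      && *pos < len                   /* ... with a unit after it ...  */
      && (units[*pos] & 0xFC00) == 0xDC00) /* ... that is a low one.  */
    c = 0x10000 + ((c - 0xD800) << 10) + (units[(*pos)++] - 0xDC00);
  return c;
}

/* Typical use, with a hypothetical consumer insert_one:
     for (size_t pos = 0; pos < len; )
       insert_one (next_codepoint (units, len, &pos));  */
```

Each iteration needs only the current code unit and at most the one
following it, so no scratch buffer proportional to the input length
is required.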
