emacs-devel

Re: [PATCH] Allow inserting non-BMP characters


From: Eli Zaretskii
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 18:11:18 +0200

> From: Philipp Stephani <address@hidden>
> Date: Tue, 26 Dec 2017 10:35:42 +0000
> Cc: address@hidden, address@hidden
> 
>  Suggest to move surrogates_to_codepoint to coding.c, and then use the
>  macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
>  there.
> 
> Hmm, I'd rather go the other way round and remove these macros later.
> They are macros, thus worse than functions,

I don't think we have a policy to prefer inline functions to macros,
and I don't think we should have such a policy.  We use inline
functions when that's necessary, but we don't in general prefer them.
They have their own problems, see the comments in lisp.h for some of
that.

> and don't seem to be correct either (what about a value such as 0x11DC00?).

??? They are correct for UTF-16 sequences, which are 16-bit numbers.
If you need to augment them by testing the high-order bits to be zero
in your case, that's okay, but I don't see any need for introducing
similar but different functionality.
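
To make the point about high-order bits concrete, here is a minimal sketch of how the 16-bit tests could be reused for wider values. It assumes the coding.c macros mask with 0xFC00; the SKETCH_/sketch_ names are hypothetical stand-ins, not the actual Emacs definitions, and sketch_surrogates_to_codepoint only illustrates the standard UTF-16 pairing arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of the 16-bit tests: keep the top 6 bits of the low
   16 bits and compare.  Bits above bit 15 are ignored by the mask.  */
#define SKETCH_UTF_16_HIGH_SURROGATE_P(val) (((val) & 0xFC00) == 0xD800)
#define SKETCH_UTF_16_LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Hypothetical wrappers for values wider than 16 bits: first require
   the high-order bits to be zero, then reuse the 16-bit test.  */
static bool
sketch_wide_high_surrogate_p (uint32_t c)
{
  return (c >> 16) == 0 && SKETCH_UTF_16_HIGH_SURROGATE_P (c);
}

static bool
sketch_wide_low_surrogate_p (uint32_t c)
{
  return (c >> 16) == 0 && SKETCH_UTF_16_LOW_SURROGATE_P (c);
}

/* Combine a valid surrogate pair into one code point at or above
   U+10000 (standard UTF-16 arithmetic).  */
static uint32_t
sketch_surrogates_to_codepoint (uint16_t high, uint16_t low)
{
  assert (SKETCH_UTF_16_HIGH_SURROGATE_P (high));
  assert (SKETCH_UTF_16_LOW_SURROGATE_P (low));
  return 0x10000 + ((uint32_t) (high - 0xD800) << 10) + (low - 0xDC00);
}
```

Under these assumptions, a value such as 0x11DC00 passes the bare 16-bit test because the mask discards its upper bits, while the wrapper rejects it.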

> No new macros please if we can avoid it. Functions are strictly better.

Sorry, I disagree.  Each has its advantages, and on balance I find
macros to be slightly better, certainly not worse.  There's no need to
avoid them in C.

> I don't care much whether they are in character.h or coding.h, but 
> char_surrogate_p is already in character.h.

char_surrogate_p should have used the coding.h macros as well.

>  > +  USE_SAFE_ALLOCA;
>  > +  unichar *utf16_buffer;
>  > +  SAFE_NALLOCA (utf16_buffer, 1, len);
> 
>  Maximum length of a UTF-16 sequence is known in advance, so why do you
>  need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
>  instead?
> 
> The text being inserted can be arbitrarily long. Even single characters
> (i.e. extended grapheme clusters) can be arbitrarily long.

Yes, but why do you first copy the input into a separate buffer?  Why
not convert each UTF-16 sequence separately, as you go through the
loop?
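
As a rough illustration of converting each sequence inside the loop, the sketch below decodes one code point at a time straight from the UTF-16 input, so no second buffer sized to the whole text is needed. All names are hypothetical, and passing unpaired surrogates through unchanged is an assumption of this sketch, not something the patch specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at units[*pos] and advance *pos past
   it.  A high surrogate followed by a low surrogate yields one code
   point at or above U+10000; anything else, including an unpaired
   surrogate, is returned as-is (an assumption of this sketch).
   The caller must ensure *pos < len.  */
static uint32_t
sketch_next_codepoint (const uint16_t *units, size_t len, size_t *pos)
{
  uint32_t c = units[(*pos)++];
  if ((c & 0xFC00) == 0xD800 && *pos < len
      && (units[*pos] & 0xFC00) == 0xDC00)
    {
      uint32_t low = units[(*pos)++];
      c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
    }
  return c;
}

/* Walk the whole input one sequence at a time.  This sketch only
   counts the code points; a real caller would process each one at
   the marked spot instead of copying the input first.  */
static size_t
sketch_count_codepoints (const uint16_t *units, size_t len)
{
  size_t count = 0;
  size_t pos = 0;
  while (pos < len)
    {
      uint32_t c = sketch_next_codepoint (units, len, &pos);
      (void) c;			/* Per-character work would go here.  */
      count++;
    }
  return count;
}
```

The only per-iteration state is the current position and one decoded value, which is why a fixed-size temporary suffices however long the inserted text is.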


