Re: Character literals for Unicode (control) characters

From:

Philipp Stephani

Subject:

Date:

Sun, 06 Mar 2016 18:28:09 +0000

Paul Eggert <address@hidden> schrieb am So., 6. März 2016 um 19:08 Uhr:

Thanks for taking this on. Some comments:

Why the hash table? Existing Lisp code dealing with Unicode names uses an alist,
and it seems to do OK.

Hash tables are as easy to use as alists, but have average O(1) lookup time, as opposed to O(n) time for alists. Also alists are more prone to cache invalidation because they are less contiguous.

If a hash table is needed, a hash table should also be
used by the existing code elsewhere that does something similar. See the
function ucs-names and its callers.

Initially I used ucs-names, but the decided against it because it lacks most characters. That's OK for a tables used for completion, but for inputting all characters should be present. So the use cases are different.

If a hash table is needed, I suggest using a perfect hashing function (generated
by gperf) and checking its results with get-char-code-property. That avoids the
runtime overhead of initialization.

Sounds good, but that would require much more effort and would delay this project unnecessarily. It can be done later once the basic functionality is in place.

It needs documentation, both in the Emacs Lisp manual and in NEWS.

Yes, I've attached a patch.

> +void init_character_names ()
> +{

The usual style is:

void
init_character_names (void)
{

No need for "const" for local variables (cost exceeds benefit).

Removed.

> if (c_isspace (c))
> {
> if (! whitespace)
> {
> whitespace = true;
> name[length++] = ' ';
> }
> }
> else
> {
> whitespace = false;
> name[length++] = c;
> }

This would be a bit easier to follow (and most likely a tiny bit more efficient)
as something like this:

> bool ws = c_isspace (c);
> if (ws)
> {
> length -= whitespace;
> c = ' ';
> }
> whitespace = ws;
> name[length++] = c;

I'd rather not have length decrease. Moved out the assignment, though.