emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Character literals for Unicode (control) characters


From: Paul Eggert
Subject: Re: Character literals for Unicode (control) characters
Date: Mon, 14 Mar 2016 13:03:38 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Thanks, here's a detailed low level review.

Subject: [PATCH 4/4] Use `ucs-names'.

Summary lines like "Use `ucs-names'." should not end with "." and should be as informative as possible within a 50-char limit.

+#include <stdnoreturn.h>

This include reportedly doesn't work well with Microsoft compilers. Omit it and use _Noreturn instead of noreturn.

+/* Signals an `invalid-read-syntax' error indicating that the
+   character name in an \N{...} literal is invalid.  */

Use active voice "Signal an" rather than a non-sentence. Don't use grave quoting in comments (no quoting needed here anyway).

+static noreturn void invalid_character_name (Lisp_Object name)

Put "static _Noreturn void" on the first line, and the rest on the next line; that's the usual GNU style.

+/* Checks that CODE is a valid Unicode scalar value, and returns its
+   value.  CODE should be parsed from the character name given by
+   NAME.  NAME is used for error messages.  */

Active voice: "Checks" -> "Check".

+static int check_scalar_value (Lisp_Object code, Lisp_Object name)

"static int" in a separate line.

+{
+  if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR) ||
+      /* Don't allow surrogates.  */
+      RANGED_INTEGERP (0xD800, code, 0xDFFF))
+    invalid_character_name (name);
+  return XINT (code);
+}

RANGED_INTEGERP implies two tests for integer. Better would be an explicit NUMBERP check, followed by an XINT, followed by C-language range checks. Just use <= or < in range checks (not >= or >).

Also, don't put operators like || at the end of a line; put them at the start of the next line instead.

+/* If NAME starts with PREFIX, interpret the rest as a hexadecimal
+   number and return its value.  Raises `invalid-read-syntax' if the
+   number is not a valid scalar value.  Returns -1 if NAME doesn't
+   start with PREFIX.  */

Active voice. No need for grave quoting.

+static int
+parse_code_after_prefix (Lisp_Object name, const char* prefix)

"char* x" -> "char *x" in GNU style.

+  if (name_len > prefix_len && name_len <= prefix_len + 8

Just use < or <= for range checks.

+ Lisp_Object code = string_to_number (SDATA (name) + prefix_len, 16, false);
+      if (! NILP (code))
+        return check_scalar_value (code, name);

Why is nil treated differently from other invalid values (e.g., floating-point numbers)? They're all invalid character names, right?


+      /* Various ranges of CJK characters; see UnicodeData.txt. */
+      if ((code >= 0x3400 && code <= 0x4DB5) ||
+          (code >= 0x4E00 && code <= 0x9FD5) ||
+          (code >= 0x20000 && code <= 0x2A6D6) ||
+          (code >= 0x2A700 && code <= 0x2B734) ||
+          (code >= 0x2B740 && code <= 0x2B81D) ||
+          (code >= 0x2B820 && code <= 0x2CEA1))
+        return code;

Use only <= here, and put || at the start of lines. What's the likelihood that the numbers in the above test will change?


+  if (! CONSP (names))
+    invalid_syntax ("Unicode character name database not loaded");

This test is not needed, as ucs-names always returns a cons, and anyway even if it didn't then Fassoc would do the right thing.

+        /* 200 characters is hopefully long enough.  Increase if
+           not.  */
+        char name[200];

Give a name to this constant, e.g.,

/* Bound on the length of a Unicode character name.
   As of Unicode 9.0.0 the maximum is 83, so this should be safe. */
enum { UNICODE_CHARACTER_NAME_LENGTH_BOUND = 199 };
...
   char name[UNICODE_CHARACTER_NAME_LENGTH_BOUND + 1];




reply via email to

[Prev in Thread] Current Thread [Next in Thread]