groff-commit

[groff] 06/18: [libgroff, troff]: Refactor.


From: G. Branden Robinson
Subject: [groff] 06/18: [libgroff, troff]: Refactor.
Date: Fri, 5 Jan 2024 22:25:18 -0500 (EST)

gbranden pushed a commit to branch master
in repository groff.

commit 4c08cda8ae420dc8fe48932e81f0565768ec20d9
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Wed Jan 3 17:48:58 2024 -0600

    [libgroff, troff]: Refactor.
    
    * src/include/unicode.h: Rename function `check_unicode_name` to
      `valid_unicode_code_sequence` and update comments to better explain
      what it actually does.  The validity of "u1234_5678" in addition to
      "u1234" was undocumented and not even implied.
    * src/libs/libgroff/unicode.cpp (check_unicode_name): Rename this...
      (valid_unicode_code_sequence): ...to this.
    
    * src/libs/libgroff/font.cpp (glyph_to_unicode)
    * src/roff/troff/input.cpp (token::next, map_composite_character)
      (composite_glyph_name): Update call sites.  Make comparisons to null
      pointers explicit.
    
    Also fix code style nits.  Update editor aid comments; drop old-style
    Emacs file-local variable setting.  Drop comment that had no purpose
    other than to mark the end of the file since the editor aid comments
    also fulfill that role.  Wrap long lines.  Annotate null pointers with
    `nullptr` comment to ease any future transition to C++11, which defines
    it as a keyword.
---
 ChangeLog                     | 17 +++++++++++++++
 src/include/unicode.h         | 49 ++++++++++++++++++++++++++-----------------
 src/libs/libgroff/font.cpp    |  2 +-
 src/libs/libgroff/unicode.cpp | 23 ++++++++++++--------
 src/roff/troff/input.cpp      | 20 +++++++++---------
 5 files changed, 72 insertions(+), 39 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f813fbd58..4fb73ad4a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,23 @@
 
        * src/utils/grog/grog.pl: Trivially refactor; simplify code.
 
+2024-01-03  G. Branden Robinson <g.branden.robinson@gmail.com>
+
+       [libgroff, troff]: Refactor.
+
+       * src/include/unicode.h: Rename function `check_unicode_name` to
+       `valid_unicode_code_sequence` and update comments to better
+       explain what it actually does.  The validity of "u1234_5678" in
+       addition to "u1234" was undocumented and not even implied.
+       * src/libs/libgroff/unicode.cpp (check_unicode_name): Rename
+       this...
+       (valid_unicode_code_sequence): ...to this.
+
+       * src/libs/libgroff/font.cpp (glyph_to_unicode)
+       * src/roff/troff/input.cpp (token::next)
+       (map_composite_character, composite_glyph_name): Update call
+       sites.  Make comparisons to null pointers explicit.
+
 2024-01-03  G. Branden Robinson <g.branden.robinson@gmail.com>
 
        [gropdf]: Add "notice" diagnostic level for debugging.
diff --git a/src/include/unicode.h b/src/include/unicode.h
index 670864b08..cea39c233 100644
--- a/src/include/unicode.h
+++ b/src/include/unicode.h
@@ -1,5 +1,4 @@
-// -*- C++ -*-
-/* Copyright (C) 2002-2020 Free Software Foundation, Inc.
+/* Copyright (C) 2002-2024 Free Software Foundation, Inc.
      Written by Werner Lemberg <wl@gnu.org>
 
 This file is part of groff.
@@ -17,8 +16,8 @@ for more details.
 You should have received a copy of the GNU General Public License
 along with this program.  If not, see <http://www.gnu.org/licenses/>. */
 
-// Convert a groff glyph name to a string containing an underscore-separated
-// list of Unicode code points.  For example,
+// Convert a groff glyph name to a C string containing an
+// underscore-separated list of Unicode code points.  For example,
 //
 //   '-'   ->  '2010'
 //   ',c'  ->  '00E7'
@@ -27,8 +26,8 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>. */
 // Return NULL if there is no equivalent.
 const char *glyph_name_to_unicode(const char *);
 
-// Convert a string containing an underscore-separated list of Unicode code
-// points to a groff glyph name.  For example,
+// Convert a C string containing an underscore-separated list of Unicode
+// code points to a groff glyph name.  For example,
 //
 //   '2010'       ->  'hy'
 //   '0066_006C'  ->  'fl'
@@ -36,10 +35,10 @@ const char *glyph_name_to_unicode(const char *);
 // Return NULL if there is no equivalent.
 const char *unicode_to_glyph_name(const char *);
 
-// Convert a string containing a precomposed Unicode character to a string
-// containing an underscore-separated list of Unicode code points,
-// representing its canonical decomposition.  Also perform compatibility
-// equivalent replacement.  For example,
+// Convert a C string containing a precomposed Unicode character to a
+// string containing an underscore-separated list of Unicode code
+// points, representing its canonical decomposition.  Also perform
+// compatibility equivalent replacement.  For example,
 //
 //   '1F3A' -> '0399_0313_0300'
 //   'FA6A' -> '983B'
@@ -47,17 +46,29 @@ const char *unicode_to_glyph_name(const char *);
 // Return NULL if there is no equivalent.
 const char *decompose_unicode(const char *);
 
-// Test whether the given string denotes a Unicode character.  It must
-// be of the form 'uNNNN', obeying the following rules.
+// Validate the given C string as representing a Unicode grapheme
+// cluster to troff or an output driver.  The string must match the
+// extended regular expression 'u1*[0-9]{4,5}(_1*[0-9]{4,5})*' and obey
+// the following rules.
 //
-//   - 'NNNN' must consist of at least 4 hexadecimal digits in upper case.
-//   - If there are more than 4 hexadecimal digits, the leading one must not
-//     be zero,
+//   - 'NNNN' must consist of at least 4 hexadecimal digits in upper
+//     case.
+//   - If there are more than 4 hexadecimal digits, the leading one must
+//     not be zero.
 //   - 'NNNN' must denote a valid Unicode code point (U+0000..U+10FFFF,
 //     excluding surrogate code points.
+//   - The string may represent a sequence of Unicode code points
+//     separated by '_' characters.  Each must satisfy the criteria
+//     above.  It is up to the caller to ensure that the first is a base
+//     character and that subsequent ones are valid combining characters
+//     (in troff, these are set up with the `composite` request).
 //
-// Return a pointer to 'NNNN' (skipping the leading 'u' character) in case
-// of success, NULL otherwise.
-const char *check_unicode_name(const char *);
+// Return a pointer to the second character in the string (skipping the
+// leading 'u') if successful, and a null pointer otherwise.
+const char *valid_unicode_code_sequence(const char *);
 
-// end of unicode.h
+// Local Variables:
+// fill-column: 72
+// mode: C++
+// End:
+// vim: set cindent noexpandtab shiftwidth=2 textwidth=72:
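The rules listed in the new header comment can be restated as a standalone check. What follows is a minimal sketch under stated assumptions, not groff's implementation: the function name `sketch_valid_code_sequence` is hypothetical, and the shape test uses `std::regex` before the numeric checks. The rules require uppercase hexadecimal digits, so the pattern's character class is `[0-9A-F]`. The limit of 6 digits per group follows from the other rules: a fifth or sixth digit is allowed only without a leading zero, and the value may not exceed 0x10FFFF.

```cpp
#include <cstddef>
#include <cstdlib>
#include <regex>

// Hypothetical restatement of the rules documented in unicode.h; this
// is not groff's code.  Returns a pointer just past the leading 'u' on
// success, or a null pointer otherwise.
const char *sketch_valid_code_sequence(const char *s)
{
  // Shape check: 'u', then one or more '_'-separated groups of 4 to 6
  // uppercase hexadecimal digits.
  static const std::regex shape("u[0-9A-F]{4,6}(_[0-9A-F]{4,6})*");
  // Unlike groff's function, this sketch tolerates a null argument.
  if (s == nullptr || !std::regex_match(s, shape))
    return nullptr;
  const char *p = s + 1;                 // skip the 'u'
  for (;;) {
    char *end;
    unsigned long val = std::strtoul(p, &end, 16);
    std::ptrdiff_t ndigits = end - p;
    if (val > 0x10FFFF)                  // beyond the Unicode code space
      return nullptr;
    if (val >= 0xD800 && val <= 0xDFFF)  // surrogate code point
      return nullptr;
    if (val > 0xFFFF) {
      if (*p == '0')                     // no leading zero past 4 digits
        return nullptr;
    }
    else if (ndigits != 4)               // otherwise exactly 4 digits
      return nullptr;
    if (*end == '\0')
      break;
    p = end + 1;                         // step over the '_'
  }
  return s + 1;
}
```

Called on "u1234_5678", the sketch returns a pointer to "1234_5678". As the header comment says, it does not check that the code points after the first are combining characters; that remains the caller's job.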
diff --git a/src/libs/libgroff/font.cpp b/src/libs/libgroff/font.cpp
index ab5efb2e4..d19f3d0ca 100644
--- a/src/libs/libgroff/font.cpp
+++ b/src/libs/libgroff/font.cpp
@@ -183,7 +183,7 @@ int glyph_to_unicode(glyph *g)
       }
     }
     // Unicode character?
-    if (check_unicode_name(nm)) {
+    if (valid_unicode_code_sequence(nm)) {
       char *ignore;
       return (int)strtol(nm + 1, &ignore, 16);
     }
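At this call site, the name is converted with `strtol(nm + 1, &ignore, 16)`. Because `valid_unicode_code_sequence` also accepts underscore-separated sequences, note that `strtol` stops at the first character that is not a base-16 digit. A name such as "u0041_0301", if it reached these lines, would therefore yield only its first code point. The minimal sketch below copies the conversion into a hypothetical helper, `first_code_point`, to show that behavior in isolation.

```cpp
#include <cstdlib>

// Hypothetical helper copying the conversion in glyph_to_unicode():
// skip the leading 'u' and parse base-16 digits.  strtol() stops at
// the first character that is not a hex digit ('_' or the end).
int first_code_point(const char *nm)
{
  char *ignore;  // end pointer, unused as in the original
  return static_cast<int>(std::strtol(nm + 1, &ignore, 16));
}
```

For example, `first_code_point("u0041_0301")` evaluates to 0x41 (LATIN CAPITAL LETTER A). The combining acute accent U+0301 after the '_' is not consumed.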
diff --git a/src/libs/libgroff/unicode.cpp b/src/libs/libgroff/unicode.cpp
index 29e80c72e..351e0294c 100644
--- a/src/libs/libgroff/unicode.cpp
+++ b/src/libs/libgroff/unicode.cpp
@@ -1,4 +1,3 @@
-// -*- C++ -*-
 /* Copyright (C) 2002-2020 Free Software Foundation, Inc.
      Written by Werner Lemberg <wl@gnu.org>
 
@@ -23,10 +22,10 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>. */
 
 #include "unicode.h"
 
-const char *check_unicode_name(const char *u)
+const char *valid_unicode_code_sequence(const char *u)
 {
   if (*u != 'u')
-    return 0;
+    return 0 /* nullptr */;
   const char *p = ++u;
   for (;;) {
     int val = 0;
@@ -34,32 +33,38 @@ const char *check_unicode_name(const char *u)
     for (;;) {
       // only uppercase hex digits allowed
       if (!csxdigit(*p))
-       return 0;
+       return 0 /* nullptr */;
       if (csdigit(*p))
        val = val*0x10 + (*p-'0');
       else if (csupper(*p))
        val = val*0x10 + (*p-'A'+10);
       else
-       return 0;
+       return 0 /* nullptr */;
       // biggest Unicode value is U+10FFFF
       if (val > 0x10FFFF)
-       return 0;
+       return 0 /* nullptr */;
       p++;
       if (*p == '\0' || *p == '_')
        break;
     }
     // surrogates not allowed
     if ((val >= 0xD800 && val <= 0xDBFF) || (val >= 0xDC00 && val <= 0xDFFF))
-      return 0;
+      return 0 /* nullptr */;
     if (val > 0xFFFF) {
       if (*start == '0')       // no leading zeros allowed if > 0xFFFF
-       return 0;
+       return 0 /* nullptr */;
     }
     else if (p - start != 4)   // otherwise, check for exactly 4 hex digits
-      return 0;
+      return 0 /* nullptr */;
     if (*p == '\0')
       break;
     p++;
   }
   return u;
 }
+
+// Local Variables:
+// fill-column: 72
+// mode: C++
+// End:
+// vim: set cindent noexpandtab shiftwidth=2 textwidth=72:
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 214349bd9..54037e04f 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -2405,8 +2405,8 @@ void token::next()
            nm = composite_glyph_name(s);
          }
          else {
-           const char *gn = check_unicode_name(s.contents());
-           if (gn) {
+           const char *gn = valid_unicode_code_sequence(s.contents());
+           if (gn != 0 /* nullptr */) {
              const char *gn_decomposed = decompose_unicode(gn);
              if (gn_decomposed)
                gn = &gn_decomposed[1];
@@ -4122,8 +4122,8 @@ static void map_composite_character()
   }
   const char *from_gn = glyph_name_to_unicode(from.contents());
   if (!from_gn) {
-    from_gn = check_unicode_name(from.contents());
-    if (!from_gn) {
+    from_gn = valid_unicode_code_sequence(from.contents());
+    if (0 /* nullptr */ == from_gn) {
       error("invalid composite glyph name '%1'", from.contents());
       skip_line();
       return;
@@ -4142,8 +4142,8 @@ static void map_composite_character()
   }
   const char *to_gn = glyph_name_to_unicode(to.contents());
   if (!to_gn) {
-    to_gn = check_unicode_name(to.contents());
-    if (!to_gn) {
+    to_gn = valid_unicode_code_sequence(to.contents());
+    if (0 /* nullptr */ == to_gn) {
       error("invalid composite glyph name '%1'", to.contents());
       skip_line();
       return;
@@ -4166,8 +4166,8 @@ static symbol composite_glyph_name(symbol nm)
   input_stack::push(mi);
   const char *gn = glyph_name_to_unicode(nm.contents());
   if (!gn) {
-    gn = check_unicode_name(nm.contents());
-    if (!gn) {
+    gn = valid_unicode_code_sequence(nm.contents());
+    if (0 /* nullptr */ == gn) {
       error("invalid base glyph '%1' in composite glyph name", nm.contents());
       return EMPTY_SYMBOL;
     }
@@ -4187,8 +4187,8 @@ static symbol composite_glyph_name(symbol nm)
     gl += '\0';
     const char *u = glyph_name_to_unicode(gl.contents());
     if (!u) {
-      u = check_unicode_name(gl.contents());
-      if (!u) {
+      u = valid_unicode_code_sequence(gl.contents());
+      if (0 /* nullptr */ == u) {
        error("invalid component '%1' in composite glyph name",
              gl.contents());
        return EMPTY_SYMBOL;
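The commit message says null pointers are annotated with a `nullptr` comment to ease a future move to C++11. One way the annotation helps is that a text search for `/* nullptr */` finds each site. The sketch below places the pre-C++11 spelling used in the diff beside the keyword form. The lookup function `skip_u` is a hypothetical stand-in for any function that, like `valid_unicode_code_sequence`, returns a null pointer on failure. The constant-first order of the comparison mirrors the call sites in input.cpp.

```cpp
// Hypothetical stand-in for a lookup that returns a null pointer on
// failure, as valid_unicode_code_sequence() does.
static const char *skip_u(const char *s)
{
  return (s[0] == 'u') ? s + 1 : 0 /* nullptr */;
}

// Pre-C++11 spelling used in this commit: the literal 0 carries a
// comment, so every null-pointer site can be found by text search.
bool rejected(const char *s)
{
  return 0 /* nullptr */ == skip_u(s);
}

#if __cplusplus >= 201103L
// The same test once the C++11 keyword is available.
bool rejected_cxx11(const char *s)
{
  return nullptr == skip_u(s);
}
#endif
```

Both functions compute the same result. Only the spelling of the null pointer constant differs.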


