emacs-devel

Re: commit-msg hook


From: Eli Zaretskii
Subject: Re: commit-msg hook
Date: Tue, 14 Apr 2015 18:08:48 +0300

> Date: Mon, 13 Apr 2015 14:19:51 -0700
> From: Paul Eggert <address@hidden>
> CC: address@hidden
> 
> On 04/13/2015 01:18 PM, Eli Zaretskii wrote:
> > Gawk has had the --characters-as-bytes option since v4.0.0, which should
> > countermand that, I think.
> 
> Sure, although the code should work even with plain POSIX awk, as there
> should be no need to assume such a GNU extension when bootstrapping.
> That is, the script could support either:
> 
> 1. POSIX awk with multibyte OS support, with proper UTF-8 checking from 
> OS libraries; or
> 
> 2. GNU awk 4 (2012) or later, with nearly-as-good UTF-8 checking 
> hand-coded into the script; or
> 
> 3. Traditional awk without UTF-8 checking.
> 
> Currently the script supports (1) and (3) but someone could add support 
> for (2).

How about the following change?  It improves on (3), and worked for me
both on MS-Windows and on GNU/Linux.

--- ./.git/hooks/commit-msg.~5~ 2015-04-12 19:11:27.481125000 +0300
+++ ./.git/hooks/commit-msg     2015-04-14 11:11:02.000000000 +0300
@@ -45,10 +45,13 @@
   BEGIN {
     # These regular expressions assume traditional Unix unibyte behavior.
     # They are needed for old or broken versions of awk, e.g.,
-    # mawk 1.3.3 (1996), or gawk on MSYS (2015).
+    # mawk 1.3.3 (1996), or gawk on MSYS (2015), and/or for systems that
+    # cannot use UTF-8 as the codeset for the locale.
     space = "[ \f\n\r\t\v]"
     non_space = "[^ \f\n\r\t\v]"
-    non_print = "[\1-\37\177]"
+    # The non_print below rejects control characters and surrogates
+    # UTF-8 for: 0x01-0x1f 0x7f   0x80-0x9f    0xd800-0xdbff     0xdc00-0xdfff
+    non_print = "[\1-\37\177]|\302[\200-\237]|\355([\240-\257]|[\260-\277])[\200-\277]"
 
     # Prefer POSIX regular expressions if available, as they do a
     # better job of checking.  Similarly, prefer POSIX negated
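
The byte ranges in the new non_print follow from UTF-8 encoding. U+0080 through U+009F encode as \302\200 through \302\237 (C2 80 to C2 9F), and U+D800 through U+DFFF would encode as \355\240\200 through \355\277\277 (ED A0 80 to ED BF BF), with \240-\257 as the second byte for high surrogates and \260-\277 for low ones. The sketch below is one way to exercise the pattern outside the hook; it is not part of the patch. The check helper is hypothetical, and it assumes an awk that accepts octal escapes in strings and matches byte by byte when run with LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical test harness, not part of the commit-msg hook: run the
# proposed unibyte non_print pattern over a few hand-built byte strings.
# LC_ALL=C is assumed to make awk treat each byte as one character,
# which is the situation the hook's fallback regexps are written for.

check() {
  # $1 is used as a printf format so that its \ddd octal escapes become
  # raw bytes; the test strings below deliberately contain no '%'.
  printf "$1\n" | LC_ALL=C awk '
    BEGIN {
      non_print = "[\1-\37\177]|\302[\200-\237]|\355([\240-\257]|[\260-\277])[\200-\277]"
    }
    {
      if ($0 ~ non_print) print "reject"; else print "accept"
    }
  '
}

check 'plain ASCII'        # accept
check 'tab\there'          # reject: U+0009 is a C0 control
check 'caf\303\251'        # accept: U+00E9 (C3 A9) is printable
check 'x\302\205y'         # reject: U+0085 (C2 85) is a C1 control
check 'x\355\240\200y'     # reject: ED A0 80 would encode surrogate U+D800
check 'x\355\237\277y'     # accept: ED 9F BF is U+D7FF, just below the surrogates
```

Spelling out the two surrogate second-byte ranges separately, as the patch does, mirrors the high/low split in the comment; a single [\240-\277] would match the same bytes.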


