bug#24603: [RFC 13/18] Add some tricky Unicode characters to regex test

bug-gnu-emacs

From:	Michal Nazarewicz
Subject:	bug#24603: [RFC 13/18] Add some tricky Unicode characters to regex test
Date:	Tue, 4 Oct 2016 03:10:36 +0200

* test/src/regex-tests.el: Include capital ‘DZ’ digraph, sharp ‘s’,
capital ligature ‘IJ’, small ligature ‘fi’, title-case digraph ‘Dz’,
all three forms of Greek sigma and the IPA ɕ symbol in the regex tests.
---
 test/src/regex-tests.el | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)
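
For readers who want to check why each new character lands in a given
class, the sketch below lists its code point and Unicode general
category, and whether the code point is below 256 (the threshold the
patch's note on sharp s refers to).  It is only an illustration: the
names `my-regex-test-chars' and `my-describe-char' are hypothetical and
are not part of test/src/regex-tests.el.

```elisp
;; Sketch only: `my-regex-test-chars' and `my-describe-char' are
;; hypothetical helper names, not part of the patch.
(defconst my-regex-test-chars "ǱßĲﬁǲΣσςɕ"
  "Characters this patch adds to the character-class tests.")

(defun my-describe-char (ch)
  "Return a list (STRING CODE-POINT GENERAL-CATEGORY BELOW-256) for CH."
  (list (string ch)
        (format "U+%04X" ch)
        ;; Unicode general category as a symbol, e.g. Lu, Ll or Lt.
        (get-char-code-property ch 'general-category)
        ;; Non-nil when the code point is below 256.
        (< ch 256)))

;; One description per added character.
(mapcar #'my-describe-char (string-to-list my-regex-test-chars))
```

Evaluating the last form, for example with M-x eval-buffer in a scratch
buffer, gives one entry per character.  The general category reflects
Unicode data only; whether [:upper:] and [:lower:] agree with it depends
on the Emacs case tables in use, which is what the FIXME in the patch is
about.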

diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
index c4844c7..fa66ff1 100644
--- a/test/src/regex-tests.el
+++ b/test/src/regex-tests.el
@@ -65,27 +65,30 @@ regex--test-cc
         (skip-chars-forward (concat "[:" name ":]\u2622"))
         (should (or (equal (point) p) (equal (point) (1+ p))))))))
 
-(dolist (test '(("alnum" "abcABC012łąka" "-, \t\n")
-                ("alpha" "abcABCłąka" "-,012 \t\n")
+(dolist (test '(("alnum" "abcABC012łąkaǱßĲﬁǲΣσςɕ" "-, \t\n")
+                ("alpha" "abcABCłąkaǱßĲﬁǲΣσςɕ" "-,012 \t\n")
                 ("digit" "012" "abcABCłąka-, \t\n")
                 ("xdigit" "0123aBc" "łąk-, \t\n")
-                ("upper" "ABCŁĄKA" "abc012-, \t\n")
-                ("lower" "abcłąka" "ABC012-, \t\n")
+                ("upper" "ABCŁĄKAǱĲΣ" "abcß0ﬁσςɕ12-, \t\n")
+                ;; FIXME: ßﬁɕ are all lower case (even though they don’t have
+                ;; single-character upper-case forms).
+                ("lower" "abcłąkaσς" "ABC012ǱĲΣ-, \t\n")
 
-                ("word" "abcABC012\u2620" "-, \t\n")
+                ("word" "abcABC012\u2620ǱßĲﬁǲΣσςɕ" "-, \t\n")
 
                 ("punct" ".,-" "abcABC012\u2620 \t\n")
                 ("cntrl" "\1\2\t\n" ".,-abcABC012\u2620 ")
-                ("graph" "abcłąka\u2620-," " \t\n\1")
-                ("print" "abcłąka\u2620-, " "\t\n\1")
+                ("graph" "abcłąka\u2620-,ǱßĲﬁǲΣσςɕ" " \t\n\1")
+                ("print" "abcłąka\u2620-,ǱßĲﬁǲΣσςɕ " "\t\n\1")
 
                 ("space" " \t\n\u2001" "abcABCł0123")
                 ("blank" " \t" "\n\u2001")
 
-                ("ascii" "abcABC012 \t\n\1" "łą\u2620")
-                ("nonascii" "łą\u2622" "abcABC012 \t\n\1")
-                ("unibyte" "abcABC012 \t\n\1" "łą\u2622")
-                ("multibyte" "łą\u2622" "abcABC012 \t\n\1")))
+                ("ascii" "abcABC012 \t\n\1" "łą\u2620ǱßĲﬁǲΣσςɕ")
+                ("nonascii" "łą\u2622ǱßĲﬁǲΣσςɕ" "abcABC012 \t\n\1")
+                ;; Note: sharp s is unibyte since its code point is below 256.
+                ("unibyte" "abcABC012ß \t\n\1" "łą\u2622ǱĲﬁǲΣσςɕ")
+                ("multibyte" "łą\u2622ǱĲﬁǲΣσςɕ" "abcABC012ß \t\n\1")))
   (let ((name (intern (concat "regex-tests-" (car test) "-character-class")))
         (doc (concat "Perform sanity test of regexes using " (car test)
                      " character class.
-- 
2.8.0.rc3.226.g39d4020
