emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Embedded modifiers in the regex engine


From: Dima Kogan
Subject: Re: Embedded modifiers in the regex engine
Date: Sun, 27 Mar 2016 17:23:28 -0700
User-agent: mu4e 0.9.11; emacs 25.0.92.1

Dima Kogan <address@hidden> writes:

> OK. Thank you for the clarification. I'll work on getting these into
> our test suite as the first step in an eventual update to our regex
> engine.

Sorry for the delay. Attached are two patches to import most of glibc's
regex tests to emacs. Assuming these are acceptable to be merged into
our tree, what specifically are people thinking in terms of updates to
the regex engine? Is there a particular implementation that we were
considering?

>From 6601983d8b58d39479bc13741f2780e749a54505 Mon Sep 17 00:00:00 2001
From: Dima Kogan <address@hidden>
Date: Fri, 11 Mar 2016 18:18:14 -0800
Subject: [PATCH 1/2] New regex tests imported from glibc 2.21

* test/src/regex/regex-resources: new tests
---
 test/src/regex/regex-resources/BOOST.tests |  829 ++++++++++
 test/src/regex/regex-resources/PCRE.tests  | 2386 ++++++++++++++++++++++++++++
 test/src/regex/regex-resources/PTESTS      |  341 ++++
 test/src/regex/regex-resources/TESTS       |  167 ++
 4 files changed, 3723 insertions(+)
 create mode 100644 test/src/regex/regex-resources/BOOST.tests
 create mode 100644 test/src/regex/regex-resources/PCRE.tests
 create mode 100644 test/src/regex/regex-resources/PTESTS
 create mode 100644 test/src/regex/regex-resources/TESTS

diff --git a/test/src/regex/regex-resources/BOOST.tests 
b/test/src/regex/regex-resources/BOOST.tests
new file mode 100644
index 0000000..98fd3b6
--- /dev/null
+++ b/test/src/regex/regex-resources/BOOST.tests
@@ -0,0 +1,829 @@
+; 
+; 
+; this file contains a script of tests to run through regress.exe
+;
+; comments start with a semicolon and proceed to the end of the line
+;
+; changes to regular expression compile flags start with a "-" as the first
+; non-whitespace character and consist of a list of the printable names
+; of the flags, for example "match_default"
+;
+; Other lines contain a test to perform using the current flag status
+; the first token contains the expression to compile, the second the string
+; to match it against. If the second string is "!" then the expression should
+; not compile, that is the first string is an invalid regular expression.
+; This is then followed by a list of integers that specify what should match,
+; each pair represents the starting and ending positions of a subexpression
+; starting with the zeroth subexpression (the whole match).
+; A value of -1 indicates that the subexpression should not take part in the
+; match at all, if the first value is -1 then no part of the expression should
+; match the string.
+;
+; Tests taken from BOOST testsuite and adapted to glibc regex.
+;
+; Boost Software License - Version 1.0 - August 17th, 2003
+;
+; Permission is hereby granted, free of charge, to any person or organization
+; obtaining a copy of the software and accompanying documentation covered by
+; this license (the "Software") to use, reproduce, display, distribute,
+; execute, and transmit the Software, and to prepare derivative works of the
+; Software, and to permit third-parties to whom the Software is furnished to
+; do so, all subject to the following:
+;
+; The copyright notices in the Software and this entire statement, including
+; the above license grant, this restriction and the following disclaimer,
+; must be included in all copies of the Software, in whole or in part, and
+; all derivative works of the Software, unless such copies or derivative
+; works are solely in the form of machine-executable object code generated by
+; a source language processor.
+;
+; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+; FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+; SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+; FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+; ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+; DEALINGS IN THE SOFTWARE.
+;
+
+- match_default normal REG_EXTENDED
+
+;
+; try some really simple literals:
+a a 0 1
+Z Z 0 1
+Z aaa -1 -1
+Z xxxxZZxxx 4 5
+
+; and some simple brackets:
+(a) zzzaazz 3 4 3 4
+() zzz 0 0 0 0
+() "" 0 0 0 0
+( !
+) ) 0 1
+(aa !
+aa) baa)b 1 4
+a b -1 -1
+\(\) () 0 2
+\(a\) (a) 0 3
+\() () 0 2
+(\) !
+p(a)rameter ABCparameterXYZ 3 12 4 5
+[pq](a)rameter ABCparameterXYZ 3 12 4 5
+
+; now try escaped brackets:
+- match_default bk_parens REG_BASIC
+\(a\) zzzaazz 3 4 3 4
+\(\) zzz 0 0 0 0
+\(\) "" 0 0 0 0
+\( !
+\) !
+\(aa !
+aa\) !
+() () 0 2
+(a) (a) 0 3
+(\) !
+\() !
+
+; now move on to "." wildcards
+- match_default normal REG_EXTENDED REG_STARTEND
+. a 0 1
+. \n 0 1
+. \r 0 1
+. \0 0 1
+
+;
+; now move on to the repetion ops,
+; starting with operator *
+- match_default normal REG_EXTENDED
+a* b 0 0
+ab* a 0 1
+ab* ab 0 2
+ab* sssabbbbbbsss 3 10
+ab*c* a 0 1
+ab*c* abbb 0 4
+ab*c* accc 0 4
+ab*c* abbcc 0 5
+*a !
+\<* !
+\>* !
+\n* \n\n 0 2
+\** ** 0 2
+\* * 0 1
+
+; now try operator +
+ab+ a -1 -1
+ab+ ab 0 2
+ab+ sssabbbbbbsss 3 10
+ab+c+ a -1 -1
+ab+c+ abbb -1 -1
+ab+c+ accc -1 -1
+ab+c+ abbcc 0 5
++a !
+\<+ !
+\>+ !
+\n+ \n\n 0 2
+\+ + 0 1
+\+ ++ 0 1
+\++ ++ 0 2
+
+; now try operator ?
+- match_default normal REG_EXTENDED
+a? b 0 0
+ab? a 0 1
+ab? ab 0 2
+ab? sssabbbbbbsss 3 5
+ab?c? a 0 1
+ab?c? abbb 0 2
+ab?c? accc 0 2
+ab?c? abcc 0 3
+?a !
+\<? !
+\>? !
+\n? \n\n 0 1
+\? ? 0 1
+\? ?? 0 1
+\?? ?? 0 1
+
+; now try operator {}
+- match_default normal REG_EXTENDED
+a{2} a -1 -1
+a{2} aa 0 2
+a{2} aaa 0 2
+a{2,} a -1 -1
+a{2,} aa 0 2
+a{2,} aaaaa 0 5
+a{2,4} a -1 -1
+a{2,4} aa 0 2
+a{2,4} aaa 0 3
+a{2,4} aaaa 0 4
+a{2,4} aaaaa 0 4
+a{} !
+a{2 !
+a} a} 0 2
+\{\} {} 0 2
+
+- match_default normal REG_BASIC
+a\{2\} a -1 -1
+a\{2\} aa 0 2
+a\{2\} aaa 0 2
+a\{2,\} a -1 -1
+a\{2,\} aa 0 2
+a\{2,\} aaaaa 0 5
+a\{2,4\} a -1 -1
+a\{2,4\} aa 0 2
+a\{2,4\} aaa 0 3
+a\{2,4\} aaaa 0 4
+a\{2,4\} aaaaa 0 4
+{} {} 0 2
+
+; now test the alternation operator |
+- match_default normal REG_EXTENDED
+a|b a 0 1
+a|b b 0 1
+a(b|c) ab 0 2 1 2
+a(b|c) ac 0 2 1 2
+a(b|c) ad -1 -1 -1 -1
+a\| a| 0 2
+
+; now test the set operator []
+- match_default normal REG_EXTENDED
+; try some literals first
+[abc] a 0 1
+[abc] b 0 1
+[abc] c 0 1
+[abc] d -1 -1
+[^bcd] a 0 1
+[^bcd] b -1 -1
+[^bcd] d -1 -1
+[^bcd] e 0 1
+a[b]c abc 0 3
+a[ab]c abc 0 3
+a[^ab]c adc 0 3
+a[]b]c a]c 0 3
+a[[b]c a[c 0 3
+a[-b]c a-c 0 3
+a[^]b]c adc 0 3
+a[^-b]c adc 0 3
+a[b-]c a-c 0 3
+a[b !
+a[] !
+
+; then some ranges
+[b-e] a -1 -1
+[b-e] b 0 1
+[b-e] e 0 1
+[b-e] f -1 -1
+[^b-e] a 0 1
+[^b-e] b -1 -1
+[^b-e] e -1 -1
+[^b-e] f 0 1
+a[1-3]c a2c 0 3
+a[3-1]c !
+a[1-3-5]c !
+a[1- !
+
+; and some classes
+a[[:alpha:]]c abc 0 3
+a[[:unknown:]]c !
+a[[: !
+a[[:alpha !
+a[[:alpha:] !
+a[[:alpha,:] !
+a[[:]:]]b !
+a[[:-:]]b !
+a[[:alph:]] !
+a[[:alphabet:]] !
+[[:alnum:]]+ address@hidden 3 6
+[[:alpha:]]+ address@hidden 3 5
+[[:blank:]]+ "a  \tb" 1 4
+[[:cntrl:]]+ a\n\tb 1 3
+[[:digit:]]+ a019b 1 4
+[[:graph:]]+ " a%b " 1 4
+[[:lower:]]+ AabC 1 3
+; This test fails with STLPort, disable for now as this is a corner case 
anyway...
+;[[:print:]]+ "\na b\n" 1 4
+[[:punct:]]+ " %-&\t" 1 4
+[[:space:]]+ "a \n\t\rb" 1 5
+[[:upper:]]+ aBCd 1 3
+[[:xdigit:]]+ p0f3Cx 1 5
+
+; now test flag settings:
+- escape_in_lists REG_NO_POSIX_TEST
+[\n] \n 0 1
+- REG_NO_POSIX_TEST
+
+; line anchors
+- match_default normal REG_EXTENDED
+^ab ab 0 2
+^ab xxabxx -1 -1
+ab$ ab 0 2
+ab$ abxx -1 -1
+- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL 
REG_NOTEOL
+^ab ab -1 -1
+^ab xxabxx -1 -1
+ab$ ab -1 -1
+ab$ abxx -1 -1
+
+; back references
+- match_default normal REG_PERL
+a(b)\2c        !
+a(b\1)c        !
+a(b*)c\1d abbcbbd 0 7 1 3
+a(b*)c\1d abbcbd -1 -1
+a(b*)c\1d abbcbbbd -1 -1
+^(.)\1 abc -1 -1
+a([bc])\1d abcdabbd    4 8 5 6
+; strictly speaking this is at best ambiguous, at worst wrong, this is what 
most
+; re implimentations will match though.
+a(([bc])\2)*d abbccd 0 6 3 5 3 4
+
+a(([bc])\2)*d abbcbd -1 -1
+a((b)*\2)*d abbbd 0 5 1 4 2 3
+; perl only:
+(ab*)[ab]*\1 ababaaa 0 7 0 1
+(a)\1bcd aabcd 0 5 0 1
+(a)\1bc*d aabcd 0 5 0 1
+(a)\1bc*d aabd 0 4 0 1
+(a)\1bc*d aabcccd 0 7 0 1
+(a)\1bc*[ce]d aabcccd 0 7 0 1
+^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5
+
+; posix only: 
+- match_default extended REG_EXTENDED
+(ab*)[ab]*\1 ababaaa 0 7 0 1
+
+;
+; word operators:
+\w a 0 1
+\w z 0 1
+\w A 0 1
+\w Z 0 1
+\w _ 0 1
+\w } -1 -1
+\w ` -1 -1
+\w [ -1 -1
+\w @ -1 -1
+; non-word:
+\W a -1 -1
+\W z -1 -1
+\W A -1 -1
+\W Z -1 -1
+\W _ -1 -1
+\W } 0 1
+\W ` 0 1
+\W [ 0 1
+\W @ 0 1
+; word start:
+\<abcd "  abcd" 2 6
+\<ab cab -1 -1
+\<ab "\nab" 1 3
+\<tag ::tag 2 5
+;word end:
+abc\> abc 0 3
+abc\> abcd -1 -1
+abc\> abc\n 0 3
+abc\> abc:: 0 3
+; word boundary:
+\babcd "  abcd" 2 6
+\bab cab -1 -1
+\bab "\nab" 1 3
+\btag ::tag 2 5
+abc\b abc 0 3
+abc\b abcd -1 -1
+abc\b abc\n 0 3
+abc\b abc:: 0 3
+; within word:
+\B ab 1 1
+a\Bb ab 0 2
+a\B ab 0 1
+a\B a -1 -1
+a\B "a " -1 -1
+
+;
+; buffer operators:
+\`abc abc 0 3
+\`abc \nabc -1 -1
+\`abc " abc" -1 -1
+abc\' abc 0 3
+abc\' abc\n -1 -1
+abc\' "abc " -1 -1
+
+;
+; now follows various complex expressions designed to try and bust the matcher:
+a(((b)))c abc 0 3 1 2 1 2 1 2
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c)d abbd 0 4 1 3
+; just gotta have one DFA-buster, of course
+a[ab]{20} aaaaabaaaabaaaabaaaab 0 21
+; and an inline expansion in case somebody gets tricky
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab]
 aaaaabaaaabaaaabaaaab 0 21
+; and in case somebody just slips in an NFA...
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night)
 aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31
+; one really big one
+1234567890123456789012345678901234567890123456789012345678901234567890 
a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71
+; fish for problems as brackets go past 8
+[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8
+[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9
+[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10
+[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10
+; and as parenthesis go past 9:
+(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
+(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 
10
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 
9 9 10 10 11
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 
8 8 9 9 10 10 11 11 12
+(a)d|(b)c abc 1 3 -1 -1 1 2
+_+((www)|(ftp)|(mailto)):_* "_wwwnocolon _mailto:"; 12 20 13 19 -1 -1 -1 -1 13 
19
+
+; subtleties of matching
+;a(b)?c\1d acd 0 3 -1 -1
+; POSIX is about the following test:
+a(b)?c\1d acd -1 -1 -1 -1
+a(b?c)+d accd 0 4 2 3
+(wee|week)(knights|night) weeknights 0 10 0 3 3 10
+.* abc 0 3
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c|e)d abbd 0 4 1 3
+a(b*|c|e)d acd 0 3 1 2
+a(b*|c|e)d ad 0 2 1 1
+a(b?)c abc 0 3 1 2
+a(b?)c ac 0 2 1 1
+a(b+)c abc 0 3 1 2
+a(b+)c abbbc 0 5 1 4 
+a(b*)c ac 0 2 1 1 
+(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5
+a([bc]?)c abc 0 3 1 2
+a([bc]?)c ac 0 2 1 1 
+a([bc]+)c abc 0 3 1 2
+a([bc]+)c abcc 0 4 1 3
+a([bc]+)bc abcbc 0 5 1 3
+a(bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abbb 0 4 1 3
+a(bbb+|bb+|b)bb abbb 0 4 1 2
+(.*).* abcdef 0 6 0 6
+(a*)* bc 0 0 0 0
+xyx*xz xyxxxxyxxxz 5 11
+
+; do we get the right subexpression when it is used more than once?
+a(b|c)*d ad 0 2 -1 -1
+a(b|c)*d abcd 0 4 2 3
+a(b|c)+d abd 0 3 1 2
+a(b|c)+d abcd 0 4 2 3
+a(b|c?)+d ad 0 2 1 1
+a(b|c){0,0}d ad 0 2 -1 -1
+a(b|c){0,1}d ad 0 2 -1 -1
+a(b|c){0,1}d abd 0 3 1 2
+a(b|c){0,2}d ad 0 2 -1 -1
+a(b|c){0,2}d abcd 0 4 2 3
+a(b|c){0,}d ad 0 2 -1 -1
+a(b|c){0,}d abcd 0 4 2 3
+a(b|c){1,1}d abd 0 3 1 2
+a(b|c){1,2}d abd 0 3 1 2
+a(b|c){1,2}d abcd 0 4 2 3
+a(b|c){1,}d abd 0 3 1 2
+a(b|c){1,}d abcd 0 4 2 3
+a(b|c){2,2}d acbd 0 4 2 3
+a(b|c){2,2}d abcd 0 4 2 3
+a(b|c){2,4}d abcd 0 4 2 3
+a(b|c){2,4}d abcbd 0 5 3 4
+a(b|c){2,4}d abcbcd 0 6 4 5
+a(b|c){2,}d abcd 0 4 2 3
+a(b|c){2,}d abcbd 0 5 3 4
+; perl only: these conflict with the POSIX test below
+;a(b|c?)+d abcd 0 4 3 3
+;a(b+|((c)*))+d abd 0 3 2 2 2 2 -1 -1
+;a(b+|((c)*))+d abcd 0 4 3 3 3 3 2 3
+
+; posix only:
+- match_default extended REG_EXTENDED REG_STARTEND
+
+a(b|c?)+d abcd 0 4 2 3
+a(b|((c)*))+d abcd 0 4 2 3 2 3 2 3
+a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1
+a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3
+a(b|((c)*))+d ad 0 2 1 1 1 1 -1 -1
+a(b|((c)*))*d abcd 0 4 2 3 2 3 2 3
+a(b+|((c)*))*d abd 0 3 1 2 -1 -1 -1 -1
+a(b+|((c)*))*d abcd 0 4 2 3 2 3 2 3
+a(b|((c)*))*d ad 0 2 1 1 1 1 -1 -1
+
+- match_default normal REG_PERL
+; try to match C++ syntax elements:
+; line comment:
+//[^\n]* "++i //here is a line comment\n" 4 28
+; block comment:
+/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27
+/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1
+; preprossor directives:
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 
-1 -1
+; perl only:
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\  \r\n  
foo();\\\r\n   printf(#x);" 0 53 30 42
+; literals:
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF                
                                        0 4             0 4             0 4     
-1 -1   -1 -1   -1 -1   -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35                  
                                                0 2     0 2             -1 -1   
0 2     -1 -1   -1 -1   -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu               
                                                0 5             0 4             
0 4     -1 -1   -1 -1   -1 -1   -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL               
                                                0 5             0 4             
0 4     -1 -1   4 5     -1 -1   -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 
0xFFFFFFFFFFFFFFFFuint64                    0 24    0 18    0 18    -1 -1   19 
24   19 24   22 24
+; strings:
+'([^\\']|\\.)*' '\\x3A' 0 6 4 5
+'([^\\']|\\.)*' '\\'' 0 4 1 3
+'([^\\']|\\.)*' '\\n' 0 4 1 3
+
+; finally try some case insensitive matches:
+- match_default normal REG_EXTENDED REG_ICASE
+; upper and lower have no meaning here so they fail, however these
+; may compile with other libraries...
+;[[:lower:]] !
+;[[:upper:]] !
address@hidden|\} address@hidden|\} 0 72
+
+; known and suspected bugs:
+- match_default normal REG_EXTENDED
+\( ( 0 1
+\) ) 0 1
+\$ $ 0 1
+\^ ^ 0 1
+\. . 0 1
+\* * 0 1
+\+ + 0 1
+\? ? 0 1
+\[ [ 0 1
+\] ] 0 1
+\| | 0 1
+\\ \\ 0 1
+# # 0 1
+\# # 0 1
+a- a- 0 2
+\- - 0 1
+\{ { 0 1
+\} } 0 1
+0 0 0 1
+1 1 0 1
+9 9 0 1
+b b 0 1
+B B 0 1
+< < 0 1
+> > 0 1
+w w 0 1
+W W 0 1
+` ` 0 1
+' ' 0 1
+\n \n 0 1
+, , 0 1
+a a 0 1
+f f 0 1
+n n 0 1
+r r 0 1
+t t 0 1
+v v 0 1
+c c 0 1
+x x 0 1
+: : 0 1
+(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5
+
+- match_default normal REG_EXTENDED REG_ICASE
+a A 0 1
+A a 0 1
+[abc]+ abcABC 0 6
+[ABC]+ abcABC 0 6
+[a-z]+ abcABC 0 6
+[A-Z]+ abzANZ 0 6
+[a-Z]+ abzABZ 0 6
+[A-z]+ abzABZ 0 6
+[[:lower:]]+ abyzABYZ 0 8
+[[:upper:]]+ abzABZ 0 6
+[[:alpha:]]+ abyzABYZ 0 8
+[[:alnum:]]+ 09abyzABYZ 0 10
+
+; word start:
+\<abcd "  abcd" 2 6
+\<ab cab -1 -1
+\<ab "\nab" 1 3
+\<tag ::tag 2 5
+;word end:
+abc\> abc 0 3
+abc\> abcd -1 -1
+abc\> abc\n 0 3
+abc\> abc:: 0 3
+
+; collating elements and rewritten set code:
+- match_default normal REG_EXTENDED REG_STARTEND
+;[[.zero.]] 0 0 1
+;[[.one.]] 1 0 1
+;[[.two.]] 2 0 1
+;[[.three.]] 3 0 1
+[[.a.]] baa 1 2
+;[[.right-curly-bracket.]] } 0 1
+;[[.NUL.]] \0 0 1
+[[:<:]z] !
+[a[:>:]] !
+[[=a=]] a 0 1
+;[[=right-curly-bracket=]] } 0 1
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+[[.A.]] A 0 1
+[[.A.]] a 0 1
+[[.A.]-b]+ AaBb 0 4
+[A-[.b.]]+ AaBb 0 4
+[[.a.]-B]+ AaBb 0 4
+[a-[.B.]]+ AaBb 0 4
+- match_default normal REG_EXTENDED REG_STARTEND
+[[.a.]-c]+ abcd 0 3
+[a-[.c.]]+ abcd 0 3
+[[:alpha:]-a] !
+[a-[:alpha:]] !
+
+; try mutli-character ligatures:
+;[[.ae.]] ae 0 2
+;[[.ae.]] aE -1 -1
+;[[.AE.]] AE 0 2
+;[[.Ae.]] Ae 0 2
+;[[.ae.]-b] a -1 -1
+;[[.ae.]-b] b 0 1
+;[[.ae.]-b] ae 0 2
+;[a-[.ae.]] a 0 1
+;[a-[.ae.]] b -1 -1
+;[a-[.ae.]] ae 0 2
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+;[[.ae.]] AE 0 2
+;[[.ae.]] Ae 0 2
+;[[.AE.]] Ae 0 2
+;[[.Ae.]] aE 0 2
+;[[.AE.]-B] a -1 -1
+;[[.Ae.]-b] b 0 1
+;[[.Ae.]-b] B 0 1
+;[[.ae.]-b] AE 0 2
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST
+\s+ "ab   ab" 2 5
+\S+ "  abc  " 2 5
+
+- match_default normal REG_EXTENDED REG_STARTEND
+\`abc abc 0 3
+\`abc aabc -1 -1
+abc\' abc 0 3
+abc\' abcd -1 -1
+abc\' abc\n\n -1 -1
+abc\' abc 0 3
+
+; extended repeat checking to exercise new algorithms:
+ab.*xy abxy_ 0 4
+ab.*xy ab_xy_ 0 5
+ab.*xy abxy 0 4
+ab.*xy ab_xy 0 5
+ab.* ab 0 2
+ab.* ab__ 0 4
+
+ab.{2,5}xy ab__xy_ 0 6
+ab.{2,5}xy ab____xy_ 0 8
+ab.{2,5}xy ab_____xy_ 0 9
+ab.{2,5}xy ab__xy 0 6
+ab.{2,5}xy ab_____xy 0 9
+ab.{2,5} ab__ 0 4
+ab.{2,5} ab_______ 0 7
+ab.{2,5}xy ab______xy -1 -1
+ab.{2,5}xy ab_xy -1 -1
+
+ab.*?xy abxy_ 0 4
+ab.*?xy ab_xy_ 0 5
+ab.*?xy abxy 0 4
+ab.*?xy ab_xy 0 5
+ab.*? ab 0 2
+ab.*? ab__ 0 4
+
+ab.{2,5}?xy ab__xy_ 0 6
+ab.{2,5}?xy ab____xy_ 0 8
+ab.{2,5}?xy ab_____xy_ 0 9
+ab.{2,5}?xy ab__xy 0 6
+ab.{2,5}?xy ab_____xy 0 9
+ab.{2,5}? ab__ 0 4
+ab.{2,5}? ab_______ 0 7
+ab.{2,5}?xy ab______xy -1 -1
+ab.{2,5}xy ab_xy -1 -1
+
+; again but with slower algorithm variant:
+- match_default REG_EXTENDED
+; now again for single character repeats:
+
+ab_*xy abxy_ 0 4
+ab_*xy ab_xy_ 0 5
+ab_*xy abxy 0 4
+ab_*xy ab_xy 0 5
+ab_* ab 0 2
+ab_* ab__ 0 4
+
+ab_{2,5}xy ab__xy_ 0 6
+ab_{2,5}xy ab____xy_ 0 8
+ab_{2,5}xy ab_____xy_ 0 9
+ab_{2,5}xy ab__xy 0 6
+ab_{2,5}xy ab_____xy 0 9
+ab_{2,5} ab__ 0 4
+ab_{2,5} ab_______ 0 7
+ab_{2,5}xy ab______xy -1 -1
+ab_{2,5}xy ab_xy -1 -1
+
+ab_*?xy abxy_ 0 4
+ab_*?xy ab_xy_ 0 5
+ab_*?xy abxy 0 4
+ab_*?xy ab_xy 0 5
+ab_*? ab 0 2
+ab_*? ab__ 0 4
+
+ab_{2,5}?xy ab__xy_ 0 6
+ab_{2,5}?xy ab____xy_ 0 8
+ab_{2,5}?xy ab_____xy_ 0 9
+ab_{2,5}?xy ab__xy 0 6
+ab_{2,5}?xy ab_____xy 0 9
+ab_{2,5}? ab__ 0 4
+ab_{2,5}? ab_______ 0 7
+ab_{2,5}?xy ab______xy -1 -1
+ab_{2,5}xy ab_xy -1 -1
+
+; and again for sets:
+ab[_,;]*xy abxy_ 0 4
+ab[_,;]*xy ab_xy_ 0 5
+ab[_,;]*xy abxy 0 4
+ab[_,;]*xy ab_xy 0 5
+ab[_,;]* ab 0 2
+ab[_,;]* ab__ 0 4
+
+ab[_,;]{2,5}xy ab__xy_ 0 6
+ab[_,;]{2,5}xy ab____xy_ 0 8
+ab[_,;]{2,5}xy ab_____xy_ 0 9
+ab[_,;]{2,5}xy ab__xy 0 6
+ab[_,;]{2,5}xy ab_____xy 0 9
+ab[_,;]{2,5} ab__ 0 4
+ab[_,;]{2,5} ab_______ 0 7
+ab[_,;]{2,5}xy ab______xy -1 -1
+ab[_,;]{2,5}xy ab_xy -1 -1
+
+ab[_,;]*?xy abxy_ 0 4
+ab[_,;]*?xy ab_xy_ 0 5
+ab[_,;]*?xy abxy 0 4
+ab[_,;]*?xy ab_xy 0 5
+ab[_,;]*? ab 0 2
+ab[_,;]*? ab__ 0 4
+
+ab[_,;]{2,5}?xy ab__xy_ 0 6
+ab[_,;]{2,5}?xy ab____xy_ 0 8
+ab[_,;]{2,5}?xy ab_____xy_ 0 9
+ab[_,;]{2,5}?xy ab__xy 0 6
+ab[_,;]{2,5}?xy ab_____xy 0 9
+ab[_,;]{2,5}? ab__ 0 4
+ab[_,;]{2,5}? ab_______ 0 7
+ab[_,;]{2,5}?xy ab______xy -1 -1
+ab[_,;]{2,5}xy ab_xy -1 -1
+
+; and again for tricky sets with digraphs:
+;ab[_[.ae.]]*xy abxy_ 0 4
+;ab[_[.ae.]]*xy ab_xy_ 0 5
+;ab[_[.ae.]]*xy abxy 0 4
+;ab[_[.ae.]]*xy ab_xy 0 5
+;ab[_[.ae.]]* ab 0 2
+;ab[_[.ae.]]* ab__ 0 4
+
+;ab[_[.ae.]]{2,5}xy ab__xy_ 0 6
+;ab[_[.ae.]]{2,5}xy ab____xy_ 0 8
+;ab[_[.ae.]]{2,5}xy ab_____xy_ 0 9
+;ab[_[.ae.]]{2,5}xy ab__xy 0 6
+;ab[_[.ae.]]{2,5}xy ab_____xy 0 9
+;ab[_[.ae.]]{2,5} ab__ 0 4
+;ab[_[.ae.]]{2,5} ab_______ 0 7
+;ab[_[.ae.]]{2,5}xy ab______xy -1 -1
+;ab[_[.ae.]]{2,5}xy ab_xy -1 -1
+
+;ab[_[.ae.]]*?xy abxy_ 0 4
+;ab[_[.ae.]]*?xy ab_xy_ 0 5
+;ab[_[.ae.]]*?xy abxy 0 4
+;ab[_[.ae.]]*?xy ab_xy 0 5
+;ab[_[.ae.]]*? ab 0 2
+;ab[_[.ae.]]*? ab__ 0 2
+
+;ab[_[.ae.]]{2,5}?xy ab__xy_ 0 6
+;ab[_[.ae.]]{2,5}?xy ab____xy_ 0 8
+;ab[_[.ae.]]{2,5}?xy ab_____xy_ 0 9
+;ab[_[.ae.]]{2,5}?xy ab__xy 0 6
+;ab[_[.ae.]]{2,5}?xy ab_____xy 0 9
+;ab[_[.ae.]]{2,5}? ab__ 0 4
+;ab[_[.ae.]]{2,5}? ab_______ 0 4
+;ab[_[.ae.]]{2,5}?xy ab______xy -1 -1
+;ab[_[.ae.]]{2,5}xy ab_xy -1 -1
+
+; new bugs detected in spring 2003:
+- normal match_continuous REG_NO_POSIX_TEST
+b abc 1 2
+
+() abc 0 0 0 0
+^() abc 0 0 0 0
+^()+ abc 0 0 0 0
+^(){1} abc 0 0 0 0
+^(){2} abc 0 0 0 0
+^((){2}) abc 0 0 0 0 0 0
+() "" 0 0 0 0
+()\1 "" 0 0 0 0
+()\1 a 0 0 0 0
+a()\1b ab 0 2 1 1
+a()b\1 ab 0 2 1 1
+
+; subtleties of matching with no sub-expressions marked
+- normal match_nosubs REG_NO_POSIX_TEST
+a(b?c)+d accd 0 4 
+(wee|week)(knights|night) weeknights 0 10 
+.* abc 0 3
+a(b|(c))d abd 0 3 
+a(b|(c))d acd 0 3
+a(b*|c|e)d abbd 0 4
+a(b*|c|e)d acd 0 3 
+a(b*|c|e)d ad 0 2
+a(b?)c abc 0 3
+a(b?)c ac 0 2
+a(b+)c abc 0 3
+a(b+)c abbbc 0 5
+a(b*)c ac 0 2
+(a|ab)(bc([de]+)f|cde) abcdef 0 6
+a([bc]?)c abc 0 3
+a([bc]?)c ac 0 2
+a([bc]+)c abc 0 3
+a([bc]+)c abcc 0 4
+a([bc]+)bc abcbc 0 5
+a(bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abbb 0 4
+a(bbb+|bb+|b)bb abbb 0 4
+(.*).* abcdef 0 6
+(a*)* bc 0 0
+
+- normal nosubs REG_NO_POSIX_TEST
+a(b?c)+d accd 0 4 
+(wee|week)(knights|night) weeknights 0 10 
+.* abc 0 3
+a(b|(c))d abd 0 3 
+a(b|(c))d acd 0 3
+a(b*|c|e)d abbd 0 4
+a(b*|c|e)d acd 0 3 
+a(b*|c|e)d ad 0 2
+a(b?)c abc 0 3
+a(b?)c ac 0 2
+a(b+)c abc 0 3
+a(b+)c abbbc 0 5
+a(b*)c ac 0 2
+(a|ab)(bc([de]+)f|cde) abcdef 0 6
+a([bc]?)c abc 0 3
+a([bc]?)c ac 0 2
+a([bc]+)c abc 0 3
+a([bc]+)c abcc 0 4
+a([bc]+)bc abcbc 0 5
+a(bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abb 0 3
+a(bbb+|bb+|b)b abbb 0 4
+a(bbb+|bb+|b)bb abbb 0 4
+(.*).* abcdef 0 6
+(a*)* bc 0 0
+
diff --git a/test/src/regex/regex-resources/PCRE.tests 
b/test/src/regex/regex-resources/PCRE.tests
new file mode 100644
index 0000000..0fb9cad
--- /dev/null
+++ b/test/src/regex/regex-resources/PCRE.tests
@@ -0,0 +1,2386 @@
+# PCRE version 4.4 21-August-2003
+
+# Tests taken from PCRE and modified to suit glibc regex.
+#
+# PCRE LICENCE
+# ------------
+#
+# PCRE is a library of functions to support regular expressions whose syntax
+# and semantics are as close as possible to those of the Perl 5 language.
+#
+# Written by: Philip Hazel <address@hidden>
+#
+# University of Cambridge Computing Service,
+# Cambridge, England. Phone: +44 1223 334714.
+#
+# Copyright (c) 1997-2003 University of Cambridge
+#
+# Permission is granted to anyone to use this software for any purpose on any
+# computer system, and to redistribute it freely, subject to the following
+# restrictions:
+#
+# 1. This software is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# 2. The origin of this software must not be misrepresented, either by
+#    explicit claim or by omission. In practice, this means that if you use
+#    PCRE in software that you distribute to others, commercially or
+#    otherwise, you must put a sentence like this
+#
+#      Regular expression support is provided by the PCRE library package,
+#      which is open source software, written by Philip Hazel, and copyright
+#      by the University of Cambridge, England.
+#
+#    somewhere reasonably visible in your documentation and in any relevant
+#    files or online help data or similar. A reference to the ftp site for
+#    the source, that is, to
+#
+#      ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
+#
+#    should also be given in the documentation. However, this condition is not
+#    intended to apply to whole chains of software. If package A includes PCRE,
+#    it must acknowledge it, but if package B is software that includes package
+#    A, the condition is not imposed on package B (unless it uses PCRE
+#    independently).
+#
+# 3. Altered versions must be plainly marked as such, and must not be
+#    misrepresented as being the original software.
+#
+# 4. If PCRE is embedded in any software that is released under the GNU
+#   General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL),
+#   then the terms of that licence shall supersede any condition above with
+#   which it is incompatible.
+#
+# The documentation for PCRE, supplied in the "doc" directory, is distributed
+# under the same terms as the software itself.
+#
+# End
+#
+
+/the quick brown fox/
+    the quick brown fox
+ 0: the quick brown fox
+    The quick brown FOX
+No match
+    What do you know about the quick brown fox?
+ 0: the quick brown fox
+    What do you know about THE QUICK BROWN FOX?
+No match
+
+/The quick brown fox/i
+    the quick brown fox
+ 0: the quick brown fox
+    The quick brown FOX
+ 0: The quick brown FOX
+    What do you know about the quick brown fox?
+ 0: the quick brown fox
+    What do you know about THE QUICK BROWN FOX?
+ 0: THE QUICK BROWN FOX
+
+/a*abc?xyz+pqr{3}ab{2,}xy{4,5}pq{0,6}AB{0,}zz/
+    abxyzpqrrrabbxyyyypqAzz
+ 0: abxyzpqrrrabbxyyyypqAzz
+    abxyzpqrrrabbxyyyypqAzz
+ 0: abxyzpqrrrabbxyyyypqAzz
+    aabxyzpqrrrabbxyyyypqAzz
+ 0: aabxyzpqrrrabbxyyyypqAzz
+    aaabxyzpqrrrabbxyyyypqAzz
+ 0: aaabxyzpqrrrabbxyyyypqAzz
+    aaaabxyzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzpqrrrabbxyyyypqAzz
+    abcxyzpqrrrabbxyyyypqAzz
+ 0: abcxyzpqrrrabbxyyyypqAzz
+    aabcxyzpqrrrabbxyyyypqAzz
+ 0: aabcxyzpqrrrabbxyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypAzz
+ 0: aaabcxyzpqrrrabbxyyyypAzz
+    aaabcxyzpqrrrabbxyyyypqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqqAzz
+    aaabcxyzpqrrrabbxyyyypqqqqqqAzz
+ 0: aaabcxyzpqrrrabbxyyyypqqqqqqAzz
+    aaaabcxyzpqrrrabbxyyyypqAzz
+ 0: aaaabcxyzpqrrrabbxyyyypqAzz
+    abxyzzpqrrrabbxyyyypqAzz
+ 0: abxyzzpqrrrabbxyyyypqAzz
+    aabxyzzzpqrrrabbxyyyypqAzz
+ 0: aabxyzzzpqrrrabbxyyyypqAzz
+    aaabxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaabxyzzzzpqrrrabbxyyyypqAzz
+    aaaabxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzzzzpqrrrabbxyyyypqAzz
+    abcxyzzpqrrrabbxyyyypqAzz
+ 0: abcxyzzpqrrrabbxyyyypqAzz
+    aabcxyzzzpqrrrabbxyyyypqAzz
+ 0: aabcxyzzzpqrrrabbxyyyypqAzz
+    aaabcxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaabcxyzzzzpqrrrabbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbxyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbbxyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbbxyyyypqAzz
+    aaaabcxyzzzzpqrrrabbbxyyyyypqAzz
+ 0: aaaabcxyzzzzpqrrrabbbxyyyyypqAzz
+    aaabcxyzpqrrrabbxyyyypABzz
+ 0: aaabcxyzpqrrrabbxyyyypABzz
+    aaabcxyzpqrrrabbxyyyypABBzz
+ 0: aaabcxyzpqrrrabbxyyyypABBzz
+    >>>aaabxyzpqrrrabbxyyyypqAzz
+ 0: aaabxyzpqrrrabbxyyyypqAzz
+    >aaaabxyzpqrrrabbxyyyypqAzz
+ 0: aaaabxyzpqrrrabbxyyyypqAzz
+    >>>>abcxyzpqrrrabbxyyyypqAzz
+ 0: abcxyzpqrrrabbxyyyypqAzz
+    *** Failers
+No match
+    abxyzpqrrabbxyyyypqAzz
+No match
+    abxyzpqrrrrabbxyyyypqAzz
+No match
+    abxyzpqrrrabxyyyypqAzz
+No match
+    aaaabcxyzzzzpqrrrabbbxyyyyyypqAzz
+No match
+    aaaabcxyzzzzpqrrrabbbxyyypqAzz
+No match
+    aaabcxyzpqrrrabbxyyyypqqqqqqqAzz
+No match
+
+/^(abc){1,2}zz/
+    abczz
+ 0: abczz
+ 1: abc
+    abcabczz
+ 0: abcabczz
+ 1: abc
+    *** Failers
+No match
+    zz
+No match
+    abcabcabczz
+No match
+    >>abczz
+No match
+
+/^(b+|a){1,2}c/
+    bc
+ 0: bc
+ 1: b
+    bbc
+ 0: bbc
+ 1: bb
+    bbbc
+ 0: bbbc
+ 1: bbb
+    bac
+ 0: bac
+ 1: a
+    bbac
+ 0: bbac
+ 1: a
+    aac
+ 0: aac
+ 1: a
+    abbbbbbbbbbbc
+ 0: abbbbbbbbbbbc
+ 1: bbbbbbbbbbb
+    bbbbbbbbbbbac
+ 0: bbbbbbbbbbbac
+ 1: a
+    *** Failers
+No match
+    aaac
+No match
+    abbbbbbbbbbbac
+No match
+
+/^[]cde]/
+    ]thing
+ 0: ]
+    cthing
+ 0: c
+    dthing
+ 0: d
+    ething
+ 0: e
+    *** Failers
+No match
+    athing
+No match
+    fthing
+No match
+
+/^[^]cde]/
+    athing
+ 0: a
+    fthing
+ 0: f
+    *** Failers
+ 0: *
+    ]thing
+No match
+    cthing
+No match
+    dthing
+No match
+    ething
+No match
+
+/^[0-9]+$/
+    0
+ 0: 0
+    1
+ 0: 1
+    2
+ 0: 2
+    3
+ 0: 3
+    4
+ 0: 4
+    5
+ 0: 5
+    6
+ 0: 6
+    7
+ 0: 7
+    8
+ 0: 8
+    9
+ 0: 9
+    10
+ 0: 10
+    100
+ 0: 100
+    *** Failers
+No match
+    abc
+No match
+
+/^.*nter/
+    enter
+ 0: enter
+    inter
+ 0: inter
+    uponter
+ 0: uponter
+
+/^xxx[0-9]+$/
+    xxx0
+ 0: xxx0
+    xxx1234
+ 0: xxx1234
+    *** Failers
+No match
+    xxx
+No match
+
+/^.+[0-9][0-9][0-9]$/
+    x123
+ 0: x123
+    xx123
+ 0: xx123
+    123456
+ 0: 123456
+    *** Failers
+No match
+    123
+No match
+    x1234
+ 0: x1234
+
+/^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
+    abc!pqr=apquxz.ixr.zzz.ac.uk
+ 0: abc!pqr=apquxz.ixr.zzz.ac.uk
+ 1: abc
+ 2: pqr
+    *** Failers
+No match
+    !pqr=apquxz.ixr.zzz.ac.uk
+No match
+    abc!=apquxz.ixr.zzz.ac.uk
+No match
+    abc!pqr=apquxz:ixr.zzz.ac.uk
+No match
+    abc!pqr=apquxz.ixr.zzz.ac.ukk
+No match
+
+/:/
+    Well, we need a colon: somewhere
+ 0: :
+    *** Fail if we don't
+No match
+
+/([0-9a-f:]+)$/i
+    0abc
+ 0: 0abc
+ 1: 0abc
+    abc
+ 0: abc
+ 1: abc
+    fed
+ 0: fed
+ 1: fed
+    E
+ 0: E
+ 1: E
+    ::
+ 0: ::
+ 1: ::
+    5f03:12C0::932e
+ 0: 5f03:12C0::932e
+ 1: 5f03:12C0::932e
+    fed def
+ 0: def
+ 1: def
+    Any old stuff
+ 0: ff
+ 1: ff
+    *** Failers
+No match
+    0zzz
+No match
+    gzzz
+No match
+    Any old rubbish
+No match
+
+/^.*\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/
+    .1.2.3
+ 0: .1.2.3
+ 1: 1
+ 2: 2
+ 3: 3
+    A.12.123.0
+ 0: A.12.123.0
+ 1: 12
+ 2: 123
+ 3: 0
+    *** Failers
+No match
+    .1.2.3333
+No match
+    1.2.3
+No match
+    1234.2.3
+No match
+
+/^([0-9]+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/
+    1 IN SOA non-sp1 non-sp2(
+ 0: 1 IN SOA non-sp1 non-sp2(
+ 1: 1
+ 2: non-sp1
+ 3: non-sp2
+    1    IN    SOA    non-sp1    non-sp2   (
+ 0: 1    IN    SOA    non-sp1    non-sp2   (
+ 1: 1
+ 2: non-sp1
+ 3: non-sp2
+    *** Failers
+No match
+    1IN SOA non-sp1 non-sp2(
+No match
+
+/^[a-zA-Z0-9][a-zA-Z0-9-]*(\.[a-zA-Z0-9][a-zA-z0-9-]*)*\.$/
+    a.
+ 0: a.
+    Z.
+ 0: Z.
+    2.
+ 0: 2.
+    ab-c.pq-r.
+ 0: ab-c.pq-r.
+ 1: .pq-r
+    sxk.zzz.ac.uk.
+ 0: sxk.zzz.ac.uk.
+ 1: .uk
+    x-.y-.
+ 0: x-.y-.
+ 1: .y-
+    *** Failers
+No match
+    -abc.peq.
+No match
+
+/^\*\.[a-z]([a-z0-9-]*[a-z0-9]+)?(\.[a-z]([a-z0-9-]*[a-z0-9]+)?)*$/
+    *.a
+ 0: *.a
+    *.b0-a
+ 0: *.b0-a
+ 1: 0-a
+    *.c3-b.c
+ 0: *.c3-b.c
+ 1: 3-b
+ 2: .c
+    *.c-a.b-c
+ 0: *.c-a.b-c
+ 1: -a
+ 2: .b-c
+ 3: -c
+    *** Failers
+No match
+    *.0
+No match
+    *.a-
+No match
+    *.a-b.c-
+No match
+    *.c-a.0-c
+No match
+
+/^[0-9a-f](\.[0-9a-f])*$/i
+    a.b.c.d
+ 0: a.b.c.d
+ 1: .d
+    A.B.C.D
+ 0: A.B.C.D
+ 1: .D
+    a.b.c.1.2.3.C
+ 0: a.b.c.1.2.3.C
+ 1: .C
+
+/^".*"\s*(;.*)?$/
+    "1234"
+ 0: "1234"
+    "abcd" ;
+ 0: "abcd" ;
+ 1: ;
+    "" ; rhubarb
+ 0: "" ; rhubarb
+ 1: ; rhubarb
+    *** Failers
+No match
+    "1234" : things
+No match
+
+/^(a(b(c)))(d(e(f)))(h(i(j)))(k(l(m)))$/
+    abcdefhijklm
+ 0: abcdefhijklm
+ 1: abc
+ 2: bc
+ 3: c
+ 4: def
+ 5: ef
+ 6: f
+ 7: hij
+ 8: ij
+ 9: j
+10: klm
+11: lm
+12: m
+
+/^a*\w/
+    z
+ 0: z
+    az
+ 0: az
+    aaaz
+ 0: aaaz
+    a
+ 0: a
+    aa
+ 0: aa
+    aaaa
+ 0: aaaa
+    a+
+ 0: a
+    aa+
+ 0: aa
+
+/^a+\w/
+    az
+ 0: az
+    aaaz
+ 0: aaaz
+    aa
+ 0: aa
+    aaaa
+ 0: aaaa
+    aa+
+ 0: aa
+
+/^[0-9]{8}\w{2,}/
+    1234567890
+ 0: 1234567890
+    12345678ab
+ 0: 12345678ab
+    12345678__
+ 0: 12345678__
+    *** Failers
+No match
+    1234567
+No match
+
+/^[aeiou0-9]{4,5}$/
+    uoie
+ 0: uoie
+    1234
+ 0: 1234
+    12345
+ 0: 12345
+    aaaaa
+ 0: aaaaa
+    *** Failers
+No match
+    123456
+No match
+
+/\`(abc|def)=(\1){2,3}\'/
+    abc=abcabc
+ 0: abc=abcabc
+ 1: abc
+ 2: abc
+    def=defdefdef
+ 0: def=defdefdef
+ 1: def
+ 2: def
+    *** Failers
+No match
+    abc=defdef
+No match
+
+/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
+    cataract cataract23
+ 0: cataract cataract23
+ 1: cataract
+ 2: aract
+ 3: ract
+ 4: 
+ 5: 3
+    catatonic catatonic23
+ 0: catatonic catatonic23
+ 1: catatonic
+ 2: atonic
+ 3: tonic
+ 4: 
+ 5: 3
+    caterpillar caterpillar23
+ 0: caterpillar caterpillar23
+ 1: caterpillar
+ 2: erpillar
+ 3: <unset>
+ 4: 
+ 5: 3
+
+
+/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] 
+[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
+    From abcd  Mon Sep 01 12:33:02 1997
+ 0: From abcd  Mon Sep 01 12:33
+ 1: abcd
+
+/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}[0-9]{1,2}\s+[0-9][0-9]:[0-9][0-9]/
+    From abcd  Mon Sep 01 12:33:02 1997
+ 0: From abcd  Mon Sep 01 12:33
+ 1: Sep 
+    From abcd  Mon Sep  1 12:33:02 1997
+ 0: From abcd  Mon Sep  1 12:33
+ 1: Sep  
+    *** Failers
+No match
+    From abcd  Sep 01 12:33:02 1997
+No match
+
+/^(a)\1{2,3}(.)/
+    aaab
+ 0: aaab
+ 1: a
+ 2: b
+    aaaab
+ 0: aaaab
+ 1: a
+ 2: b
+    aaaaab
+ 0: aaaaa
+ 1: a
+ 2: a
+    aaaaaab
+ 0: aaaaa
+ 1: a
+ 2: a
+
+/^[ab]{1,3}(ab*|b)/
+    aabbbbb
+ 0: aabbbbb
+ 1: abbbbb
+
+/^(cow|)\1(bell)/
+    cowcowbell
+ 0: cowcowbell
+ 1: cow
+ 2: bell
+    bell
+ 0: bell
+ 1: 
+ 2: bell
+    *** Failers
+No match
+    cowbell
+No match
+
+/^(a|)\1+b/
+    aab
+ 0: aab
+ 1: a
+    aaaab
+ 0: aaaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+
+/^(a|)\1{2}b/
+    aaab
+ 0: aaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+    aab
+No match
+    aaaab
+No match
+
+/^(a|)\1{2,3}b/
+    aaab
+ 0: aaab
+ 1: a
+    aaaab
+ 0: aaaab
+ 1: a
+    b
+ 0: b
+ 1: 
+    *** Failers
+No match
+    ab
+No match
+    aab
+No match
+    aaaaab
+No match
+
+/ab{1,3}bc/
+    abbbbc
+ 0: abbbbc
+    abbbc
+ 0: abbbc
+    abbc
+ 0: abbc
+    *** Failers
+No match
+    abc
+No match
+    abbbbbc
+No match
+
+/([^.]*)\.([^:]*):[T ]+(.*)/
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/([^.]*)\.([^:]*):[T ]+(.*)/i
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/([^.]*)\.([^:]*):[t ]+(.*)/i
+    track1.title:TBlah blah blah
+ 0: track1.title:TBlah blah blah
+ 1: track1
+ 2: title
+ 3: Blah blah blah
+
+/^abc$/
+    abc
+ 0: abc
+    *** Failers
+No match
+
+/[-az]+/
+    az-
+ 0: az-
+    *** Failers
+ 0: a
+    b
+No match
+
+/[az-]+/
+    za-
+ 0: za-
+    *** Failers
+ 0: a
+    b
+No match
+
+/[a-z]+/
+    abcdxyz
+ 0: abcdxyz
+
+/[0-9-]+/
+    12-34
+ 0: 12-34
+    *** Failers
+No match
+    aaa
+No match
+
+/(abc)\1/i
+    abcabc
+ 0: abcabc
+ 1: abc
+    ABCabc
+ 0: ABCabc
+ 1: ABC
+    abcABC
+ 0: abcABC
+ 1: abc
+
+/a{0}bc/
+    bc
+ 0: bc
+
+/^([^a])([^b])([^c]*)([^d]{3,4})/
+    baNOTccccd
+ 0: baNOTcccc
+ 1: b
+ 2: a
+ 3: NOT
+ 4: cccc
+    baNOTcccd
+ 0: baNOTccc
+ 1: b
+ 2: a
+ 3: NOT
+ 4: ccc
+    baNOTccd
+ 0: baNOTcc
+ 1: b
+ 2: a
+ 3: NO
+ 4: Tcc
+    bacccd
+ 0: baccc
+ 1: b
+ 2: a
+ 3: 
+ 4: ccc
+    *** Failers
+ 0: *** Failers
+ 1: *
+ 2: *
+ 3: * Fail
+ 4: ers
+    anything
+No match
+    baccd
+No match
+
+/[^a]/
+    Abc
+ 0: A
+
+/[^a]/i
+    Abc 
+ 0: b
+
+/[^a]+/
+    AAAaAbc
+ 0: AAA
+
+/[^a]+/i
+    AAAaAbc
+ 0: bc
+
+/[^k]$/
+    abc
+ 0: c
+    *** Failers
+ 0: s
+    abk
+No match
+
+/[^k]{2,3}$/
+    abc
+ 0: abc
+    kbc
+ 0: bc
+    kabc
+ 0: abc
+    *** Failers
+ 0: ers
+    abk
+No match
+    akb
+No match
+    akk 
+No match
+
+/^[0-9]{8,address@hidden/
+    address@hidden
+ 0: address@hidden
+    address@hidden
+ 0: address@hidden
+    *** Failers
+No match
+    address@hidden
+No match
+    address@hidden       
+No match
+
+/(a)\1{8,}/
+    aaaaaaaaa
+ 0: aaaaaaaaa
+ 1: a
+    aaaaaaaaaa
+ 0: aaaaaaaaaa
+ 1: a
+    *** Failers
+No match
+    aaaaaaa   
+No match
+
+/[^a]/
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: A
+
+/[^a]/i
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: b
+
+/[^az]/
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: A
+
+/[^az]/i
+    aaaabcd
+ 0: b
+    aaAabcd 
+ 0: b
+
+/P[^*]TAIRE[^*]{1,6}LL/
+    xxxxxxxxxxxPSTAIREISLLxxxxxxxxx
+ 0: PSTAIREISLL
+
+/P[^*]TAIRE[^*]{1,}LL/
+    xxxxxxxxxxxPSTAIREISLLxxxxxxxxx
+ 0: PSTAIREISLL
+
+/(\.[0-9][0-9][1-9]?)[0-9]+/
+    1.230003938
+ 0: .230003938
+ 1: .23
+    1.875000282   
+ 0: .875000282
+ 1: .875
+    1.235  
+ 0: .235
+ 1: .23
+                  
+/\b(foo)\s+(\w+)/i
+    Food is on the foo table
+ 0: foo table
+ 1: foo
+ 2: table
+    
+/foo(.*)bar/
+    The food is under the bar in the barn.
+ 0: food is under the bar in the bar
+ 1: d is under the bar in the 
+    
+/(.*)([0-9]*)/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 53147
+ 2: 
+    
+/(.*)([0-9]+)/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 5314
+ 2: 7
+
+/(.*)([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 5314
+ 2: 7
+
+/(.*)\b([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 
+ 2: 53147
+
+/(.*[^0-9])([0-9]+)$/
+    I have 2 numbers: 53147
+ 0: I have 2 numbers: 53147
+ 1: I have 2 numbers: 
+ 2: 53147
+
+/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]][[:digit:]][[:digit:]]/
+    01/01/2000
+ 0: 01/01/2000
+
+/^(a){0,0}/
+    bcd
+ 0: 
+    abc
+ 0: 
+    aab     
+ 0: 
+
+/^(a){0,1}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: a
+ 1: a
+
+/^(a){0,2}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: aa
+ 1: a
+
+/^(a){0,3}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa   
+ 0: aaa
+ 1: a
+
+/^(a){0,}/
+    bcd
+ 0: 
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa
+ 0: aaa
+ 1: a
+    aaaaaaaa    
+ 0: aaaaaaaa
+ 1: a
+
+/^(a){1,1}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: a
+ 1: a
+
+/^(a){1,2}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab  
+ 0: aa
+ 1: a
+
+/^(a){1,3}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa   
+ 0: aaa
+ 1: a
+
+/^(a){1,}/
+    bcd
+No match
+    abc
+ 0: a
+ 1: a
+    aab
+ 0: aa
+ 1: a
+    aaa
+ 0: aaa
+ 1: a
+    aaaaaaaa    
+ 0: aaaaaaaa
+ 1: a
+
+/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
+    123456654321
+ 0: 123456654321
+
+/^[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]/
+    123456654321 
+ 0: 123456654321
+
+/^[abc]{12}/
+    abcabcabcabc
+ 0: abcabcabcabc
+    
+/^[a-c]{12}/
+    abcabcabcabc
+ 0: abcabcabcabc
+    
+/^(a|b|c){12}/
+    abcabcabcabc 
+ 0: abcabcabcabc
+ 1: c
+
+/^[abcdefghijklmnopqrstuvwxy0123456789]/
+    n
+ 0: n
+    *** Failers 
+No match
+    z 
+No match
+
+/abcde{0,0}/
+    abcd
+ 0: abcd
+    *** Failers
+No match
+    abce  
+No match
+
+/ab[cd]{0,0}e/
+    abe
+ 0: abe
+    *** Failers
+No match
+    abcde 
+No match
+    
+/ab(c){0,0}d/
+    abd
+ 0: abd
+    *** Failers
+No match
+    abcd   
+No match
+
+/a(b*)/
+    a
+ 0: a
+ 1: 
+    ab
+ 0: ab
+ 1: b
+    abbbb
+ 0: abbbb
+ 1: bbbb
+    *** Failers
+ 0: a
+ 1: 
+    bbbbb    
+No match
+    
+/ab[0-9]{0}e/
+    abe
+ 0: abe
+    *** Failers
+No match
+    ab1e   
+No match
+    
+/(A|B)*CD/
+    CD 
+ 0: CD
+
+/(AB)*\1/
+    ABABAB
+ 0: ABABAB
+ 1: AB
+    
+/([0-9]+)(\w)/
+    12345a
+ 0: 12345a
+ 1: 12345
+ 2: a
+    12345+ 
+ 0: 12345
+ 1: 1234
+ 2: 5
+
+/(abc|)+/
+    abc
+ 0: abc
+ 1: abc
+    abcabc
+ 0: abcabc
+ 1: abc
+    abcabcabc
+ 0: abcabcabc
+ 1: abc
+    xyz      
+ 0: 
+ 1: 
+
+/([a]*)*/
+    a
+ 0: a
+ 1: a
+    aaaaa 
+ 0: aaaaa
+ 1: aaaaa
+
+/([ab]*)*/
+    a
+ 0: a
+ 1: a
+    b
+ 0: b
+ 1: b
+    ababab
+ 0: ababab
+ 1: ababab
+    aaaabcde
+ 0: aaaab
+ 1: aaaab
+    bbbb    
+ 0: bbbb
+ 1: bbbb
+
+/([^a]*)*/
+    b
+ 0: b
+ 1: b
+    bbbb
+ 0: bbbb
+ 1: bbbb
+    aaa   
+ 0: 
+
+/([^ab]*)*/
+    cccc
+ 0: cccc
+ 1: cccc
+    abab  
+ 0: 
+
+/abc/
+    abc
+ 0: abc
+    xabcy
+ 0: abc
+    ababc
+ 0: abc
+    *** Failers
+No match
+    xbc
+No match
+    axc
+No match
+    abx
+No match
+
+/ab*c/
+    abc
+ 0: abc
+
+/ab*bc/
+    abc
+ 0: abc
+    abbc
+ 0: abbc
+    abbbbc
+ 0: abbbbc
+
+/.{1}/
+    abbbbc
+ 0: a
+
+/.{3,4}/
+    abbbbc
+ 0: abbb
+
+/ab{0,}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab+bc/
+    abbc
+ 0: abbc
+    *** Failers
+No match
+    abc
+No match
+    abq
+No match
+
+/ab+bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{1,}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{1,3}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{3,4}bc/
+    abbbbc
+ 0: abbbbc
+
+/ab{4,5}bc/
+    *** Failers
+No match
+    abq
+No match
+    abbbbc
+No match
+
+/ab?bc/
+    abbc
+ 0: abbc
+    abc
+ 0: abc
+
+/ab{0,1}bc/
+    abc
+ 0: abc
+
+/ab?c/
+    abc
+ 0: abc
+
+/ab{0,1}c/
+    abc
+ 0: abc
+
+/^abc$/
+    abc
+ 0: abc
+    *** Failers
+No match
+    abbbbc
+No match
+    abcc
+No match
+
+/^abc/
+    abcc
+ 0: abc
+
+/abc$/
+    aabc
+ 0: abc
+    *** Failers
+No match
+    aabc
+ 0: abc
+    aabcd
+No match
+
+/^/
+    abc
+ 0: 
+
+/$/
+    abc
+ 0: 
+
+/a.c/
+    abc
+ 0: abc
+    axc
+ 0: axc
+
+/a.*c/
+    axyzc
+ 0: axyzc
+
+/a[bc]d/
+    abd
+ 0: abd
+    *** Failers
+No match
+    axyzd
+No match
+    abc
+No match
+
+/a[b-d]e/
+    ace
+ 0: ace
+
+/a[b-d]/
+    aac
+ 0: ac
+
+/a[-b]/
+    a-
+ 0: a-
+
+/a[b-]/
+    a-
+ 0: a-
+
+/a[]]b/
+    a]b
+ 0: a]b
+
+/a[^bc]d/
+    aed
+ 0: aed
+    *** Failers
+No match
+    abd
+No match
+    abd
+No match
+
+/a[^-b]c/
+    adc
+ 0: adc
+
+/a[^]b]c/
+    adc
+ 0: adc
+    *** Failers
+No match
+    a-c
+ 0: a-c
+    a]c
+No match
+
+/\ba\b/
+    a-
+ 0: a
+    -a
+ 0: a
+    -a-
+ 0: a
+
+/\by\b/
+    *** Failers
+No match
+    xy
+No match
+    yz
+No match
+    xyz
+No match
+
+/\Ba\B/
+    *** Failers
+ 0: a
+    a-
+No match
+    -a
+No match
+    -a-
+No match
+
+/\By\b/
+    xy
+ 0: y
+
+/\by\B/
+    yz
+ 0: y
+
+/\By\B/
+    xyz
+ 0: y
+
+/\w/
+    a
+ 0: a
+
+/\W/
+    -
+ 0: -
+    *** Failers
+ 0: *
+    -
+ 0: -
+    a
+No match
+
+/a\sb/
+    a b
+ 0: a b
+
+/a\Sb/
+    a-b
+ 0: a-b
+    *** Failers
+No match
+    a-b
+ 0: a-b
+    a b
+No match
+
+/[0-9]/
+    1
+ 0: 1
+
+/[^0-9]/
+    -
+ 0: -
+    *** Failers
+ 0: *
+    -
+ 0: -
+    1
+No match
+
+/ab|cd/
+    abc
+ 0: ab
+    abcd
+ 0: ab
+
+/()ef/
+    def
+ 0: ef
+ 1: 
+
+/a\(b/
+    a(b
+ 0: a(b
+
+/a\(*b/
+    ab
+ 0: ab
+    a((b
+ 0: a((b
+
+/((a))/
+    abc
+ 0: a
+ 1: a
+ 2: a
+
+/(a)b(c)/
+    abc
+ 0: abc
+ 1: a
+ 2: c
+
+/a+b+c/
+    aabbabc
+ 0: abc
+
+/a{1,}b{1,}c/
+    aabbabc
+ 0: abc
+
+/(a+|b)*/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b){0,}/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b)+/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b){1,}/
+    ab
+ 0: ab
+ 1: b
+
+/(a+|b)?/
+    ab
+ 0: a
+ 1: a
+
+/(a+|b){0,1}/
+    ab
+ 0: a
+ 1: a
+
+/[^ab]*/
+    cde
+ 0: cde
+
+/abc/
+    *** Failers
+No match
+    b
+No match
+    
+
+/a*/
+    
+
+/([abc])*d/
+    abbbcd
+ 0: abbbcd
+ 1: c
+
+/([abc])*bcd/
+    abcd
+ 0: abcd
+ 1: a
+
+/a|b|c|d|e/
+    e
+ 0: e
+
+/(a|b|c|d|e)f/
+    ef
+ 0: ef
+ 1: e
+
+/abcd*efg/
+    abcdefg
+ 0: abcdefg
+
+/ab*/
+    xabyabbbz
+ 0: ab
+    xayabbbz
+ 0: a
+
+/(ab|cd)e/
+    abcde
+ 0: cde
+ 1: cd
+
+/[abhgefdc]ij/
+    hij
+ 0: hij
+
+/(abc|)ef/
+    abcdef
+ 0: ef
+ 1: 
+
+/(a|b)c*d/
+    abcd
+ 0: bcd
+ 1: b
+
+/(ab|ab*)bc/
+    abc
+ 0: abc
+ 1: a
+
+/a([bc]*)c*/
+    abc
+ 0: abc
+ 1: bc
+
+/a([bc]*)(c*d)/
+    abcd
+ 0: abcd
+ 1: bc
+ 2: d
+
+/a([bc]+)(c*d)/
+    abcd
+ 0: abcd
+ 1: bc
+ 2: d
+
+/a([bc]*)(c+d)/
+    abcd
+ 0: abcd
+ 1: b
+ 2: cd
+
+/a[bcd]*dcdcde/
+    adcdcde
+ 0: adcdcde
+
+/a[bcd]+dcdcde/
+    *** Failers
+No match
+    abcde
+No match
+    adcdcde
+No match
+
+/(ab|a)b*c/
+    abc
+ 0: abc
+ 1: ab
+
+/((a)(b)c)(d)/
+    abcd
+ 0: abcd
+ 1: abc
+ 2: a
+ 3: b
+ 4: d
+
+/[a-zA-Z_][a-zA-Z0-9_]*/
+    alpha
+ 0: alpha
+
+/^a(bc+|b[eh])g|.h$/
+    abh
+ 0: bh
+
+/(bc+d$|ef*g.|h?i(j|k))/
+    effgz
+ 0: effgz
+ 1: effgz
+    ij
+ 0: ij
+ 1: ij
+ 2: j
+    reffgz
+ 0: effgz
+ 1: effgz
+    *** Failers
+No match
+    effg
+No match
+    bcdd
+No match
+
+/((((((((((a))))))))))/
+    a
+ 0: a
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+10: a
+
+/((((((((((a))))))))))\9/
+    aa
+ 0: aa
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+10: a
+
+/(((((((((a)))))))))/
+    a
+ 0: a
+ 1: a
+ 2: a
+ 3: a
+ 4: a
+ 5: a
+ 6: a
+ 7: a
+ 8: a
+ 9: a
+
+/multiple words of text/
+    *** Failers
+No match
+    aa
+No match
+    uh-uh
+No match
+
+/multiple words/
+    multiple words, yeah
+ 0: multiple words
+
+/(.*)c(.*)/
+    abcde
+ 0: abcde
+ 1: ab
+ 2: de
+
+/\((.*), (.*)\)/
+    (a, b)
+ 0: (a, b)
+ 1: a
+ 2: b
+
+/abcd/
+    abcd
+ 0: abcd
+
+/a(bc)d/
+    abcd
+ 0: abcd
+ 1: bc
+
+/a[-]?c/
+    ac
+ 0: ac
+
+/(abc)\1/
+    abcabc
+ 0: abcabc
+ 1: abc
+
+/([a-c]*)\1/
+    abcabc
+ 0: abcabc
+ 1: abc
+
+/(a)|\1/
+    a
+ 0: a
+ 1: a
+    *** Failers
+ 0: a
+ 1: a
+    ab
+ 0: a
+ 1: a
+    x
+No match
+
+/abc/i
+    ABC
+ 0: ABC
+    XABCY
+ 0: ABC
+    ABABC
+ 0: ABC
+    *** Failers
+No match
+    aaxabxbaxbbx
+No match
+    XBC
+No match
+    AXC
+No match
+    ABX
+No match
+
+/ab*c/i
+    ABC
+ 0: ABC
+
+/ab*bc/i
+    ABC
+ 0: ABC
+    ABBC
+ 0: ABBC
+
+/ab+bc/i
+    *** Failers
+No match
+    ABC
+No match
+    ABQ
+No match
+
+/ab+bc/i
+    ABBBBC
+ 0: ABBBBC
+
+/^abc$/i
+    ABC
+ 0: ABC
+    *** Failers
+No match
+    ABBBBC
+No match
+    ABCC
+No match
+
+/^abc/i
+    ABCC
+ 0: ABC
+
+/abc$/i
+    AABC
+ 0: ABC
+
+/^/i
+    ABC
+ 0: 
+
+/$/i
+    ABC
+ 0: 
+
+/a.c/i
+    ABC
+ 0: ABC
+    AXC
+ 0: AXC
+
+/a.*c/i
+    *** Failers
+No match
+    AABC
+ 0: AABC
+    AXYZD
+No match
+
+/a[bc]d/i
+    ABD
+ 0: ABD
+
+/a[b-d]e/i
+    ACE
+ 0: ACE
+    *** Failers
+No match
+    ABC
+No match
+    ABD
+No match
+
+/a[b-d]/i
+    AAC
+ 0: AC
+
+/a[-b]/i
+    A-
+ 0: A-
+
+/a[b-]/i
+    A-
+ 0: A-
+
+/a[]]b/i
+    A]B
+ 0: A]B
+
+/a[^bc]d/i
+    AED
+ 0: AED
+
+/a[^-b]c/i
+    ADC
+ 0: ADC
+    *** Failers
+No match
+    ABD
+No match
+    A-C
+No match
+
+/a[^]b]c/i
+    ADC
+ 0: ADC
+
+/ab|cd/i
+    ABC
+ 0: AB
+    ABCD
+ 0: AB
+
+/()ef/i
+    DEF
+ 0: EF
+ 1: 
+
+/$b/i
+    *** Failers
+No match
+    A]C
+No match
+    B
+No match
+
+/a\(b/i
+    A(B
+ 0: A(B
+
+/a\(*b/i
+    AB
+ 0: AB
+    A((B
+ 0: A((B
+
+/((a))/i
+    ABC
+ 0: A
+ 1: A
+ 2: A
+
+/(a)b(c)/i
+    ABC
+ 0: ABC
+ 1: A
+ 2: C
+
+/a+b+c/i
+    AABBABC
+ 0: ABC
+
+/a{1,}b{1,}c/i
+    AABBABC
+ 0: ABC
+
+/(a+|b)*/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b){0,}/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b)+/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b){1,}/i
+    AB
+ 0: AB
+ 1: B
+
+/(a+|b)?/i
+    AB
+ 0: A
+ 1: A
+
+/(a+|b){0,1}/i
+    AB
+ 0: A
+ 1: A
+
+/[^ab]*/i
+    CDE
+ 0: CDE
+
+/([abc])*d/i
+    ABBBCD
+ 0: ABBBCD
+ 1: C
+
+/([abc])*bcd/i
+    ABCD
+ 0: ABCD
+ 1: A
+
+/a|b|c|d|e/i
+    E
+ 0: E
+
+/(a|b|c|d|e)f/i
+    EF
+ 0: EF
+ 1: E
+
+/abcd*efg/i
+    ABCDEFG
+ 0: ABCDEFG
+
+/ab*/i
+    XABYABBBZ
+ 0: AB
+    XAYABBBZ
+ 0: A
+
+/(ab|cd)e/i
+    ABCDE
+ 0: CDE
+ 1: CD
+
+/[abhgefdc]ij/i
+    HIJ
+ 0: HIJ
+
+/^(ab|cd)e/i
+    ABCDE
+No match
+
+/(abc|)ef/i
+    ABCDEF
+ 0: EF
+ 1: 
+
+/(a|b)c*d/i
+    ABCD
+ 0: BCD
+ 1: B
+
+/(ab|ab*)bc/i
+    ABC
+ 0: ABC
+ 1: A
+
+/a([bc]*)c*/i
+    ABC
+ 0: ABC
+ 1: BC
+
+/a([bc]*)(c*d)/i
+    ABCD
+ 0: ABCD
+ 1: BC
+ 2: D
+
+/a([bc]+)(c*d)/i
+    ABCD
+ 0: ABCD
+ 1: BC
+ 2: D
+
+/a([bc]*)(c+d)/i
+    ABCD
+ 0: ABCD
+ 1: B
+ 2: CD
+
+/a[bcd]*dcdcde/i
+    ADCDCDE
+ 0: ADCDCDE
+
+/a[bcd]+dcdcde/i
+
+/(ab|a)b*c/i
+    ABC
+ 0: ABC
+ 1: AB
+
+/((a)(b)c)(d)/i
+    ABCD
+ 0: ABCD
+ 1: ABC
+ 2: A
+ 3: B
+ 4: D
+
+/[a-zA-Z_][a-zA-Z0-9_]*/i
+    ALPHA
+ 0: ALPHA
+
+/^a(bc+|b[eh])g|.h$/i
+    ABH
+ 0: BH
+
+/(bc+d$|ef*g.|h?i(j|k))/i
+    EFFGZ
+ 0: EFFGZ
+ 1: EFFGZ
+    IJ
+ 0: IJ
+ 1: IJ
+ 2: J
+    REFFGZ
+ 0: EFFGZ
+ 1: EFFGZ
+    *** Failers
+No match
+    ADCDCDE
+No match
+    EFFG
+No match
+    BCDD
+No match
+
+/((((((((((a))))))))))/i
+    A
+ 0: A
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+10: A
+
+/((((((((((a))))))))))\9/i
+    AA
+ 0: AA
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+10: A
+
+/(((((((((a)))))))))/i
+    A
+ 0: A
+ 1: A
+ 2: A
+ 3: A
+ 4: A
+ 5: A
+ 6: A
+ 7: A
+ 8: A
+ 9: A
+
+/multiple words of text/i
+    *** Failers
+No match
+    AA
+No match
+    UH-UH
+No match
+
+/multiple words/i
+    MULTIPLE WORDS, YEAH
+ 0: MULTIPLE WORDS
+
+/(.*)c(.*)/i
+    ABCDE
+ 0: ABCDE
+ 1: AB
+ 2: DE
+
+/\((.*), (.*)\)/i
+    (A, B)
+ 0: (A, B)
+ 1: A
+ 2: B
+
+/abcd/i
+    ABCD
+ 0: ABCD
+
+/a(bc)d/i
+    ABCD
+ 0: ABCD
+ 1: BC
+
+/a[-]?c/i
+    AC
+ 0: AC
+
+/(abc)\1/i
+    ABCABC
+ 0: ABCABC
+ 1: ABC
+
+/([a-c]*)\1/i
+    ABCABC
+ 0: ABCABC
+ 1: ABC
+
+/((foo)|(bar))*/
+    foobar
+ 0: foobar
+ 1: bar
+ 2: foo
+ 3: bar
+
+/^(.+)?B/
+    AB
+ 0: AB
+ 1: A
+
+/^([^a-z])|(\^)$/
+    .
+ 0: .
+ 1: .
+
+/^[<>]&/
+    <&OUT
+ 0: <&
+
+/^(){3,5}/
+    abc
+ 0: 
+ 1: 
+
+/^(a+)*ax/
+    aax
+ 0: aax
+ 1: a
+
+/^((a|b)+)*ax/
+    aax
+ 0: aax
+ 1: a
+ 2: a
+
+/^((a|bc)+)*ax/
+    aax
+ 0: aax
+ 1: a
+ 2: a
+
+/(a|x)*ab/
+    cab
+ 0: ab
+
+/(a)*ab/
+    cab
+ 0: ab
+
+/(ab)[0-9]\1/i
+    Ab4ab
+ 0: Ab4ab
+ 1: Ab
+    ab4Ab
+ 0: ab4Ab
+ 1: ab
+
+/foo\w*[0-9]{4}baz/
+    foobar1234baz
+ 0: foobar1234baz
+
+/(\w+:)+/
+    one:
+ 0: one:
+ 1: one:
+
+/((\w|:)+::)?(\w+)$/
+    abcd
+ 0: abcd
+ 1: <unset>
+ 2: <unset>
+ 3: abcd
+    xy:z:::abcd
+ 0: xy:z:::abcd
+ 1: xy:z:::
+ 2: :
+ 3: abcd
+
+/^[^bcd]*(c+)/
+    aexycd
+ 0: aexyc
+ 1: c
+
+/(a*)b+/
+    caab
+ 0: aab
+ 1: aa
+
+/((\w|:)+::)?(\w+)$/
+    abcd
+ 0: abcd
+ 1: <unset>
+ 2: <unset>
+ 3: abcd
+    xy:z:::abcd
+ 0: xy:z:::abcd
+ 1: xy:z:::
+ 2: :
+ 3: abcd
+    *** Failers
+ 0: Failers
+ 1: <unset>
+ 2: <unset>
+ 3: Failers
+    abcd:
+No match
+    abcd:
+No match
+
+/^[^bcd]*(c+)/
+    aexycd
+ 0: aexyc
+ 1: c
+
+/((Z)+|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: Z
+
+/(Z()|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: 
+
+/(Z(())|A)*/
+    ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: 
+ 3: 
+
+/(.*)[0-9]+\1/
+    abc123abc
+ 0: abc123abc
+ 1: abc
+    abc123bc 
+ 0: bc123bc
+ 1: bc
+
+/((.*))[0-9]+\1/
+    abc123abc
+ 0: abc123abc
+ 1: abc
+ 2: abc
+    abc123bc  
+ 0: bc123bc
+ 1: bc
+ 2: bc
+
+/^a{2,5}$/
+    aa
+ 0: aa
+    aaa
+ 0: aaa
+    aaaa
+ 0: aaaa
+    aaaaa
+ 0: aaaaa
+    *** Failers
+No match
+    a
+No match
+    b
+No match
+    aaaaab
+No match
+    aaaaaa
diff --git a/test/src/regex/regex-resources/PTESTS 
b/test/src/regex/regex-resources/PTESTS
new file mode 100644
index 0000000..02b357c
--- /dev/null
+++ b/test/src/regex/regex-resources/PTESTS
@@ -0,0 +1,341 @@
+# 2.8.2  Regular Expression General Requirement
+2¦4¦bb*¦abbbc¦
+2¦2¦bb*¦ababbbc¦
+7¦9¦A#*::¦A:A#:qA::qA#::qA##::q¦
+1¦5¦A#*::¦A##::A#::qA::qA#:q¦
+# 2.8.3.1.2  BRE Special Characters
+# GA108
+2¦2¦\.¦a.c¦
+2¦2¦\[¦a[c¦
+2¦2¦\\¦a\c¦
+2¦2¦\*¦a*c¦
+2¦2¦\^¦a^c¦
+2¦2¦\$¦a$c¦
+7¦11¦X\*Y\*8¦Y*8X*8X*Y*8¦
+# GA109
+2¦2¦[.]¦a.c¦
+2¦2¦[[]¦a[c¦
+-1¦-1¦[[]¦ac¦
+2¦2¦[\]¦a\c¦
+1¦1¦[\a]¦abc¦
+2¦2¦[\.]¦a\.c¦
+2¦2¦[\.]¦a.\c¦
+2¦2¦[*]¦a*c¦
+2¦2¦[$]¦a$c¦
+2¦2¦[X*Y8]¦7*8YX¦
+# GA110
+2¦2¦*¦a*c¦
+3¦4¦*a¦*b*a*c¦
+1¦5¦**9=¦***9=9¦
+# GA111
+1¦1¦^*¦*bc¦
+-1¦-1¦^*¦a*c¦
+-1¦-1¦^*¦^*ab¦
+1¦5¦^**9=¦***9=¦
+-1¦-1¦^*5<*9¦5<9*5<*9¦
+# GA112
+2¦3¦\(*b\)¦a*b¦
+-1¦-1¦\(*b\)¦ac¦
+1¦6¦A\(**9\)=¦A***9=79¦
+# GA113(1)
+1¦3¦\(^*ab\)¦*ab¦
+-1¦-1¦\(^*ab\)¦^*ab¦
+-1¦-1¦\(^*b\)¦a*b¦
+-1¦-1¦\(^*b\)¦^*b¦
+### GA113(2)                   GNU regex implements GA113(1)
+##-1¦-1¦\(^*ab\)¦*ab¦
+##-1¦-1¦\(^*ab\)¦^*ab¦
+##1¦1¦\(^*b\)¦b¦
+##1¦3¦\(^*b\)¦^^b¦
+# GA114
+1¦3¦a^b¦a^b¦
+1¦3¦a\^b¦a^b¦
+1¦1¦^^¦^bc¦
+2¦2¦\^¦a^c¦
+1¦1¦[c^b]¦^abc¦
+1¦1¦[\^ab]¦^ab¦
+2¦2¦[\^ab]¦c\d¦
+-1¦-1¦[^^]¦^¦
+1¦3¦\(a^b\)¦a^b¦
+1¦3¦\(a\^b\)¦a^b¦
+2¦2¦\(\^\)¦a^b¦
+# GA115
+3¦3¦$$¦ab$¦
+-1¦-1¦$$¦$ab¦
+2¦3¦$c¦a$c¦
+2¦2¦[$]¦a$c¦
+1¦2¦\$a¦$a¦
+3¦3¦\$$¦ab$¦
+2¦6¦A\([34]$[34]\)B¦XA4$3BY¦
+# 2.8.3.1.3  Periods in BREs
+# GA116
+1¦1¦.¦abc¦
+-1¦-1¦.ab¦abc¦
+1¦3¦ab.¦abc¦
+1¦3¦a.b¦a,b¦
+-1¦-1¦.......¦PqRs6¦
+1¦7¦.......¦PqRs6T8¦
+# 2.8.3.2  RE Bracket Expression
+# GA118
+2¦2¦[abc]¦xbyz¦
+-1¦-1¦[abc]¦xyz¦
+2¦2¦[abc]¦xbay¦
+# GA119
+2¦2¦[^a]¦abc¦
+4¦4¦[^]cd]¦cd]ef¦
+2¦2¦[^abc]¦axyz¦
+-1¦-1¦[^abc]¦abc¦
+3¦3¦[^[.a.]b]¦abc¦
+3¦3¦[^[=a=]b]¦abc¦
+2¦2¦[^-ac]¦abcde-¦
+2¦2¦[^ac-]¦abcde-¦
+3¦3¦[^a-b]¦abcde¦
+3¦3¦[^a-bd-e]¦dec¦
+2¦2¦[^---]¦-ab¦
+16¦16¦[^a-zA-Z0-9]¦pqrstVWXYZ23579#¦
+# GA120(1)
+3¦3¦[]a]¦cd]ef¦
+1¦1¦[]-a]¦a_b¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+1¦1¦[]^a-z]¦string¦
+# GA120(2)
+4¦4¦[^]cd]¦cd]ef¦
+0¦0¦[^]]*¦]]]]]]]]X¦
+0¦0¦[^]]*¦]]]]]]]]¦
+9¦9¦[^]]\{1,\}¦]]]]]]]]X¦
+-1¦-1¦[^]]\{1,\}¦]]]]]]]]¦
+# GA120(3)
+3¦3¦[c[.].]d]¦ab]cd¦
+2¦8¦[a-z]*[[.].]][A-Z]*¦Abcd]DEFg¦
+# GA121
+2¦2¦[[.a.]b]¦Abc¦
+1¦1¦[[.a.]b]¦aBc¦
+-1¦-1¦[[.a.]b]¦ABc¦
+3¦3¦[^[.a.]b]¦abc¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+3¦3¦[A-[.].]c]¦ab]!¦
+# GA122
+-2¦-2¦[[.ch.]]¦abc¦
+-2¦-2¦[[.ab.][.CD.][.EF.]]¦yZabCDEFQ9¦
+# GA125
+2¦2¦[[=a=]b]¦Abc¦
+1¦1¦[[=a=]b]¦aBc¦
+-1¦-1¦[[=a=]b]¦ABc¦
+3¦3¦[^[=a=]b]¦abc¦
+# GA126
+#W the expected result for [[:alnum:]]* is 2-7 which is wrong
+0¦0¦[[:alnum:]]*¦ aB28gH¦
+2¦7¦[[:alnum:]][[:alnum:]]*¦ aB28gH¦
+#W the expected result for [^[:alnum:]]* is 2-5 which is wrong
+0¦0¦[^[:alnum:]]*¦2    ,a¦
+2¦5¦[^[:alnum:]][^[:alnum:]]*¦2        ,a¦
+#W the expected result for [[:alpha:]]* is 2-5 which is wrong
+0¦0¦[[:alpha:]]*¦ aBgH2¦
+2¦5¦[[:alpha:]][[:alpha:]]*¦ aBgH2¦
+1¦6¦[^[:alpha:]]*¦2    8,a¦
+1¦2¦[[:blank:]]*¦      
¦
+1¦8¦[^[:blank:]]*¦aB28gH, ¦
+1¦2¦[[:cntrl:]]*¦       ¦
+1¦8¦[^[:cntrl:]]*¦aB2 8gh,¦
+#W the expected result for [[:digit:]]* is 2-3 which is wrong
+0¦0¦[[:digit:]]*¦a28¦
+2¦3¦[[:digit:]][[:digit:]]*¦a28¦
+1¦8¦[^[:digit:]]*¦aB   gH,¦
+1¦7¦[[:graph:]]*¦aB28gH, ¦
+1¦3¦[^[:graph:]]*¦     ,¦
+1¦2¦[[:lower:]]*¦agB¦
+1¦8¦[^[:lower:]]*¦B2   8H,a¦
+1¦8¦[[:print:]]*¦aB2 8gH,      ¦
+1¦2¦[^[:print:]]*¦      ¦
+#W the expected result for [[:punct:]]* is 2-2 which is wrong
+0¦0¦[[:punct:]]*¦a,2¦
+2¦3¦[[:punct:]][[:punct:]]*¦a,,2¦
+1¦9¦[^[:punct:]]*¦aB2  8gH¦
+1¦3¦[[:space:]]*¦      
¦
+#W the expected result for [^[:space:]]* is 2-9 which is wrong
+0¦0¦[^[:space:]]*¦ aB28gH,    ¦
+2¦9¦[^[:space:]][^[:space:]]*¦ aB28gH,        ¦
+#W the expected result for [[:upper:]]* is 2-3 which is wrong
+0¦0¦[[:upper:]]*¦aBH2¦
+2¦3¦[[:upper:]][[:upper:]]*¦aBH2¦
+1¦8¦[^[:upper:]]*¦a2   8g,B¦
+#W the expected result for [[:xdigit:]]* is 2-5 which is wrong
+0¦0¦[[:xdigit:]]*¦gaB28h¦
+2¦5¦[[:xdigit:]][[:xdigit:]]*¦gaB28h¦
+#W the expected result for [^[:xdigit:]]* is 2-7 which is wrong
+2¦7¦[^[:xdigit:]][^[:xdigit:]]*¦a      gH,2¦
+# GA127
+-2¦-2¦[b-a]¦abc¦
+1¦1¦[a-c]¦bbccde¦
+2¦2¦[a-b]¦-bc¦
+3¦3¦[a-z0-9]¦AB0¦
+3¦3¦[^a-b]¦abcde¦
+3¦3¦[^a-bd-e]¦dec¦
+1¦1¦[]-a]¦a_b¦
+2¦2¦[+--]¦a,b¦
+2¦2¦[--/]¦a.b¦
+2¦2¦[^---]¦-ab¦
+3¦3¦[][.-.]-0]¦ab0-]¦
+3¦3¦[A-[.].]c]¦ab]!¦
+2¦6¦bc[d-w]xy¦abchxyz¦
+# GA129
+1¦1¦[a-cd-f]¦dbccde¦
+-1¦-1¦[a-ce-f]¦dBCCdE¦
+2¦4¦b[n-zA-M]Y¦absY9Z¦
+2¦4¦b[n-zA-M]Y¦abGY9Z¦
+# GA130
+3¦3¦[-xy]¦ac-¦
+2¦4¦c[-xy]D¦ac-D+¦
+2¦2¦[--/]¦a.b¦
+2¦4¦c[--/]D¦ac.D+b¦
+2¦2¦[^-ac]¦abcde-¦
+1¦3¦a[^-ac]c¦abcde-¦
+3¦3¦[xy-]¦zc-¦
+2¦4¦c[xy-]7¦zc-786¦
+2¦2¦[^ac-]¦abcde-¦
+2¦4¦a[^ac-]c¦5abcde-¦
+2¦2¦[+--]¦a,b¦
+2¦4¦a[+--]B¦Xa,By¦
+2¦2¦[^---]¦-ab¦
+4¦6¦X[^---]Y¦X-YXaYXbY¦
+# 2.8.3.3  BREs Matching Multiple Characters
+# GA131
+3¦4¦cd¦abcdeabcde¦
+1¦2¦ag*b¦abcde¦
+-1¦-1¦[a-c][e-f]¦abcdef¦
+3¦4¦[a-c][e-f]¦acbedf¦
+4¦8¦abc*XYZ¦890abXYZ#*¦
+4¦9¦abc*XYZ¦890abcXYZ#*¦
+4¦15¦abc*XYZ¦890abcccccccXYZ#*¦
+-1¦-1¦abc*XYZ¦890abc*XYZ#*¦
+# GA132
+2¦4¦\(*bc\)¦a*bc¦
+1¦2¦\(ab\)¦abcde¦
+1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦
+3¦8¦43\(2\(6\)*0\)AB¦654320ABCD¦
+3¦9¦43\(2\(7\)*0\)AB¦6543270ABCD¦
+3¦12¦43\(2\(7\)*0\)AB¦6543277770ABCD¦
+# GA133
+1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦
+-1¦-1¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(k\)\)\)\)\)\)\)\)¦abcdefghijk¦
+# GA134
+2¦4¦\(bb*\)¦abbbc¦
+2¦2¦\(bb*\)¦ababbbc¦
+1¦6¦a\(.*b\)¦ababbbc¦
+1¦2¦a\(b*\)¦ababbbc¦
+1¦20¦a\(.*b\)c¦axcaxbbbcsxbbbbbbbbc¦
+# GA135
+1¦7¦\(a\(b\(c\(d\(e\)\)\)\)\)\4¦abcdededede¦
+#W POSIX does not really specify whether a\(b\)*c\1 matches acb.
+#W back references are supposed to expand to the last match, but what
+#W if there never was a match as in this case?
+-1¦-1¦a\(b\)*c\1¦acb¦
+1¦11¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)\9¦abcdefghijjk¦
+# GA136
+#W These two tests have the same problem as the test in GA135.  No match
+#W of a subexpression, why should the back reference be usable?
+#W 1 2 a\(b\)*c\1 acb
+#W 4 7 a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST
+-1¦-1¦a\(b\)*c\1¦acb¦
+-1¦-1¦a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST¦
+# GA137
+-2¦-2¦\(a\(b\)\)\3¦foo¦
+-2¦-2¦\(a\(b\)\)\(a\(b\)\)\5¦foo¦
+# GA138
+1¦2¦ag*b¦abcde¦
+1¦10¦a.*b¦abababvbabc¦
+2¦5¦b*c¦abbbcdeabbbbbbcde¦
+2¦5¦bbb*c¦abbbcdeabbbbbbcde¦
+1¦5¦a\(b\)*c\1¦abbcbbb¦
+-1¦-1¦a\(b\)*c\1¦abbdbd¦
+0¦0¦\([a-c]*\)\1¦abcacdef¦
+1¦6¦\([a-c]*\)\1¦abcabcabcd¦
+1¦2¦a^*b¦ab¦
+1¦5¦a^*b¦a^^^b¦
+# GA139
+1¦2¦a\{2\}¦aaaa¦
+1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦
+1¦2¦\(a\)\1\{1,2\}¦aabc¦
+1¦3¦\(a\)\1\{1,2\}¦aaaabc¦
+#W the expression \(\(a\)\1\)\{1,2\} is ill-formed, using \2
+1¦4¦\(\(a\)\2\)\{1,2\}¦aaaabc¦
+# GA140
+1¦2¦a\{2\}¦aaaa¦
+-1¦-1¦a\{2\}¦abcd¦
+0¦0¦a\{0\}¦aaaa¦
+1¦64¦a\{64\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# GA141
+1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦
+#W the expected result for \([a-c]*\)\{2,\} is failure which isn't correct
+1¦3¦\([a-c]*\)\{2,\}¦abcdefg¦
+1¦3¦\([a-c]*\)\{1,\}¦abcdefg¦
+-1¦-1¦a\{64,\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# GA142
+1¦3¦a\{2,3\}¦aaaa¦
+-1¦-1¦a\{2,3\}¦abcd¦
+0¦0¦\([a-c]*\)\{0,0\}¦foo¦
+1¦63¦a\{1,63\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦
+# 2.8.3.4  BRE Precedence
+# GA143
+#W There are numerous bugs in the original version.
+2¦19¦\^\[[[.].]]\\(\\1\\)\*\\{1,2\\}\$¦a^[]\(\1\)*\{1,2\}$b¦
+1¦6¦[[=*=]][[=\=]][[=]=]][[===]][[...]][[:punct:]]¦*\]=.;¦
+1¦6¦[$\(*\)^]*¦$\()*^¦
+1¦1¦[\1]¦1¦
+1¦1¦[\{1,2\}]¦{¦
+#W the expected result for \(*\)*\1* is 2-2 which isn't correct
+0¦0¦\(*\)*\1*¦a*b*11¦
+2¦3¦\(*\)*\1*b¦a*b*11¦
+#W the expected result for \(a\(b\{1,2\}\)\{1,2\}\) is 1-5 which isn't correct
+1¦3¦\(a\(b\{1,2\}\)\{1,2\}\)¦abbab¦
+1¦5¦\(a\(b\{1,2\}\)\)\{1,2\}¦abbab¦
+1¦1¦^\(^\(^a$\)$\)$¦a¦
+1¦2¦\(a\)\1$¦aa¦
+1¦3¦ab*¦abb¦
+1¦4¦ab\{2,4\}¦abbbc¦
+# 2.8.3.5  BRE Expression Anchoring
+# GA144
+1¦1¦^a¦abc¦
+-1¦-1¦^b¦abc¦
+-1¦-1¦^[a-zA-Z]¦99Nine¦
+1¦4¦^[a-zA-Z]*¦Nine99¦
+# GA145(1)
+1¦2¦\(^a\)\1¦aabc¦
+-1¦-1¦\(^a\)\1¦^a^abc¦
+1¦2¦\(^^a\)¦^a¦
+1¦1¦\(^^\)¦^^¦
+1¦3¦\(^abc\)¦abcdef¦
+-1¦-1¦\(^def\)¦abcdef¦
+### GA145(2)                   GNU regex implements GA145(1)
+##-1¦-1¦\(^a\)\1¦aabc¦
+##1¦4¦\(^a\)\1¦^a^abc¦
+##-1¦-1¦\(^^a\)¦^a¦
+##1¦2¦\(^^\)¦^^¦
+# GA146
+3¦3¦a$¦cba¦
+-1¦-1¦a$¦abc¦
+5¦7¦[a-z]*$¦99ZZxyz¦
+#W the expected result for [a-z]*$ is failure which isn't correct
+10¦9¦[a-z]*$¦99ZZxyz99¦
+3¦3¦$$¦ab$¦
+-1¦-1¦$$¦$ab¦
+3¦3¦\$$¦ab$¦
+# GA147(1)
+-1¦-1¦\(a$\)\1¦bcaa¦
+-1¦-1¦\(a$\)\1¦ba$¦
+-1¦-1¦\(ab$\)¦ab$¦
+1¦2¦\(ab$\)¦ab¦
+4¦6¦\(def$\)¦abcdef¦
+-1¦-1¦\(abc$\)¦abcdef¦
+### GA147(2)                   GNU regex implements GA147(1)
+##-1¦-1¦\(a$\)\1¦bcaa¦
+##2¦5¦\(a$\)\1¦ba$a$¦
+##-1¦-1¦\(ab$\)¦ab¦
+##1¦3¦\(ab$\)¦ab$¦
+# GA148
+0¦0¦^$¦¦
+1¦3¦^abc$¦abc¦
+-1¦-1¦^xyz$¦^xyz^¦
+-1¦-1¦^234$¦^234$¦
+1¦9¦^[a-zA-Z0-9]*$¦2aA3bB9zZ¦
+-1¦-1¦^[a-z0-9]*$¦2aA3b#B9zZ¦
diff --git a/test/src/regex/regex-resources/TESTS 
b/test/src/regex/regex-resources/TESTS
new file mode 100644
index 0000000..f2c9886
--- /dev/null
+++ b/test/src/regex/regex-resources/TESTS
@@ -0,0 +1,167 @@
+0:(.*)*\1:xx
+0:^:
+0:$:
+0:^$:
+0:^a$:a
+0:abc:abc
+1:abc:xbc
+1:abc:axc
+1:abc:abx
+0:abc:xabcy
+0:abc:ababc
+0:ab*c:abc
+0:ab*bc:abc
+0:ab*bc:abbc
+0:ab*bc:abbbbc
+0:ab+bc:abbc
+1:ab+bc:abc
+1:ab+bc:abq
+0:ab+bc:abbbbc
+0:ab?bc:abbc
+0:ab?bc:abc
+1:ab?bc:abbbbc
+0:ab?c:abc
+0:^abc$:abc
+1:^abc$:abcc
+0:^abc:abcc
+1:^abc$:aabc
+0:abc$:aabc
+0:^:abc
+0:$:abc
+0:a.c:abc
+0:a.c:axc
+0:a.*c:axyzc
+1:a.*c:axyzd
+1:a[bc]d:abc
+0:a[bc]d:abd
+1:a[b-d]e:abd
+0:a[b-d]e:ace
+0:a[b-d]:aac
+0:a[-b]:a-
+0:a[b-]:a-
+2:a[b-a]:-
+2:a[]b:-
+2:a[:-
+0:a]:a]
+0:a[]]b:a]b
+0:a[^bc]d:aed
+1:a[^bc]d:abd
+0:a[^-b]c:adc
+1:a[^-b]c:a-c
+1:a[^]b]c:a]c
+0:a[^]b]c:adc
+0:ab|cd:abc
+0:ab|cd:abcd
+0:()ef:def
+0:()*:-
+2:*a:-
+2:^*:-
+2:$*:-
+2:(*)b:-
+1:$b:b
+2:a\:-
+0:a\(b:a(b
+0:a\(*b:ab
+0:a\(*b:a((b
+1:a\x:a\x
+1:abc):-
+2:(abc:-
+0:((a)):abc
+0:(a)b(c):abc
+0:a+b+c:aabbabc
+0:a**:-
+0:a*?:-
+0:(a*)*:-
+0:(a*)+:-
+0:(a|)*:-
+0:(a*|b)*:-
+0:(a+|b)*:ab
+0:(a+|b)+:ab
+0:(a+|b)?:ab
+0:[^ab]*:cde
+0:(^)*:-
+0:(ab|)*:-
+2:)(:-
+1:abc:
+1:abc:
+0:a*:
+0:([abc])*d:abbbcd
+0:([abc])*bcd:abcd
+0:a|b|c|d|e:e
+0:(a|b|c|d|e)f:ef
+0:((a*|b))*:-
+0:abcd*efg:abcdefg
+0:ab*:xabyabbbz
+0:ab*:xayabbbz
+0:(ab|cd)e:abcde
+0:[abhgefdc]ij:hij
+1:^(ab|cd)e:abcde
+0:(abc|)ef:abcdef
+0:(a|b)c*d:abcd
+0:(ab|ab*)bc:abc
+0:a([bc]*)c*:abc
+0:a([bc]*)(c*d):abcd
+0:a([bc]+)(c*d):abcd
+0:a([bc]*)(c+d):abcd
+0:a[bcd]*dcdcde:adcdcde
+1:a[bcd]+dcdcde:adcdcde
+0:(ab|a)b*c:abc
+0:((a)(b)c)(d):abcd
+0:[A-Za-z_][A-Za-z0-9_]*:alpha
+0:^a(bc+|b[eh])g|.h$:abh
+0:(bc+d$|ef*g.|h?i(j|k)):effgz
+0:(bc+d$|ef*g.|h?i(j|k)):ij
+1:(bc+d$|ef*g.|h?i(j|k)):effg
+1:(bc+d$|ef*g.|h?i(j|k)):bcdd
+0:(bc+d$|ef*g.|h?i(j|k)):reffgz
+1:((((((((((a)))))))))):-
+0:(((((((((a))))))))):a
+1:multiple words of text:uh-uh
+0:multiple words:multiple words, yeah
+0:(.*)c(.*):abcde
+1:\((.*),:(.*)\)
+1:[k]:ab
+0:abcd:abcd
+0:a(bc)d:abcd
+0:a[-]?c:ac
+0:(....).*\1:beriberi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Qaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar 
Gadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Qadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar El 
Kadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Gadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
al-Qadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamer El 
Kazzafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar 
al-Gaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
Al Qathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Al 
Qathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar 
el-Gadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar El 
Kadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
al-Qadhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
al-Qadhdhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
Qadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar 
Gaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
Qadhdhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Khaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
al-Khaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'amar 
al-Kadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Ghaddafy
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Ghadafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Ghaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar 
Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Quathafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar 
Gheddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar 
Al-Kaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar 
Khadafy 
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar 
Qudhafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar 
al-Qaddafi
+0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mulazim 
Awwal Mu'ammar Muhammad Abu Minyar al-Qadhafi
+0:[[:digit:]]+:01234
+1:[[:alpha:]]+:01234
+0:^[[:digit:]]*$:01234
+1:^[[:digit:]]*$:01234a
+0:^[[:alnum:]]*$:01234a
+0:^[[:xdigit:]]*$:01234a
+1:^[[:xdigit:]]*$:01234g
+0:^[[:alnum:][:space:]]*$:Hello world
-- 
2.1.4

>From c5687df16cd4cb73fc593556a68a124354f273db Mon Sep 17 00:00:00 2001
From: Dima Kogan <address@hidden>
Date: Sat, 27 Feb 2016 18:06:35 -0800
Subject: [PATCH 2/2] Added driver for the regex tests

* test/src/regex/regex-tests.el: regex test driver
---
 test/src/regex/regex-tests.el | 590 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 590 insertions(+)
 create mode 100644 test/src/regex/regex-tests.el

diff --git a/test/src/regex/regex-tests.el b/test/src/regex/regex-tests.el
new file mode 100644
index 0000000..8709f90
--- /dev/null
+++ b/test/src/regex/regex-tests.el
@@ -0,0 +1,590 @@
+;;; regex-tests.el --- tests for regex.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2016 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
+
+(require 'ert)
+
+(defmacro regex-tests-generic-line (comment-char test-file whitelist &rest 
body)
+  "Reads a line of the test file TEST-FILE, skipping
+comments (defined by COMMENT-CHAR), and evaluates the tests in
+this line as defined in the BODY.  Line numbers in the WHITELIST
+are known failures, and are skipped."
+
+  `(with-temp-buffer
+    (modify-syntax-entry ?_ "w;; ") ; tests expect _ to be a word
+    (insert-file-contents ,(concat (file-name-directory (buffer-file-name)) 
test-file))
+
+    (let ((case-fold-search nil)
+          (line-number 1)
+          (whitelist-idx 0))
+
+      (goto-char (point-min))
+
+      (while (not (eobp))
+        (let ((start (point)))
+          (end-of-line)
+          (narrow-to-region start (point))
+
+          (goto-char (point-min))
+
+          (when
+              (and
+               ;; ignore comments
+               (save-excursion
+                 (re-search-forward ,(concat "^[^" (string comment-char) "]") 
nil t))
+
+               ;; skip lines in the whitelist
+               (let ((whitelist-next
+                      (condition-case nil
+                          (aref ,whitelist whitelist-idx) (args-out-of-range 
nil))))
+                 (cond
+                  ;; whitelist exhausted. do process this line
+                  ((null whitelist-next) t)
+
+                  ;; we're not yet at the next whitelist element. do
+                  ;; process this line
+                  ((< line-number whitelist-next) t)
+
+                  ;; we're past the next whitelist element. This
+                  ;; shouldn't happen
+                  ((> line-number whitelist-next)
+                   (error
+                    (format
+                     "We somehow skipped the next whitelist element: line %d" 
whitelist-next)))
+
+                  ;; we're at the next whitelist element. Skip this
+                  ;; line, and advance the whitelist index
+                  (t
+                   (setq whitelist-idx (1+ whitelist-idx)) nil))))
+            ,@body)
+
+          (widen)
+          (forward-line)
+          (beginning-of-line)
+          (setq line-number (1+ line-number)))))))
+
+(defun regex-tests-compare (string what-failed bounds-ref &optional 
substring-ref)
+  "I just ran a search, looking at STRING.  WHAT-FAILED describes
+what failed, if anything; valid values are 'search-failed,
+'compilation-failed and nil.  I compare the beginning/end of each
+group with their expected values.  This is done with either
+BOUNDS-REF or SUBSTRING-REF; one of those should be non-nil.
+BOUNDS-REF is a sequence \[start-ref0 end-ref0 start-ref1
+end-ref1 ....] while SUBSTRING-REF is the expected substring
+obtained by indexing the input string by start/end-ref.
+
+If the search was supposed to fail then start-ref0/substring-ref0
+is 'search-failed.  If the search wasn't even supposed to compile
+successfully, then start-ref0/substring-ref0 is
+'compilation-failed.  If I only care about a match succeeding,
+this can be set to t.
+
+This function returns a string that describes the failure, or nil
+on success"
+
+  (when (or
+         (and bounds-ref substring-ref)
+         (not (or bounds-ref substring-ref)))
+    (error "Exactly one of bounds-ref and bounds-ref should be non-nil"))
+
+  (let ((what-failed-ref (car (or bounds-ref substring-ref))))
+
+    (cond
+     ((eq what-failed 'search-failed)
+      (cond
+       ((eq what-failed-ref 'search-failed)
+        nil)
+       ((eq what-failed-ref 'compilation-failed)
+        "Expected pattern failure; but no match")
+       (t
+        "Expected match; but no match")))
+
+     ((eq what-failed 'compilation-failed)
+      (cond
+       ((eq what-failed-ref 'search-failed)
+        "Expected no match; but pattern failure")
+       ((eq what-failed-ref 'compilation-failed)
+        nil)
+       (t
+        "Expected match; but pattern failure")))
+
+     ;; The regex match succeeded
+     ((eq what-failed-ref 'search-failed)
+      "Expected no match; but match")
+     ((eq what-failed-ref 'compilation-failed)
+      "Expected pattern failure; but match")
+
+     ;; The regex match succeeded, as expected. I now check all the
+     ;; bounds
+     (t
+      (let ((idx 0)
+            msg
+            ref next-ref-function compare-ref-function mismatched-ref-function)
+
+        (if bounds-ref
+            (setq ref bounds-ref
+                  next-ref-function (lambda (x) (cddr x))
+                  compare-ref-function (lambda (ref start-pos end-pos)
+                                         (or (eq (car ref) t)
+                                             (and (eq start-pos (car ref))
+                                                  (eq end-pos   (cadr ref)))))
+                  mismatched-ref-function (lambda (ref start-pos end-pos)
+                                            (format
+                                             "beginning/end positions: %d/%s 
and %d/%s"
+                                             start-pos (car ref) end-pos (cadr 
ref))))
+          (setq ref substring-ref
+                next-ref-function (lambda (x) (cdr x))
+                compare-ref-function (lambda (ref start-pos end-pos)
+                                       (or (eq (car ref) t)
+                                           (string= (substring string 
start-pos end-pos) (car ref))))
+                mismatched-ref-function (lambda (ref start-pos end-pos)
+                                          (format
+                                           "beginning/end positions: %d/%s and 
%d/%s"
+                                           start-pos (car ref) end-pos (cadr 
ref)))))
+
+        (while (not (or (null ref) msg))
+
+          (let ((start (match-beginning idx))
+                (end   (match-end       idx)))
+
+            (when (not (funcall compare-ref-function ref start end))
+              (setq msg
+                    (format
+                     "Have expected match, but mismatch in group %d: %s" idx 
(funcall mismatched-ref-function ref start end))))
+
+            (setq ref (funcall next-ref-function ref)
+                  idx (1+ idx))))
+
+        (or msg
+            nil))))))
+
+
+
+(defun regex-tests-match (pattern string bounds-ref &optional substring-ref)
+  "I match the given STRING against PATTERN.  I compare the
+beginning/end of each group with their expected values.
+BOUNDS-REF is a sequence [start-ref0 end-ref0 start-ref1 end-ref1
+....].
+
+If the search was supposed to fail then start-ref0 is
+'search-failed.  If the search wasn't even supposed to compile
+successfully, then start-ref0 is 'compilation-failed.
+
+This function returns a string that describes the failure, or nil
+on success"
+
+  (if (string-match "\\[\\([\\.=]\\)..?\\1\\]" pattern)
+      ;; Skipping test: [.x.] and [=x=] forms not supported by emacs
+      nil
+
+    (regex-tests-compare
+     string
+     (condition-case nil
+         (if (string-match pattern string) nil 'search-failed)
+       ('invalid-regexp 'compilation-failed))
+     bounds-ref substring-ref)))
+
+
+(defconst regex-tests-re-even-escapes
+  "\\(?:^\\|[^\\\\]\\)\\(?:\\\\\\\\\\)*"
+  "Regex that matches an even number of \\ characters")
+
+(defconst regex-tests-re-odd-escapes
+  (concat regex-tests-re-even-escapes "\\\\")
+  "Regex that matches an odd number of \\ characters")
+
+
+(defun regex-tests-unextend (pattern)
+  "Basic conversion from extended regexen to emacs ones.  This is
+mostly a hack that adds \\ to () and | and {}, and removes it if
+it already exists.  We also change \\S (and \\s) to \\S- (and
+\\s-) because extended regexen see the former as whitespace, but
+emacs requires an extra symbol character"
+
+  (with-temp-buffer
+    (insert pattern)
+    (goto-char (point-min))
+
+    (while (re-search-forward "[()|{}]" nil t)
+      ;; point is past special character. If it is escaped, unescape
+      ;; it
+
+      (if (save-excursion
+            (re-search-backward (concat regex-tests-re-odd-escapes ".\\=") nil 
t))
+
+          ;; This special character is preceded by an odd number of \,
+          ;; so I unescape it by removing the last one
+          (progn
+            (forward-char -2)
+            (delete-char 1)
+            (forward-char 1))
+
+        ;; This special character is preceded by an even (possibly 0)
+        ;; number of \. I add an escape
+        (forward-char -1)
+        (insert "\\")
+        (forward-char 1)))
+
+    ;; convert \s to \s-
+    (goto-char (point-min))
+    (while (re-search-forward (concat regex-tests-re-odd-escapes "[Ss]") nil t)
+      (insert "-"))
+
+    (buffer-string)))
+
+(defun regex-tests-BOOST-frob-escapes (s ispattern)
+  "Mangle \\ the way it is done in frob_escapes() in
+regex-tests-BOOST.c in glibc: \\t, \\n, \\r are interpreted;
+\\\\, \\^, \{, \\|, \} are unescaped for the string (not
+pattern)"
+
+  ;; this is all similar to (regex-tests-unextend)
+  (with-temp-buffer
+    (insert s)
+
+    (let ((interpret-list (list "t" "n" "r")))
+      (while interpret-list
+        (goto-char (point-min))
+        (while (re-search-forward
+                (concat "\\(" regex-tests-re-even-escapes "\\)"
+                        "\\\\" (car interpret-list))
+                nil t)
+          (replace-match (concat "\\1" (car (read-from-string
+                                             (concat "\"\\" (car 
interpret-list) "\""))))))
+
+        (setq interpret-list (cdr interpret-list))))
+
+    (when (not ispattern)
+      ;; unescape \\, \^, \{, \|, \}
+      (let ((unescape-list (list "\\\\" "^" "{" "|" "}")))
+        (while unescape-list
+          (goto-char (point-min))
+          (while (re-search-forward
+                  (concat "\\(" regex-tests-re-even-escapes "\\)"
+                          "\\\\" (car unescape-list))
+                  nil t)
+            (replace-match (concat "\\1" (car unescape-list))))
+
+          (setq unescape-list (cdr unescape-list))))
+      )
+    (buffer-string)))
+
+
+
+
+(defconst regex-tests-BOOST-whitelist
+  [
+   ;; emacs is more stringent with regexen involving unbalanced )
+   63 65 69
+
+   ;; in emacs, regex . doesn't match \n
+   91
+
+   ;; emacs is more forgiving with * and ? that don't apply to
+   ;; characters
+   107 108 109 122 123 124 140 141 142
+
+   ;; emacs accepts regexen with {}
+   161
+
+   ;; emacs doesn't fail on bogus ranges such as [3-1] or [1-3-5]
+   222 223
+
+   ;; emacs doesn't match (ab*)[ab]*\1 greedily: only 4 chars of
+   ;; ababaaa match
+   284 294
+
+   ;; ambiguous groupings are ambiguous
+   443 444 445 446 448 449 450
+
+   ;; emacs doesn't know how to handle weird ranges such as [a-Z] and
+   ;; [[:alpha:]-a]
+   539 580 581
+
+   ;; emacs matches non-greedy regex ab.*? non-greedily
+   639 677 712
+   ]
+  "Line numbers in the boost test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - Comments are lines starting with ;
+;;   - Lines starting with - set options passed to regcomp() and regexec():
+;;     - if no "REG_BASIC" is found, with have an extended regex
+;;     - These set a flag:
+;;       - REG_ICASE
+;;       - REG_NEWLINE
+;;       - REG_NOTBOL
+;;       - REG_NOTEOL
+;;
+;;   - Test lines are
+;;     pattern string start0 end0 start1 end1 ...
+;;
+;;   - pattern, string can have escapes
+;;   - string can have whitespace if enclosed in ""
+;;   - if string is "!", then the pattern is supposed to fail compilation
+;;   - start/end are of group0, group1, etc. group 0 is the full match
+;;   - start<0 indicates "no match"
+;;   - start is the 0-based index of the first character
+;;   - end   is the 0-based index of the first character past the group
+(defun regex-tests-BOOST ()
+  (let (failures
+        basic icase newline notbol noteol)
+    (regex-tests-generic-line
+     ?; "regex-resources/BOOST.tests" regex-tests-BOOST-whitelist
+     (if (save-excursion (re-search-forward "^-" nil t))
+         (setq basic   (save-excursion (re-search-forward "REG_BASIC" nil t))
+               icase   (save-excursion (re-search-forward "REG_ICASE" nil t))
+               newline (save-excursion (re-search-forward "REG_NEWLINE" nil t))
+               notbol  (save-excursion (re-search-forward "REG_NOTBOL" nil t))
+               noteol  (save-excursion (re-search-forward "REG_NOTEOL" nil t)))
+
+       (save-excursion
+         (or (re-search-forward "\\(\\S-+\\)\\s-+\"\\(.*\\)\"\\s-+?\\(.+\\)" 
nil t)
+             (re-search-forward "\\(\\S-+\\)\\s-+\\(\\S-+\\)\\s-+?\\(.+\\)"  
nil t)
+             (re-search-forward "\\(\\S-+\\)\\s-+\\(!\\)"                    
nil t)))
+
+       (let* ((pattern-raw   (match-string 1))
+              (string-raw    (match-string 2))
+              (positions-raw (match-string 3))
+              (pattern (regex-tests-BOOST-frob-escapes pattern-raw t))
+              (string  (regex-tests-BOOST-frob-escapes string-raw  nil))
+              (positions
+               (if (string= string "!")
+                   (list 'compilation-failed 0)
+                 (mapcar
+                  (lambda (x)
+                    (let ((x (string-to-number x)))
+                      (if (< x 0) nil x)))
+                  (split-string positions-raw)))))
+
+         (when (null (car positions))
+           (setcar positions 'search-failed))
+
+         (when (not basic)
+           (setq pattern (regex-tests-unextend pattern)))
+
+         ;; great. I now have all the data parsed. Let's use it to do
+         ;; stuff
+         (let* ((case-fold-search icase)
+                (msg (regex-tests-match pattern string positions)))
+
+           (if (and
+                ;; Skipping test: notbol/noteol not supported
+                (not notbol) (not noteol)
+
+                msg)
+
+               ;; store failure
+               (setq failures
+                     (cons (format "line number %d: Regex '%s': %s"
+                                   line-number pattern msg)
+                           failures)))))))
+
+    failures))
+
+(defconst regex-tests-PCRE-whitelist
+  [
+   ;; ambiguous groupings are ambiguous
+   610 611 1154 1157 1160 1168 1171 1176 1179 1182 1185 1188 1193 1196 1203
+  ]
+  "Line numbers in the PCRE test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;
+;;  regex
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  input_string
+;;  group_num: group_match | "No match"
+;;  ...
+(defun regex-tests-PCRE ()
+  (let (failures
+        pattern icase string what-failed matches-observed)
+    (regex-tests-generic-line
+     ?# "regex-resources/PCRE.tests" regex-tests-PCRE-whitelist
+
+     (cond
+
+      ;; pattern
+      ((save-excursion (re-search-forward "^/\\(.*\\)/\\(.*i?\\)$" nil t))
+       (setq icase (string= "i" (match-string 2))
+             pattern (regex-tests-unextend (match-string 1))))
+
+      ;; string. read it in, match against pattern, and save all the results
+      ((save-excursion (re-search-forward "^    \\(.*\\)" nil t))
+       (let ((case-fold-search icase))
+         (setq string (match-string 1)
+
+               ;; the regex match under test
+               what-failed
+               (condition-case nil
+                   (if (string-match pattern string) nil 'search-failed)
+                 ('invalid-regexp 'compilation-failed))
+
+               matches-observed
+               (loop for x from 0 to 20
+                     collect (and (not what-failed)
+                                  (or (match-string x string) "<unset>")))))
+       nil)
+
+      ;; verification line: failed match
+      ((save-excursion (re-search-forward "^No match" nil t))
+       (unless what-failed
+         (setq failures
+               (cons (format "line number %d: Regex '%s': Expected no match; 
but match"
+                             line-number pattern)
+                     failures))))
+
+      ;; verification line: succeeded match
+      ((save-excursion (re-search-forward "^ *\\([0-9]+\\): \\(.*\\)" nil t))
+       (let* ((match-ref (match-string 2))
+              (idx       (string-to-number (match-string 1))))
+
+         (if what-failed
+             "Expected match; but no match"
+           (unless (string= match-ref (elt matches-observed idx))
+             (setq failures
+                   (cons (format "line number %d: Regex '%s': Have expected 
match, but group %d is wrong: '%s'/'%s'"
+                                 line-number pattern
+                                 idx match-ref (elt matches-observed idx))
+                         failures))))))
+
+      ;; reset
+      (t (setq pattern nil) nil)))
+
+    failures))
+
+(defconst regex-tests-PTESTS-whitelist
+  [
+   ;; emacs doesn't barf on weird ranges such as [b-a], but simply
+   ;; fails to match
+   138
+
+   ;; emacs doesn't see DEL (0x78) as a [:cntrl:] character
+   168
+  ]
+  "Line numbers in the PTESTS test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - fields separated by ¦ (note: this is not a |)
+;;   - start¦end¦pattern¦string
+;;   - start is the 1-based index of the first character
+;;   - end   is the 1-based index of the last  character
+(defun regex-tests-PTESTS ()
+  (let (failures)
+    (regex-tests-generic-line
+     ?# "regex-resources/PTESTS" regex-tests-PTESTS-whitelist
+     (let* ((fields (split-string (buffer-string) "¦"))
+
+            ;; string has 1-based index of first char in the
+            ;; match. -1 means "no match". -2 means "invalid
+            ;; regex".
+            ;;
+            ;; start-ref is 0-based index of first char in the
+            ;; match
+            ;;
+            ;; string==0 is a special case, and I have to treat
+            ;; it as start-ref = 0
+            (start-ref (let ((raw (string-to-number (elt fields 0))))
+                         (cond
+                          ((= raw -2) 'compilation-failed)
+                          ((= raw -1) 'search-failed)
+                          ((= raw  0) 0)
+                          (t          (1- raw)))))
+
+            ;; string has 1-based index of last char in the
+            ;; match. end-ref is 0-based index of first char past
+            ;; the match
+            (end-ref   (string-to-number (elt fields 1)))
+            (pattern   (elt fields 2))
+            (string    (elt fields 3)))
+
+       (let ((msg (regex-tests-match pattern string (list start-ref end-ref))))
+         (when msg
+           (setq failures
+                 (cons (format "line number %d: Regex '%s': %s"
+                               line-number pattern msg)
+                       failures))))))
+    failures))
+
+(defconst regex-tests-TESTS-whitelist
+  [
+   ;; emacs doesn't barf on weird ranges such as [b-a], but simply
+   ;; fails to match
+   42
+
+   ;; emacs is more forgiving with * and ? that don't apply to
+   ;; characters
+   57 58 59 60
+
+   ;; emacs is more stringent with regexen involving unbalanced )
+   67
+  ]
+  "Line numbers in the TESTS test that should be skipped.  These
+are false-positive test failures that represent known/benign
+differences in behavior.")
+
+;; - Format
+;;   - fields separated by :. Watch for [\[:xxx:]]
+;;   - expected:pattern:string
+;;
+;;   expected:
+;;   | 0 | successful match      |
+;;   | 1 | failed match          |
+;;   | 2 | regcomp() should fail |
+(defun regex-tests-TESTS ()
+  (let (failures)
+    (regex-tests-generic-line
+     ?# "regex-resources/TESTS" regex-tests-TESTS-whitelist
+     (if (save-excursion (re-search-forward 
"^\\([^:]+\\):\\(.*\\):\\([^:]*\\)$" nil t))
+         (let* ((what-failed
+                 (let ((raw (string-to-number (match-string 1))))
+                   (cond
+                    ((= raw 2) 'compilation-failed)
+                    ((= raw 1) 'search-failed)
+                    (t         t))))
+                (string  (match-string 3))
+                (pattern (regex-tests-unextend (match-string 2))))
+
+           (let ((msg (regex-tests-match pattern string nil (list 
what-failed))))
+             (when msg
+               (setq failures
+                     (cons (format "line number %d: Regex '%s': %s"
+                                   line-number pattern msg)
+                           failures)))))
+
+       (error "Error parsing TESTS file line: '%s'" (buffer-string))))
+    failures))
+
+(ert-deftest regex-tests ()
+  "Tests of the email regular expression engine.  This evaluates
+the BOOST, PCRE, PTESTS and TESTS test cases from glibc."
+  (should-not (regex-tests-BOOST))
+  (should-not (regex-tests-PCRE))
+  (should-not (regex-tests-PTESTS))
+  (should-not (regex-tests-TESTS)))
-- 
2.1.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]