help-smalltalk
[Help-smalltalk] [bug] regex doesn't support i (ignorecase) flag


From: Stephen Compall
Subject: [Help-smalltalk] [bug] regex doesn't support i (ignorecase) flag
Date: Mon, 01 Oct 2007 18:05:39 -0700

Issue status update for http://smalltalk.gnu.org/node/85 Post a follow up: http://smalltalk.gnu.org/project/comments/add/85

Project:      GNU Smalltalk
Version:      <none>
Component:    VM
Category:     bug reports
Priority:     normal
Assigned to:  Unassigned
Reported by:  S11001001
Updated by:   S11001001
Status:       patch
Attachment:   http://smalltalk.gnu.org/files/issues/latin1-re-ignorecase.patch (2.58 KB)

Example:


st> ('a' =~ '(?i:A)') inspect!
An instance of Kernel.FailedMatchRegexResults

This happens because pre_set_casetable in lib-src/regex.c is never
called.  It is fixed in
address@hidden/smalltalk--backstage--2.2--patch-62*,
"support (?i:...) in regexps".  With the patch applied:


st> ('a' =~ '(?i:A)') inspect!
An instance of Kernel.MatchingRegexResults


There are several possible fixes, because case folding is
charset-dependent.  The patch implements #3:


   1. Always import I18N and use the locale database to determine the
      charset of Strings.  I'm not sure what the exact semantics of
      this would be.
   2. Assume ASCII.  regex.c already effectively assumes that strings
      are somewhat ASCII-compatible, and this wouldn't bias in favor
      of a particular ASCII superset.
   3. Assume Latin-1.  Its 256 codes are the first 256 Unicode code
      points, so this offers a clear behavior path to future support
      for matching full Unicode strings.  This is what the patch uses.
   4. Assume Latin-9.  Technically this supersedes Latin-1, so it is
      more up-to-date, but it is not a codepoint-wise subset of
      Unicode.
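
To make the Latin-1 choice concrete, here is a minimal sketch of a
case-folding table in C.  It is not the code from the patch: the names
latin1_fill_fold_table and latin1_equal_ignoring_case are made up for
illustration, and how regex.c actually stores and consults its table
may differ.  It only shows why the charset matters: the ranges below
are right for Latin-1 and would be wrong for Latin-9, which puts
letters such as 0xA6/0xA8 and 0xBC/0xBD where Latin-1 has symbols.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): fill a 256-entry table that
   maps every Latin-1 (ISO 8859-1) upper-case letter to its lower-case
   form and leaves all other bytes unchanged. */
void latin1_fill_fold_table(unsigned char table[256])
{
    int c;

    for (c = 0; c < 256; c++)
        table[c] = (unsigned char) c;            /* identity by default */

    for (c = 0x41; c <= 0x5A; c++)               /* ASCII A-Z */
        table[c] = (unsigned char) (c + 0x20);

    for (c = 0xC0; c <= 0xDE; c++)               /* accented capitals */
        if (c != 0xD7)                           /* 0xD7 is the multiplication sign */
            table[c] = (unsigned char) (c + 0x20);

    /* 0xDF (sharp s) and 0xFF (y with diaeresis) stay as they are:
       Latin-1 has no upper-case counterpart for either. */
}

/* Hypothetical helper: compare n bytes of a and b through the fold
   table.  Returns 1 if they are equal ignoring case, 0 otherwise. */
int latin1_equal_ignoring_case(const unsigned char *a,
                               const unsigned char *b,
                               size_t n,
                               const unsigned char table[256])
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[a[i]] != table[b[i]])
            return 0;
    return 1;
}
```

A caller would fill the table once and then compare pattern and subject
bytes through it, which is roughly what a matcher does for (?i:...).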




