From: Eric Blake
Subject: Re: [PATCH] regex: Pass the system regex if its only problem is 32-bit regoff_t
Date: Thu, 09 Sep 2010 09:04:36 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.2
On 09/09/2010 02:18 AM, Paolo Bonzini wrote:
> The included regex cannot support equivalence classes and multibyte
> collation symbols properly. On the other hand it supports 64-bit
> regoff_t, which glibc cannot provide without breaking the ABI.
> We currently favor the latter, but this is no longer correct since
> there's clearly no hope of ever passing the test.
Hmm - here's the current POSIX 2008 wording:
http://www.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html#tag_13_38
The <regex.h> header shall define the regoff_t type as a signed integer
type that can hold the largest value that can be stored in either a
ptrdiff_t type or a ssize_t type.
That text was changed from POSIX 2001, which said off_t instead of
ptrdiff_t (and so broke even more 32-bit systems). The change was based
on ERN 60, as submitted by Paul:
http://www.opengroup.org/austin/aardvark/latest/xbdbug2.txt
====%<====
OBJECTION  Enhancement Request Number 60
eggert:cs.ucla.edu  Defect in XBD regoff_t (rdvk# 1)
{20050825a}  Thu, 25 Aug 2005 23:52:10 +0100 (BST)
_____________________________________________________________________________
Accept_X___  Accept as marked below_____  Duplicate_____  Reject_____
Rationale for rejected or partial changes:
Add to SD5
_____________________________________________________________________________
Page: 296 Line: 10529 Section: regoff_t
Problem:
Edition of Specification (Year): 2004
Defect code : 1. Error
POSIX currently requires regoff_t to be at least as wide as off_t, to
facilitate "future extensions" in which strings are taken from files
rather than from memory. These "future extensions" were anticipated
in 1992, but they have not seen widespread use and are not
standardized.
The off_t<=regoff_t requirement might cause a programmer or
implementer to naively assume that regoff_t must be at least as wide
as off_t. In practice, though, this isn't true on many platforms.
For example, on Solaris 10 (32-bit SPARC, in large-file mode),
regoff_t is a signed 32-bit integer and off_t is a signed 64-bit
integer.
Now, 32-bit Solaris 10 regex.h still conforms to POSIX, so long as you
don't compile in large-file mode. But a wide variety of programs
use large-file mode and it seems inappropriate for large-file mode to
fail to conform to POSIX.
Since the "future extensions" have never materialized, I propose that
the off_t<=regoff_t requirement be dropped from POSIX.
However, it does make sense to require that regoff_t be at least as
wide as ptrdiff_t, so ptrdiff_t can be substituted for off_t.
Action:
Change XBD page 296 lines 10529-10530 from:
The type regoff_t shall be defined as a signed integer type that can
hold the largest value that can be stored in either a type off_t or
type ssize_t.
to:
The type regoff_t shall be defined as a signed integer type that can
hold the largest value that can be stored in either a type ptrdiff_t
or a type ssize_t.
Change XSI page 1222 lines 38367-38375 from:
The substrings reported in pmatch[] are defined using offsets from
the start of the string rather than pointers. Since this is a new
interface, there should be no impact on historical implementations
or applications, and offsets should be just as easy to use as
pointers. The change to offsets was made to facilitate future
extensions in which the string to be searched is presented to
regexec() in blocks, allowing a string to be searched that is not
all in memory at once.
The type regoff_t is used for the elements of pmatch[] to ensure
that the application can represent either the largest possible array
in memory (important for an application conforming to the Shell and
Utilities volume of IEEE Std 1003.1-2001) or the largest possible
file (important for an application using the extension where a file
is searched in chunks).
to:
The substrings reported in pmatch[] are defined using offsets from
the start of the string rather than pointers. This allows type-safe
access to both constant and non-constant strings.
The type regoff_t is used for the elements of pmatch[] to ensure
that the application can represent large arrays in memory (important
for an application conforming to the Shell and Utilities volume of
IEEE Std 1003.1-2001).
The 1992 edition of this standard required regoff_t to be at least
as wide as off_t, to facilitate future extensions in which the
string to be searched is taken from a file. However, these future
extensions have not appeared. The requirement rules out popular
implementations with 32-bit regoff_t and 64-bit off_t, so it has
been withdrawn.
====%<====
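The rationale text quoted above describes pmatch[] as holding offsets from the start of the string. A minimal sketch of what that looks like in practice follows; the helper name first_match_span and its `long` output parameters are made up for illustration and are not part of <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a POSIX function): find the first match of an
   extended regular expression in text and report its span as offsets from
   the start of text.  Returns 0 on a match, REG_NOMATCH if nothing
   matched, and -1 if the pattern failed to compile. */
int first_match_span(const char *pattern, const char *text,
                     long *start, long *end)
{
    regex_t re;
    regmatch_t pmatch[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, text, 1, pmatch, 0);
    if (rc == 0) {
        /* rm_so and rm_eo have type regoff_t; they are offsets, not
           pointers, so the same code works for const and non-const text. */
        *start = (long) pmatch[0].rm_so;
        *end = (long) pmatch[0].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

Because the results are plain offsets, the matched text is text + start with length end - start, whether or not text is const.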
But you are right that on x86_64 glibc, we have:
(gdb) p sizeof(regoff_t)
$1 = 4
(gdb) p sizeof(off_t)
$2 = 8
(gdb) p sizeof(ssize_t)
$3 = 8
(gdb) p sizeof(ptrdiff_t)
$4 = 8
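The same comparison can be made without a debugger. The following is only a sketch: the function names are invented here, and comparing sizeof values is used as a stand-in for "can hold the largest value", which assumes ordinary signed types without padding bits.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if regoff_t is at least as wide as both
   ptrdiff_t and ssize_t (the POSIX 2008 wording), else 0.
   Assumption: width comparison approximates the range requirement. */
int regoff_meets_posix_2008(void)
{
    return sizeof(regoff_t) >= sizeof(ptrdiff_t)
        && sizeof(regoff_t) >= sizeof(ssize_t);
}

/* Hypothetical helper: 1 if regoff_t is at least as wide as int,
   i.e. the further relaxation floated in this message. */
int regoff_at_least_int(void)
{
    return sizeof(regoff_t) >= sizeof(int);
}

/* 1 if regoff_t is a signed type, as POSIX requires. */
int regoff_is_signed(void)
{
    return (regoff_t) -1 < 0;
}

/* Print the four widths that the gdb session inspects. */
void print_regoff_widths(void)
{
    printf("sizeof(regoff_t)  = %zu\n", sizeof(regoff_t));
    printf("sizeof(off_t)     = %zu\n", sizeof(off_t));
    printf("sizeof(ssize_t)   = %zu\n", sizeof(ssize_t));
    printf("sizeof(ptrdiff_t) = %zu\n", sizeof(ptrdiff_t));
}
```

With the sizes shown in the gdb session (4 for regoff_t, 8 for ptrdiff_t and ssize_t), regoff_meets_posix_2008() would return 0 while regoff_at_least_int() would return 1.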
Should we go back to the Austin Group to further relax the requirements
on regoff_t to only be at least as large as int, with EOVERFLOW errors
mandated as appropriate?
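To make the question concrete, here is a rough sketch of the kind of range check an implementation with an int-sized regoff_t would need. Everything in it is an assumption for illustration: narrow_regoff_t and store_offset are invented names, and how such an error would actually be surfaced through regexec() is not specified in this message.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for a regoff_t that is only as wide as int. */
typedef int narrow_regoff_t;

/* Hypothetical helper: store a match offset computed in a wide type into
   a narrow regoff_t.  Returns 0 on success, or EOVERFLOW (leaving *out
   untouched) when the offset is not representable. */
int store_offset(long long offset, narrow_regoff_t *out)
{
    if (offset < INT_MIN || offset > INT_MAX)
        return EOVERFLOW;
    *out = (narrow_regoff_t) offset;
    return 0;
}
```

The point of the check is that a match past INT_MAX in a large in-memory string is reported as an error rather than silently truncated.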
> +2010-09-10  Paolo Bonzini  <address@hidden>
> +
> +	regex: Pass the system regex if its only problem is 32-bit regoff_t.
> +	* m4/regex.m4: Disable test for regoff_t size.
While we wait on the Austin Group, this patch seems perfectly acceptable
to me.
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org