
sed POSIX compatibility regarding '|' in regular expressions


From: Bruno Haible
Subject: sed POSIX compatibility regarding '|' in regular expressions
Date: Thu, 14 Dec 2006 15:10:28 +0100
User-agent: KMail/1.9.1

Hi,

POSIX states in [1], section "Regular Expressions in sed", that sed uses
basic regular expressions, with three minor modifications. The syntax
of basic regular expressions is defined in [2]. According to the text
and to the "RE and Bracket Expression Grammar" section at the end of that
page, POSIX BREs don't support alternation. Since '|' is an ordinary
character in a BRE, and "The interpretation of an ordinary character
preceded by a backslash ( '\' ) is undefined", the use of '\|' for
alternation in BREs is a GNU extension.

Bug #1: The --posix option fails to turn off this GNU extension.

$ sed --version
GNU sed Version 4.1.5
...
$ echo 'aaa//bcd' | sed -e 's,\(a\|X\)*//,,'
bcd                              # ok, the GNU extension
$ echo 'aaa//bcd' | sed --posix -e 's,\(a\|X\)*//,,'
bcd                              # wrong, should be aaabcd or signal an error
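
The expected "aaabcd" is what results when the bar is taken as an ordinary
character. A minimal sketch of that reading, using the unescaped '|', which
is an ordinary character in a BRE; the function name literal_bar is made up
for illustration and is not part of sed:

```shell
# In a BRE an unescaped '|' is an ordinary character, so the group below
# matches the literal three-character string 'a|X', not 'a' or 'X'.
literal_bar() { sed -e 's,\(a|X\)*//,,'; }

echo 'aaa//bcd' | literal_bar    # zero repetitions of 'a|X'; only '//' is removed
echo 'a|X//bcd' | literal_bar    # one repetition of 'a|X', followed by '//'
```

The first command should print "aaabcd" and the second "bcd".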

Bug #2: A doc bug. The section "Extended regular expressions" does not
mention that alternations are a difference between basic and extended
regular expressions: In EREs they are written as '|', in BREs they are
written as '\|' (GNU extension) or unavailable (pure POSIX).
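
For comparison, here is a sketch of two ways to write the substitution from
Bug #1 without relying on '\|'. It assumes a sed that accepts -E to select
extended regular expressions (GNU sed 4.1.5 spells this option -r); the
function names strip_ere and strip_bre are hypothetical:

```shell
# ERE: '|' is alternation and parentheses group without backslashes.
strip_ere() { sed -E -e 's,(a|X)*//,,'; }

# BRE without '\|': when the alternatives are single characters,
# a bracket expression expresses the same choice.
strip_bre() { sed -e 's,[aX]*//,,'; }

echo 'aaa//bcd' | strip_ere
echo 'aXa//bcd' | strip_bre
```

Both commands should print "bcd". The bracket-expression form only works
because each alternative is a single character; multi-character alternatives
such as 'abc\|def' have no direct equivalent in a pure POSIX BRE.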

Here's a suggested doc change.

--- sed.texi.bak        2006-01-30 08:27:29.000000000 +0100
+++ sed.texi    2006-12-14 00:06:19.000000000 +0100
@@ -2913,7 +2913,7 @@
 @cindex Extended regular expressions, syntax
 
 The only difference between basic and extended regular expressions is in
-the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
+the behavior of a few characters: @samp{?}, @samp{+}, @samp{|}, parentheses,
 and braces (@samp{@{@}}).  While basic regular expressions require
 these to be escaped if you want them to behave as special characters,
 when using extended regular expressions you must escape them if
@@ -2926,9 +2926,22 @@
 becomes @samp{abc\?} when using extended regular expressions.  It matches
 the literal string @samp{abc?}.
 
+@item abc\?
+becomes @samp{abc?} when using extended regular expressions.  It matches
+either @samp{ab} or @samp{abc}.  This construct is a GNU extension for
+basic regular expressions, but standard POSIX for extended regular
+expressions.
+
 @item c\+
 becomes @samp{c+} when using extended regular expressions.  It matches
-one or more @samp{c}s.
+one or more @samp{c}s.  This construct is a GNU extension for basic regular
+expressions, but standard POSIX for extended regular expressions.
+
+@item abc\|def
+becomes @samp{abc|def} when using extended regular expressions.  It matches
+either @samp{abc} or @samp{def}.  This construct, called ``alternation'',
+is a GNU extension for basic regular expressions, but standard POSIX for
+extended regular expressions.
 
 @item address@hidden,address@hidden
 becomes @address@hidden,@}} when using extended regular expressions.  It matches



[1] http://www.opengroup.org/susv3/utilities/sed.html
[2] http://www.opengroup.org/susv3/basedefs/xbd_chap09.html



