bug#22237: sed no longer removes high-ascii characters as it did formerly.

From:	Jim Meyering
Subject:	bug#22237: sed no longer removes high-ascii characters as it did formerly.
Date:	Sat, 26 Dec 2015 13:19:07 -0800

On Fri, Dec 25, 2015 at 4:21 AM, Brian Tew <address@hidden> wrote:
> Well, sometimes it do and sometimes it don't.
>
> Script started on Fri 25 Dec 2015 05:53:04 AM CS
> ~$ed sample
> 50
> l
> subject now that thanksgiving has come and gone\342\246$
> q
> ~$
> ~$sed -i 's/[^a-z 0-9]//g' sample

To remove every byte other than letters, digits, and spaces, you
probably want something like this instead:

  LC_ALL=C sed -i 's/[^[:alnum:] ]//g' sample

Note I've done two things: used LC_ALL=C to override your default
locale (probably a UTF-8 one), and used [:alnum:] in place of the
nonportable a-z and 0-9 ranges.
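
As a rough illustration of the two changes, here is a minimal sketch
that rebuilds the sample line from the ed listing above and runs the
substitution in the C locale. It assumes a POSIX shell and printf, reads
from a pipe instead of editing the file in place with -i, and the
variable name `cleaned` is only for illustration.

```shell
# Recreate the reported line: 47 ASCII bytes followed by the two
# stray bytes \342\246 (octal escapes) and a newline.
cleaned=$(printf 'subject now that thanksgiving has come and gone\342\246\n' |
  LC_ALL=C sed 's/[^[:alnum:] ]//g')

# In the C locale each byte is one character, so the bracket
# expression matches the two high bytes and they are deleted.
printf '%s\n' "$cleaned"
```

With the g flag every offending byte on the line is removed, not just
the first one.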

In general, in UTF-8 locales, a byte sequence like your \342\246
will match no regular expression, since it is not a valid UTF-8
character: \342 starts a three-byte sequence, but only one
continuation byte follows it.
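
To see why those two bytes do not form a character, the sketch below
classifies the first byte by its high bits, following the UTF-8
encoding rules. It assumes a POSIX shell with od and tr; the variable
names (`hex`, `lead`, `need`, `have`) are made up for this example.

```shell
# The two stray bytes from the report, dumped as hex digits.
hex=$(printf '\342\246' | od -An -tx1 | tr -d ' \n')
printf 'bytes in hex: %s\n' "$hex"

# The high bits of the first byte say how long the sequence must be.
lead=$((0342))    # octal 342 is 0xE2
if   [ $((lead & 0x80)) -eq 0 ];         then need=1  # 0xxxxxxx (ASCII)
elif [ $((lead & 0xE0)) -eq $((0xC0)) ]; then need=2  # 110xxxxx
elif [ $((lead & 0xF0)) -eq $((0xE0)) ]; then need=3  # 1110xxxx
elif [ $((lead & 0xF8)) -eq $((0xF0)) ]; then need=4  # 11110xxx
else need=0                                           # not a lead byte
fi

# Two hex digits per byte, so this is the number of bytes present.
have=$(( ${#hex} / 2 ))
printf 'lead byte announces %d bytes, only %d present\n' "$need" "$have"
```

Here the lead byte announces a three-byte sequence while only two
bytes are present, so a UTF-8 decoder cannot turn them into a character.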

What probably changed is that older versions of sed did not properly
handle multi-byte locales, or that your earlier experience was in a
single-byte locale.

If you still think there is a problem with sed-4.2.2, please provide
more detail and I'll reopen this issue.
