bug-gnu-utils

From: John Cowan
Subject: Re: sed error message reports byte position instead of char position when program contains UTF-8
Date: Thu, 16 May 2013 11:41:00 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

Eli Zaretskii scripsit:

> AFAIK, Sed uses bytes, not characters.

Definitely not.  Look at the following:

$ echo $LANG
en_US.UTF-8
$ cat >foo
föö
(Ctrl-D)
$ wc -c foo
6 foo
(6 bytes: 1 for "f", 2 for each "ö", and 1 for the newline; therefore the file is UTF-8)
$ sed -n '/^...$/p' <foo
föö
$ sed -n '/^.....$/p' <foo
$

So each "." in the regex matches one character, not one byte: the line
is 3 characters long, not 5 bytes long.
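
For contrast, here is a minimal sketch of the same test in the C locale, where (as far as I know, at least with GNU sed) each byte counts as one character. It writes föö as explicit UTF-8 bytes with printf so the result does not depend on the terminal encoding; the variable names are only illustrative.

```shell
# "föö" plus newline as explicit UTF-8 bytes: f, C3 B6, C3 B6, LF.
line='f\303\266\303\266\n'

# Total size in bytes (tr strips the padding some wc versions add).
bytes=$(printf "$line" | wc -c | tr -d ' ')

# In the C locale every byte is a character, so three dots should
# not match the line, while five dots (one per byte) should.
c_three=$(printf "$line" | LC_ALL=C sed -n '/^...$/p' | wc -l | tr -d ' ')
c_five=$(printf "$line" | LC_ALL=C sed -n '/^.....$/p' | wc -l | tr -d ' ')

echo "bytes=$bytes three-dot matches=$c_three five-dot matches=$c_five"
```

If that assumption holds, the results are the reverse of the en_US.UTF-8 run above, which suggests that what "." matches follows the locale's character encoding rather than being fixed at one byte.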

-- 
We call nothing profound                        address@hidden
that is not wittily expressed.                  John Cowan
        --Northrop Frye (improved)


