bug-gnu-utils

From: Eli Zaretskii
Subject: Re: sed error message reports byte position instead of char position when program contains UTF-8
Date: Thu, 16 May 2013 15:54:58 +0300

> Date: Thu, 16 May 2013 12:42:20 +0100 (BST)
> From: Camion SPAM <address@hidden>
> Cc: "address@hidden" <address@hidden>,
>   "address@hidden" <address@hidden>
> 
> > "Byte" is the only viable alternative, but that leaves the burden of
> > counting bytes on the user.
> 
> If you use a very long sed script, this can be a problem. 
> I happen to write sed scripts that are more than 1000 characters long.
> My last one is currently 1762 characters long and still growing.
> That's why I wrote a little bash function which would show me 
> the n'th character, but this gave wrong positions on scripts with
> UTF-8 chars. 
> 
> The work-around is to change LC_CTYPE to C around the string 
> processing part in my bash function, but I believe that since sed 
> supports multibyte characters, the error message should count 
> characters and not bytes. By the way, the error message states that 
> the position is "char" and not "byte": 
> 
> sed: -e expression #1, char 12: unknown option to `s'

How do you expect Sed to know what character set is being used for the
command line?  Are we again going to limit ourselves to the current
locale's charset?
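
For illustration, the byte-to-character conversion discussed above can be
sketched in Python. This is only a sketch under stated assumptions: the
function name byte_to_char_position is made up for this example, and the
encoding has to be supplied by the caller (UTF-8 by default here), which is
exactly the piece of information the question above is about.

```python
def byte_to_char_position(script: str, byte_pos: int, encoding: str = "utf-8") -> int:
    """Map a 1-based byte position to the 1-based position of the
    character that contains that byte.

    Hypothetical helper, not part of sed. Assumes an encoding that does
    not emit a byte-order mark per encoded string (UTF-8 is fine).
    """
    if byte_pos < 1:
        raise ValueError("byte position must be >= 1")
    offset = 0  # number of bytes consumed so far
    for index, ch in enumerate(script, start=1):
        # Each character contributes one or more bytes in the given encoding.
        offset += len(ch.encode(encoding))
        if byte_pos <= offset:
            return index
    raise ValueError("byte position is past the end of the script")


if __name__ == "__main__":
    # 'é' takes two bytes in UTF-8, so later byte positions run ahead
    # of character positions by one.
    print(byte_to_char_position("s/é/e/x", 8))
```

With a pure-ASCII script the two positions coincide; they diverge only
after the first multibyte character, and only if the assumed encoding
matches the one the script was actually written in.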



