coreutils

Coreutils-gotchas (was:Re: bug#22045: expr substr ...)


From: Assaf Gordon
Subject: Coreutils-gotchas (was:Re: bug#22045: expr substr ...)
Date: Sun, 29 Nov 2015 01:34:10 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 11/29/2015 12:16 AM, Pádraig Brady wrote:

> I must collate some gotchas like this.
>
> Initial list started at:
> http://www.pixelbeat.org/docs/coreutils-gotchas.html


Fantastic list!

I would suggest adding four 'wc' entries:

 1. "wc -l" on a file with text but no newline character will return zero.

     $ printf "hello world" | wc -l
     0

 2. "wc -l" on a file in which the last line doesn't end with a newline
    will return one less than naively expected:

     $ printf "hello\nworld" | wc -l
     1

 3. "wc -L" reports the screen display width of the longest line
    (expanding tabs to 8-column tab stops), not its character count:

     $ printf "ab\txyz\n" | wc -L
     11
     $ printf "abc\txyz\n" | wc -L
     11
     $ printf "abcd\txyz\n" | wc -L
     11

 4. "wc -L" counts only valid, printable characters, including
    multibyte Unicode characters in a UTF-8 locale.

     # valid UTF-8 sequence counted as one character:
     $ printf "\xe2\x99\xa5" | wc -L
     1

     # invalid UTF-8 sequence not counted:
     $ printf "\xe2\xf2\xa5" | wc -L
     0

     # unprintable characters (in C locale) are not counted:
     $ printf "\xe2\x99\xa5" | LC_ALL=C wc -L
     0

     # To measure line length in bytes instead, first turn every
     # byte into a printable character with sed (in the C locale):
     $ printf "\xe2\x99\xa5" | LC_ALL=C sed 's/././g' | wc -L
     3
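For entries 1 and 2, one possible workaround is to count records with awk, which also counts a final line that lacks a trailing newline. This is only a sketch, assuming a POSIX awk is installed; `count_lines` is a hypothetical helper name, not a coreutils program:

```shell
#!/bin/sh
# count_lines: hypothetical helper (not part of coreutils).
# "wc -l" counts newline characters only, so an unterminated last line
# is not counted. awk's NR counts records instead, and a final record
# without a trailing "\n" is still a record.
count_lines() {
  # With no file arguments, awk reads standard input.
  awk 'END { print NR }' "$@"
}

printf 'hello world' | count_lines      # prints 1 (wc -l prints 0)
printf 'hello\nworld' | count_lines     # prints 2 (wc -l prints 1)
printf 'hello\nworld\n' | count_lines   # prints 2 (same as wc -l)
```

For newline-terminated input both approaches agree; they differ only when the last line is unterminated.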

These are based on your answer from a while back:
    http://lists.gnu.org/archive/html/coreutils/2015-05/msg00013.html

Thanks!
 - Assaf
