bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)


From: Assaf Gordon
Subject: bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
Date: Tue, 20 Aug 2019 00:41:15 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

tag 37093 notabug
close 37093
stop

Hello,

On 2019-08-19 10:44 p.m., Edward Huff wrote:
In the demo below, dd uses 0.665s to write 1GiB of zeros.
sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros.
wc uses 32.160s to count 1GiB of zeros.

[...]

baseline results:
$ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s
49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
       0       0 1073741824
$

First,
Try to avoid UTF8 locales (i.e., force a C/POSIX locale with LC_ALL=C)
which makes 'wc' much faster.

On my computer:

With UTF8 locale:

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
        | tee >(sha256sum>&2) | time --portability wc
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 46.5928 s, 23.0 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
        0       0 1073741824
  real 46.59
  user 46.37
  sys 0.19

With C locale:

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
       | tee >(sha256sum>&2) | LC_ALL=C time --portability wc
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.60285 s, 125 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
        0       0 1073741824
  real 8.60
  user 5.22
  sys 0.26


Second,
The "word counting" feature in 'wc' is the main cpu-hog.
If you avoid that (i.e. counting only lines, or only characters),
'wc' is even faster (and it automatically ignores UTF8 issues):

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
       | tee >(sha256sum>&2) \
       | \time --portability wc -c
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.59429 s, 141 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
  1073741824

  real 7.59
  user 0.10
  sys 0.71

Notice that the "real time" wasn't changed much (from 8.6s to 7.59s), but the actual work performed by 'wc' (measured in "user time") is down
drastically.


Third,
If you are comfortable with compiling Coreutils from source,
you can build it using optimized hashing function from OpenSSL, like so:

     ./configure --with-openssl
     make

Then, "sha256sum" will be faster (about 2x fast on my computer).

If you don't want to re-compile it, consider using "openssl" directly
to calculate the checksum, like so:

  dd if=/dev/zero count=1K bs=1M | tee >(openssl sha256>&2) | wc -c


Fourth,
To save few more microseconds, consider using dd with larger block size (bs=) and fewer blocks (count=), e.g.:

   $ time dd if=/dev/zero of=/dev/null count=1M bs=1K
   1048576+0 records in
   1048576+0 records out
   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.865853 s, 1.2 GB/s

   real 0m0.868s
   user 0m0.288s
   sys  0m0.579s

   $ time dd if=/dev/zero of=/dev/null count=1K bs=1M
   1024+0 records in
   1024+0 records out
   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.0998688 s, 10.8 GB/s

   real 0m0.102s
   user 0m0.000s
   sys  0m0.102s

This won't reduce the total time by much, but will result in
fewer sys-calls, and less CPU kernel time (at least by a tiny bit).
The effect is more noticeable when reading or writing to a physical disk.

----

Lastly,
If you use GNU time instead of the shell's built-in 'time' function,
you can specify custom output format,
and easily show the timing of each program in the pipeline.
Example:

    $ FMT="\n=== CMD: %C ===\nreal %e\tuser %U\tsys %S\n"
    $ \time -f "$FMT" dd if=/dev/zero count=1M bs=1K \
         | \time -f "$FMT" tee >(\time -f "$FMT" sha256sum>&2) \
         | \time -f "$FMT" wc -c
    1048576+0 records in
    1048576+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.77339 s, 138 MB/s

    === CMD: dd if=/dev/zero count=1048576 bs=1024 ===
    real 7.77   user 0.36       sys 1.65


    === CMD: tee /dev/fd/63 ===
    real 7.77   user 0.10       sys 1.30

    49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -

    === CMD: sha256sum ===
    real 7.77   user 7.47       sys 0.27

    1073741824

    === CMD: wc -c ===
    real 7.77   user 0.05       sys 0.76


As such, I'm closing this as "not a bug",
but discussion can continue by replying to this thread.

regards,
 - assaf






reply via email to

[Prev in Thread] Current Thread [Next in Thread]