emacs-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[debbugs-tracker] bug#20029: closed ('yes' surprisingly slow)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#20029: closed ('yes' surprisingly slow)
Date: Tue, 10 Mar 2015 00:51:02 +0000

Your message dated Tue, 10 Mar 2015 00:50:41 +0000
with message-id <address@hidden>
and subject line Re: bug#20029: 'yes' surprisingly slow
has caused the debbugs.gnu.org bug report #20029,
regarding 'yes' surprisingly slow
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
20029: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20029
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message --- Subject: 'yes' surprisingly slow Date: Sat, 7 Mar 2015 12:49:10 +0100
These two commands give the same output:

$ yes `echo {1..1000}` | head -c 2300M | md5sum
a0241f2247e9a37db60e7def3e4f7038  -

$ yes "`echo {1..1000}`" | head -c 2300M | md5sum
a0241f2247e9a37db60e7def3e4f7038  -

But the time to run is quite different:

$ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null

real    0m0.897s
user    0m0.384s
sys     0m1.343s

$ time yes `echo {1..1000}` | head -c 2300M >/dev/null

real    0m11.352s
user    0m10.571s
sys     0m2.590s

WTF?!

I imagine 'yes' spends a lot of time collecting the 1000 args. But why
does it do that more than once?


/Ole



--- End Message ---
--- Begin Message --- Subject: Re: bug#20029: 'yes' surprisingly slow Date: Tue, 10 Mar 2015 00:50:41 +0000 User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
On 10/03/15 00:31, Giuseppe Scrivano wrote:
> Pádraig Brady <address@hidden> writes:
> 
>> The attached should make things more efficient here.
>>
>> thanks,
>> Pádraig.
>>
>>
>> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <address@hidden>
>> Date: Mon, 9 Mar 2015 19:27:32 +0000
>> Subject: [PATCH] yes: output data more efficiently
>>
>> yes(1) may be used to generate repeating patterns of text
>> for test inputs etc., so adjust to be more efficient.
>>
>> Profiling the case where yes(1) is outputting small items
>> through stdio (which was the default case), shows the overhead
>> continuously processing small items in main() and in stdio:
>>
>>     $ yes >/dev/null & perf top -p $!
>>     31.02%  yes           [.] main
>>     27.36%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>>     14.51%  libc-2.20.so  [.] fputs_unlocked
>>     13.50%  libc-2.20.so  [.] strlen
>>     10.66%  libc-2.20.so  [.] __GI___mempcpy
>>      1.98%  yes           [.] address@hidden
>>
>> Sending more data per stdio call improves the situation,
>> but still, there is significant stdio overhead due to memory copies,
>> and the repeated string length checking:
>>
>>     $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
>>     42.26%  libc-2.20.so  [.] __GI___mempcpy
>>     17.38%  libc-2.20.so  [.] strlen
>>      5.21%  [kernel]      [k] __srcu_read_lock
>>      4.58%  [kernel]      [k] __srcu_read_unlock
>>      4.27%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>>      2.50%  libc-2.20.so  [.] __GI___libc_write
>>      2.45%  [kernel]      [k] system_call
>>      2.40%  [kernel]      [k] system_call_after_swapgs
>>      2.27%  [kernel]      [k] vfs_write
>>      2.09%  libc-2.20.so  [.] _IO_do_write@@GLIBC_2.2.5
>>      2.01%  [kernel]      [k] fsnotify
>>      1.95%  libc-2.20.so  [.] _IO_file_write@@GLIBC_2.2.5
>>      1.44%  yes           [.] main
>>
>> We can avoid all stdio overhead by building up the buffer
>> _once_ and outputting that, and the profile below shows
>> the bottleneck moved to the kernel:
>>
>>     $ src/yes >/dev/null & perf top -p $!
>>     15.42%  [kernel]      [k] __srcu_read_lock
>>     12.98%  [kernel]      [k] __srcu_read_unlock
>>      9.41%  libc-2.20.so  [.] __GI___libc_write
>>      9.11%  [kernel]      [k] vfs_write
>>      8.35%  [kernel]      [k] fsnotify
>>      8.02%  [kernel]      [k] system_call
>>      5.84%  [kernel]      [k] system_call_after_swapgs
>>      4.54%  [kernel]      [k] __fget_light
>>      3.98%  [kernel]      [k] sys_write
>>      3.65%  [kernel]      [k] selinux_file_permission
>>      3.44%  [kernel]      [k] rw_verify_area
>>      2.94%  [kernel]      [k] __fsnotify_parent
>>      2.76%  [kernel]      [k] security_file_permission
>>      2.39%  yes           [.] main
>>      2.17%  [kernel]      [k] __fdget_pos
>>      2.13%  [kernel]      [k] sysret_check
>>      0.81%  [kernel]      [k] write_null
>>      0.36%  yes           [.] address@hidden
>>
>> Note this change also ensures that yes will only write complete lines
>> for lines softer than BUFSIZ.
>>
>> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
>> and output that, rather than having stdio process each item.
>> * tests/misc/yes.sh: Add a new test for various buffer sizes.
>> * tests/local.mk: Reference the new test.
>> Fixes http://bugs.gnu.org/20029
>> ---
>>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>>  tests/local.mk    |  1 +
>>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>>  3 files changed, 70 insertions(+), 2 deletions(-)
>>  create mode 100755 tests/misc/yes.sh
>>
>> diff --git a/src/yes.c b/src/yes.c
>> index b35b13f..91dea11 100644
>> --- a/src/yes.c
>> +++ b/src/yes.c
>> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 
>> 'y'.\n\
>>  int
>>  main (int argc, char **argv)
>>  {
>> +  char buf[BUFSIZ];
>> +  char *pbuf = buf;
>> +  int i;
>> +
>>    initialize_main (&argc, &argv);
>>    set_program_name (argv[0]);
>>    setlocale (LC_ALL, "");
>> @@ -77,9 +81,44 @@ main (int argc, char **argv)
>>        argv[argc++] = bad_cast ("y");
>>      }
>>  
>> -  while (true)
>> +  /* Buffer data locally once, rather than having the
>> +     large overhead of stdio buffering each item.   */
>> +  for (i = optind; i < argc; i++)
>> +    {
>> +      size_t len = strlen (argv[i]);
>> +      if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
>> +        break;
>> +      memcpy (pbuf, argv[i], len);
>> +      pbuf += len;
>> +      *pbuf++ = i == argc - 1 ? '\n' : ' ';
>> +    }
>> +  if (i < argc)
>> +    pbuf = NULL;
> 
> since the buffer is partly filled, wouldn't be better to reuse it?
> 
> Something like this (barely tested):
> 
> diff --git a/src/yes.c b/src/yes.c
> index 91dea11..ac690ce 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -92,9 +92,7 @@ main (int argc, char **argv)
>        pbuf += len;
>        *pbuf++ = i == argc - 1 ? '\n' : ' ';
>      }
> -  if (i < argc)
> -    pbuf = NULL;
> -  else
> +  if (i == argc)
>      {
>        size_t line_len = pbuf - buf;
>        size_t lines = BUFSIZ / line_len;
> @@ -106,7 +104,7 @@ main (int argc, char **argv)
>      }
>  
>    /* The normal case is to continuously output the local buffer.  */
> -  while (pbuf)
> +  while (i == argc)
>      {
>        if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
>          {
> @@ -117,11 +115,17 @@ main (int argc, char **argv)
>  
>    /* If the data doesn't fit in BUFSIZ then it's large
>       and not too onerous to output using stdio in any case.  */
> -  while (! pbuf)
> +  while (i != argc)
>      {
> -      for (i = optind; i < argc; i++)
> -        if (fputs (argv[i], stdout) == EOF
> -            || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
> +      int j;
> +      if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
> +        {
> +          error (0, errno, _("standard output"));
> +          return EXIT_FAILURE;
> +        }
> +      for (j = i; j < argc; j++)
> +        if (fputs (argv[j], stdout) == EOF
> +            || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
>            {
>              error (0, errno, _("standard output"));
>              return EXIT_FAILURE;
> 
> 
> The difference seems to be (without the patch):
> 
> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
> 
> real  0m7.994s
> user  0m11.093s
> sys   0m2.953s
> 
> versus (with the patch):
> 
> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
> 
> real  0m3.534s
> user  0m4.164s
> sys   0m1.803s

A fair point for the many small arguments case,
though doesn't help the few large arguments case.
Given this is only slightly more complicated,
we might as well do this also.

thanks,
Pádraig.


--- End Message ---

reply via email to

[Prev in Thread] Current Thread [Next in Thread]