bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#10953: Potential logical bug in readtokens.c


From: Eric Blake
Subject: Re: bug#10953: Potential logical bug in readtokens.c
Date: Tue, 06 Mar 2012 16:32:02 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1

[adding gnulib, since the code in question lives there]

On 03/06/2012 04:21 PM, Paul Eggert wrote:
> Thanks, I agree that code is potentially buggy.  I don't see
> any way to trigger the bug in coreutils, but it's just asking
> for trouble.  Here's a proposed patch.
> 
>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001
> From: Paul Eggert <address@hidden>
> Date: Tue, 6 Mar 2012 15:19:24 -0800
> Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns
> 
> Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>.
> * lib/readtokens.c: Include limits.h.
> (word, bits_per_word, get_nth_bit, set_nth_bit): New.
> (readtoken): Don't cache the delimiters; the cache code was buggy
> if !delim && saved_delim, or if the new n_delim differs from the old.
> Also, it wasn't thread-safe.

The fix looks right to me from a correctness perspective, but I'm afraid
we're still sub-optimal in performance.

> ---
>  ChangeLog        |   10 +++++++++
>  lib/readtokens.c |   56 +++++++++++++++++++++++------------------------------
>  2 files changed, 34 insertions(+), 32 deletions(-)
> 
> diff --git a/ChangeLog b/ChangeLog
> index 163e154..eb99d25 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,13 @@
> +2012-03-06  Paul Eggert  <address@hidden>
> +
> +     readtokens: avoid core dumps with unusual calling patterns
> +     Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>.
> +     * lib/readtokens.c: Include limits.h.
> +     (word, bits_per_word, get_nth_bit, set_nth_bit): New.
> +     (readtoken): Don't cache the delimiters; the cache code was buggy
> +     if !delim && saved_delim, or if the new n_delim differs from the old.
> +     Also, it wasn't thread-safe.
> +
>  2012-03-06  Bruno Haible  <address@hidden>
>  
>       math: Ensure declarations of math functions.
> diff --git a/lib/readtokens.c b/lib/readtokens.c
> index 705a62b..f11d3f6 100644
> --- a/lib/readtokens.c
> +++ b/lib/readtokens.c
> @@ -26,6 +26,7 @@
>  
>  #include "readtokens.h"
>  
> +#include <limits.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer)
>    tokenbuffer->buffer = NULL;
>  }
>  
> +typedef size_t word;
> +enum { bits_per_word = sizeof (word) * CHAR_BIT };
> +
> +static bool
> +get_nth_bit (size_t n, word const *bitset)
> +{
> +  return bitset[n / bits_per_word] >> n % bits_per_word & 1;
> +}
> +
> +static void
> +set_nth_bit (size_t n, word *bitset)
> +{
> +  size_t one = 1;
> +  bitset[n / bits_per_word] |= one << n % bits_per_word;
> +}
> +
>  /* Read a token from STREAM into TOKENBUFFER.
>     A token is delimited by any of the N_DELIM bytes in DELIM.
>     Upon return, the token is in tokenbuffer->buffer and
> @@ -68,42 +85,17 @@ readtoken (FILE *stream,
>    char *p;
>    int c;
>    size_t i, n;
> -  static const char *saved_delim = NULL;
> -  static char isdelim[256];
> -  bool same_delimiters;
> -
> -  if (delim == NULL && saved_delim == NULL)
> -    abort ();
> +  word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word];
>  
> -  same_delimiters = false;
> -  if (delim != saved_delim && saved_delim != NULL)
> +  memset (isdelim, 0, sizeof isdelim);
> +  for (i = 0; i < n_delim; i++)
>      {
> -      same_delimiters = true;
> -      for (i = 0; i < n_delim; i++)
> -        {
> -          if (delim[i] != saved_delim[i])
> -            {
> -              same_delimiters = false;
> -              break;
> -            }
> -        }
> -    }
> -
> -  if (!same_delimiters)
> -    {
> -      size_t j;
> -      saved_delim = delim;
> -      memset (isdelim, 0, sizeof isdelim);
> -      for (j = 0; j < n_delim; j++)
> -        {
> -          unsigned char ch = delim[j];
> -          isdelim[ch] = 1;
> -        }
> +      unsigned char ch = delim[i];
> +      set_nth_bit (ch, isdelim);
>      }
>  
> -  /* FIXME: don't fool with this caching.  Use strchr instead.  */
>    /* skip over any leading delimiters */
> -  for (c = getc (stream); c >= 0 && isdelim[c]; c = getc (stream))
> +  for (c = getc (stream); c >= 0 && get_nth_bit (c, isdelim); c = getc 
> (stream))

Why not just strchr instead of building up an isdelim bitmap?  And why
are we calling getc() one character at a time, instead of using tricks
like freadahead() to operate on a larger buffer?

Also, is readtoken() intended to be a more powerful interface than
strtok, in which case we _do_ want to be non-threadsafe, and to have a
readtoken_r interface that is the underlying threadsafe variant that can
benefit from caching?

>      {
>        /* empty */
>      }
> @@ -124,7 +116,7 @@ readtoken (FILE *stream,
>            p[i] = 0;
>            break;
>          }
> -      if (isdelim[c])
> +      if (get_nth_bit (c, isdelim))
>          {
>            p[i] = 0;
>            break;

-- 
Eric Blake   address@hidden    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]