[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bug#10953: Potential logical bug in readtokens.c
From: |
Eric Blake |
Subject: |
Re: bug#10953: Potential logical bug in readtokens.c |
Date: |
Tue, 06 Mar 2012 16:32:02 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 |
[adding gnulib, since the code in question lives there]
On 03/06/2012 04:21 PM, Paul Eggert wrote:
> Thanks, I agree that code is potentially buggy. I don't see
> any way to trigger the bug in coreutils, but it's just asking
> for trouble. Here's a proposed patch.
>
>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001
> From: Paul Eggert <address@hidden>
> Date: Tue, 6 Mar 2012 15:19:24 -0800
> Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns
>
> Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>.
> * lib/readtokens.c: Include limits.h.
> (word, bits_per_word, get_nth_bit, set_nth_bit): New.
> (readtoken): Don't cache the delimiters; the cache code was buggy
> if !delim && saved_delim, or if the new n_delim differs from the old.
> Also, it wasn't thread-safe.
The fix looks right to me from a correctness perspective, but I'm afraid
we're still sub-optimal in performance.
> ---
> ChangeLog | 10 +++++++++
> lib/readtokens.c | 56 +++++++++++++++++++++++------------------------------
> 2 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 163e154..eb99d25 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,13 @@
> +2012-03-06 Paul Eggert <address@hidden>
> +
> + readtokens: avoid core dumps with unusual calling patterns
> + Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>.
> + * lib/readtokens.c: Include limits.h.
> + (word, bits_per_word, get_nth_bit, set_nth_bit): New.
> + (readtoken): Don't cache the delimiters; the cache code was buggy
> + if !delim && saved_delim, or if the new n_delim differs from the old.
> + Also, it wasn't thread-safe.
> +
> 2012-03-06 Bruno Haible <address@hidden>
>
> math: Ensure declarations of math functions.
> diff --git a/lib/readtokens.c b/lib/readtokens.c
> index 705a62b..f11d3f6 100644
> --- a/lib/readtokens.c
> +++ b/lib/readtokens.c
> @@ -26,6 +26,7 @@
>
> #include "readtokens.h"
>
> +#include <limits.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer)
> tokenbuffer->buffer = NULL;
> }
>
> +typedef size_t word;
> +enum { bits_per_word = sizeof (word) * CHAR_BIT };
> +
> +static bool
> +get_nth_bit (size_t n, word const *bitset)
> +{
> + return bitset[n / bits_per_word] >> n % bits_per_word & 1;
> +}
> +
> +static void
> +set_nth_bit (size_t n, word *bitset)
> +{
> + size_t one = 1;
> + bitset[n / bits_per_word] |= one << n % bits_per_word;
> +}
> +
> /* Read a token from STREAM into TOKENBUFFER.
> A token is delimited by any of the N_DELIM bytes in DELIM.
> Upon return, the token is in tokenbuffer->buffer and
> @@ -68,42 +85,17 @@ readtoken (FILE *stream,
> char *p;
> int c;
> size_t i, n;
> - static const char *saved_delim = NULL;
> - static char isdelim[256];
> - bool same_delimiters;
> -
> - if (delim == NULL && saved_delim == NULL)
> - abort ();
> + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word];
>
> - same_delimiters = false;
> - if (delim != saved_delim && saved_delim != NULL)
> + memset (isdelim, 0, sizeof isdelim);
> + for (i = 0; i < n_delim; i++)
> {
> - same_delimiters = true;
> - for (i = 0; i < n_delim; i++)
> - {
> - if (delim[i] != saved_delim[i])
> - {
> - same_delimiters = false;
> - break;
> - }
> - }
> - }
> -
> - if (!same_delimiters)
> - {
> - size_t j;
> - saved_delim = delim;
> - memset (isdelim, 0, sizeof isdelim);
> - for (j = 0; j < n_delim; j++)
> - {
> - unsigned char ch = delim[j];
> - isdelim[ch] = 1;
> - }
> + unsigned char ch = delim[i];
> + set_nth_bit (ch, isdelim);
> }
>
> - /* FIXME: don't fool with this caching. Use strchr instead. */
> /* skip over any leading delimiters */
> - for (c = getc (stream); c >= 0 && isdelim[c]; c = getc (stream))
> + for (c = getc (stream); c >= 0 && get_nth_bit (c, isdelim); c = getc
> (stream))
Why not just strchr instead of building up an isdelim bitmap? And why
are we calling getc() one character at a time, instead of using tricks
like freadahead() to operate on a larger buffer?
Also, is readtoken() intended to be a more powerful interface than
strtok, in which case we _do_ want to be non-threadsafe, and to have a
readtoken_r interface that is the underlying threadsafe variant that can
benefit from caching?
> {
> /* empty */
> }
> @@ -124,7 +116,7 @@ readtoken (FILE *stream,
> p[i] = 0;
> break;
> }
> - if (isdelim[c])
> + if (get_nth_bit (c, isdelim))
> {
> p[i] = 0;
> break;
--
Eric Blake address@hidden +1-919-301-3266
Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature
- Re: bug#10953: Potential logical bug in readtokens.c,
Eric Blake <=