Re: test-fdutimensat racy?

bug-gnulib

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: test-fdutimensat racy?

From:	Pádraig Brady
Subject:	Re: test-fdutimensat racy?
Date:	Tue, 21 May 2013 15:02:59 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 05/21/2013 01:52 PM, Bernhard Voelker wrote:
>> 05/03/2013 10:37 AM, Bernhard Voelker wrote:
>> I.e., the numbers did not went significantly down for that non-VM system. ;-/
> 
> Playing with the code a bit and testing on various VMs (where the race
> showed up most probably), it turned out that the multiplier for the nap()
> delay is not sufficient.  Changing it from 1.125 to 3 seemed to make the
> tests quite stable.  However, avoiding the race completely would be much
> nicer.
> 
> What do you think about the following approach?
> 
> Have a nice day,
> Berny
> 
> 
>>From 39f5cb30bb70bbf43741bcad6d30890c34412d75 Mon Sep 17 00:00:00 2001
> From: Bernhard Voelker <address@hidden>
> Date: Tue, 21 May 2013 14:42:42 +0200
> Subject: [PATCH] tests/nap.h: avoid race
> 
> The recent change in nap.h (5191133e) decreased the probability of lost
> races to about a third, however such problems could still be observed
> in virtual machines and openSUSE's OBS.
> 
> Instead of calulating the nap() time once and using it (together with
> a small correction multiplier), avoid the race alltogether by verifying
> on a reference file whether a timestamp difference has happened.
> 
> * tests/nap.h (nap_works): Change return value to bool.  Change passing
> the old file's status by value instead of by reference as this function
> does no longer update that timestamp; rename the function argument from
> st to old_st.  Remove the local variables cdiff and mdiff because that
> function now returns true/false instead of the precise delay.
> (guess_delay): Remove function.
> (nap): Instead of re-using the delay which has been calulated during the
> first call, avoid the race by actually verifying that a timestamp
> difference can be observed on the current file system.  Use an adaptive
> approach for the delay to minimize execution time.  Assert that the
> maximum delay is <= 2 seconds.
> ---
>  tests/nap.h | 103 
> +++++++++++++++++++++++-------------------------------------
>  1 file changed, 40 insertions(+), 63 deletions(-)
> 
> diff --git a/tests/nap.h b/tests/nap.h
> index d8dbad2..09046a8 100644
> --- a/tests/nap.h
> +++ b/tests/nap.h
> @@ -20,6 +20,7 @@
>  # define GLTEST_NAP_H
> 
>  # include <limits.h>
> +#include <stdbool.h>
> 
>  /* Return A - B, in ns.
>     Return 0 if the true result would be negative.
> @@ -53,86 +54,62 @@ get_stat (int fd, struct stat *st, int do_write)
>  }
> 
>  /* Given a file whose descriptor is FD, see whether delaying by DELAY
> -   nanoseconds causes a change in a file's time stamp.  *ST is the
> -   file's status, recently gotten.  Update *ST to reflect the latest
> -   status gotten.  If successful, return the needed delay, in
> -   nanoseconds as determined by the observed time stamps; this may be
> -   greater than DELAY if we crossed a quantization boundary.  If
> -   unsuccessful, return 0.  */
> -static int
> -nap_works (int fd, int delay, struct stat *st)
> +   nanoseconds causes a change in a file's time stamp.  OLD_ST is the
> +   file's status, recently gotten.  */
> +static bool
> +nap_works (int fd, int delay, struct stat old_st)
>  {
> -  struct stat old_st = *st;
> +  struct stat st;
>    struct timespec delay_spec;
> -  int cdiff, mdiff;
>    delay_spec.tv_sec = delay / 1000000000;
>    delay_spec.tv_nsec = delay % 1000000000;
>    ASSERT (nanosleep (&delay_spec, 0) == 0);
> -  get_stat (fd, st, 1);
> +  get_stat (fd, &st, 1);
> 
>    /* Return the greater of the ctime and the mtime differences, or
>       zero if it cannot be determined, or INT_MAX if either overflows.  */
> -  cdiff = diff_timespec (get_stat_ctime (st), get_stat_ctime (&old_st));
> -  if (cdiff != 0)
> -    {
> -      mdiff = diff_timespec (get_stat_mtime (st), get_stat_mtime (&old_st));
> -      if (mdiff != 0)
> -        return cdiff < mdiff ? mdiff : cdiff;
> -    }
> -  return 0;
> -}
> +  if (   diff_timespec (get_stat_ctime (&st), get_stat_ctime (&old_st))
> +      && diff_timespec (get_stat_mtime (&st), get_stat_mtime (&old_st)))
> +    return true;
> 
> -static int
> -guess_delay (void)
> -{
> -  /* Try a 1-ns sleep first, for speed.  If that doesn't work, try 100
> -     ns, 1 microsecond, 1 ms, etc.  xfs has a quantization of about 10
> -     milliseconds, even though it has a granularity of 1 nanosecond,
> -     and NTFS has a default quantization of 15.25 milliseconds, even
> -     though it has a granularity of 100 nanoseconds, so 15.25 ms is a
> -     good quantization to try.  If that doesn't work, try 1 second.
> -     The worst case is 2 seconds, needed for FAT.  */
> -  static int const delaytab[] = {1, 1000, 1000000, 15250000, 1000000000 };
> -  int fd = creat (BASE "tmp", 0600);
> -  int i;
> -  int delay = 2000000000;
> -  struct stat st;
> -  ASSERT (0 <= fd);
> -  get_stat (fd, &st, 0);
> -  for (i = 0; i < sizeof delaytab / sizeof delaytab[0]; i++)
> -    {
> -      int d = nap_works (fd, delaytab[i], &st);
> -      if (d != 0)
> -        {
> -          delay = d;
> -          break;
> -        }
> -    }
> -  ASSERT (close (fd) == 0);
> -  ASSERT (unlink (BASE "tmp") == 0);
> -  return delay;
> +  return false;
>  }
> 
>  /* Sleep long enough to notice a timestamp difference on the file
> -   system in the current directory.  Assumes that BASE is defined,
> -   and requires that the test module depends on nanosleep.  */
> +   system in the current directory.  Use an adaptive approach, trying
> +   to find the smallest delay which works on the current file system
> +   to make the timestamp difference appear.  Assert a maximum delay of
> +   2 seconds.  Assumes that BASE is defined, and requires that the test
> +   module depends on nanosleep.  */
>  static void
>  nap (void)
>  {
> -  static struct timespec delay;
> -  if (!delay.tv_sec && !delay.tv_nsec)
> +  static int fd = -1;
> +  struct stat old_st;
> +  static int delay = 1;
> +
> +  if (-1 == fd)
> +    {
> +      ASSERT ((fd = creat (BASE "naptmp", 0600)) != -1); /* Never closed.  */
> +      ASSERT (unlink (BASE "naptmp") == 0);
> +      get_stat (fd, &old_st, 0);
> +    }
> +  else
>      {
> -      int d = guess_delay ();
> -
> -      /* Multiply by 1.125 (rounding up), to avoid problems if the
> -         file system's clock is a bit slower than nanosleep's.
> -         Ceiling it at INT_MAX, though.  */
> -      int delta = (d >> 3) + ((d & 7) != 0);
> -      d = delta < INT_MAX - d ? d + delta : INT_MAX;
> -      delay.tv_sec = d / 1000000000;
> -      delay.tv_nsec = d % 1000000000;
> +      ASSERT (0 <= fd);
> +      get_stat (fd, &old_st, 1);
>      }
> -  ASSERT (nanosleep (&delay, 0) == 0);
> +
> +  if (1 < delay)
> +    delay = delay / 2;  /* Try half of the previous delay.  */
> +  ASSERT (0 < delay);
> +
> +  for ( ; delay <= 2000000000; delay = delay * 2)
> +    if (nap_works (fd, delay, old_st))
> +      return;

Excellent. This is the truncated exponential backoff
method I was referring to previously (albeit a little less general).

It's probably worth a comment that this may introduce
a total delay of up to: sum(2^n) from 0 to 30 = 2^31 - 1 = 2.1s

> +
> +  /* Bummer: even the highest nap delay didn't work. */
> +  ASSERT (0);
>  }
> 
>  #endif /* GLTEST_NAP_H */
> 

thanks!
Pádraig.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: test-fdutimensat racy?, Paul Eggert, 2013/05/01
- Re: test-fdutimensat racy?, Bernhard Voelker, 2013/05/02
  - Re: test-fdutimensat racy?, Paul Eggert, 2013/05/02
    - Re: test-fdutimensat racy?, Bernhard Voelker, 2013/05/03
    - Re: test-fdutimensat racy?, Bernhard Voelker, 2013/05/21
    - Re: test-fdutimensat racy?, Eric Blake, 2013/05/21
    - Re: test-fdutimensat racy?, Bernhard Voelker, 2013/05/21
    - Re: test-fdutimensat racy?, Eric Blake, 2013/05/21
    - Re: test-fdutimensat racy?, Bernhard Voelker, 2013/05/21
    - Re: test-fdutimensat racy?, Pádraig Brady <=

Prev by Date: Re: test-fdutimensat racy?
Next by Date: Re: test-fdutimensat racy?
Previous by thread: Re: test-fdutimensat racy?
Next by thread: Re: bug#14253: Testsuite failure in gnulib-tests: test programs cannot be built
Index(es):
- Date
- Thread