Re: Patches for GNU tar 1.13.25

bug-gnu-utils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Patches for GNU tar 1.13.25

From:	Bruce Lilly
Subject:	Re: Patches for GNU tar 1.13.25
Date:	Sun, 14 Jul 2002 15:41:26 -0400
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204

Paul Eggert wrote:

From: Bruce Lilly <address@hidden>
Date: Sat, 13 Jul 2002 10:51:44 -0400

The attached patch addresses most of the issues
when compiling GNU tar 1.13.25 on UWIN


Thanks for doing the port.  Which version of UWIN, and which compiler
and library?


UWIN 3.0 using the MSVC++ 6.0 SP5 compiler. UWIN libraries,
which include the AST library (which is where the iconv bug
resides; also an fnmatch memory leak bug).  AST is available
separately for a variety of platforms from
http://www.research.att.com/sw/download
And the native compiler's libraries are used to some extent.

Several issues noted remain unresolved, some others
(e.g. application of unary minus to unsigned quantities) have not
even been addressed


I don't understand this point, as unary minus has well-defined
behavior on unsigned quantities in C.

The compiler emits warnings such as:

create.c(184) : warning C4146: unary minus operator applied to unsigned type, 
result still unsigned
create.c(195) : warning C4146: unary minus operator applied to unsigned type, 
result still unsigned

>>most of those issues could be avoided by using double throughout for
>>large quantities (double has a 64-bit mantissa) rather than
>>non-portable and implementation-dependent constructs].
>

Sorry, I don't understand this point either.  We do need to avoid
unportable constructs.  But portable code cannot assume that double
has a 64-bit mantissa.  On my SPARC host, for example, double has only
a 53-bit fraction.  On x86, double expressions sometimes have a 64-bit
fraction, and sometimes a 53-bit fraction, depending on the vagaries
of one's compiler and runtime system.


Then there are a bunch of potential problems with conversion
between double and uintmax_t (both directions).  How about
long double (ANSI C 1990 6.1.2.5)?

1. no supplied conversion between uintmax_t and
  double. Portable and efficient conversion
  code supplied.


I take it that your compiler does not support conversion from
uintmax_t to double?  If so, we should fix "configure" so that it
detects that problem, and defines uintmax_t to be 'long'.


human.c(190) : error C2520: conversion from unsigned __int64 to double not 
implemented, use signed __int64
human.c(90) : error C2520: conversion from unsigned __int64 to double not 
implemented, use signed __int64
human.c(90) : error C2520: conversion from unsigned __int64 to double not 
implemented, use signed __int64

In this case, defining uintmax_t to (presumably unsigned) long would
seem a rather extreme measure given that there *is* 64-bit integer
support, including support for conversion from signed 64-bit integers
to double and from double to signed and unsigned 64-bit integers.  As
mentioned, I have supplied portable uintmax_t to double conversion code
in several places where required.

Can you supply a little test program that illustrates the compiler
problem?  For example what happens if you compile and run the
following little program?  It should output "(double)
9223372036854775808 == 9223372036854775808" on any machine with a
radix that is a power of 2.

  #include <inttypes.h>

  #define verify(name, assertion) struct name { char a[(assertion) ? 1 : -1]; }
  verify (convert_uintmax_t_to_double,
          (double) (uintmax_t) 9223372036854775808 == 9223372036854775808.0);
  verify (convert_double_to_uintmax_t,
          (uintmax_t) 9223372036854775808.0 == 9223372036854775808);


That gives an error "negative subscript or subscript too large".

  #include <stdio.h>

  uintmax_t u = 9223372036854775808;

  int main (void)
  {
    double d = u;

      ^^^^^^^^^^^^^
This will give the "conversion [...] not implemented" compilation error.
In the presence of the earlier error, the compiler gives only a warning
about "possible loss of data" in the conversion.

    printf ("(double) 9223372036854775808 == %.20g\n", d);
    return 0;
  }

2. getdate.[cy] needs to declare mktime if the
  replacement function is used.  extern
  declaration should be harmless even if the
  system mktime is used.


I worry that some hosts do funky things with mktime defines in <time.h>.
How about this patch instead of the one you supplied?

--- getdate.y   2002/01/16 22:43:41     1.28
+++ getdate.y   2002/07/14 03:25:09
@@ -27,9 +27,14 @@

#ifdef HAVE_CONFIG_H

 # include <config.h>
-# ifdef HAVE_ALLOCA_H
-#  include <alloca.h>
-# endif
+#endif
+
+#ifdef mktime
+# define MKTIME_IS_REPLACED 1
+#endif
+
+#if HAVE_ALLOCA_H
+# include <alloca.h>
 #endif

/* Since the code of getdate.y is not included in the Emacs executable

@@ -455,7 +460,7 @@ struct tm *gmtime ();
 #ifndef localtime
 struct tm *localtime ();
 #endif
-#ifndef mktime
+#if ! defined mktime || MKTIME_IS_REPLACED
 time_t mktime ();
 #endif


That ought to work OK, but please include "extern".

3. do-nothing code in human.c #ifdef'ed out.


But it's not do-nothing code.  Here are some details:

+    b) does nothing useful (see comments below)
+ */
   if (inexact_style != human_round_to_even && value < (uintmax_t) -1)
     {
!       uintmax_t u = value;    /* N.B. u == value (unconditionally) */


u might not equal value.  value is of type double, so it might not be
an integer,


OK, point taken.

and it might be out of range for uintmax_t.


Not if I understand the intent of "value < (uintmax_t) -1" in
the enclosing if condition.

>  So it is

possible that (u != value) == 1.  (It is also possible that u does not
equal value, but (u != value) == 0, due to the vagaries of C
arithmetic; but that is a different story.)


If it were clear what the function adjust_value() were intended to do
(it is not clear), then I might be able to supply a replacement that
is portable w.r.t. lack of uintmax_t -> double conversion supplied by
the compiler.  Also, there probably ought to be a macro defined
for the maximum value of uintmax_t (a la INT_MAX) rather than the
(uintmax_t) -1 cast.

+    Most of the code in this file needs substantial revision.


The revision for GiB etc. has already been done; see
<http://savannah.gnu.org/cgi-bin/viewcvs/gnulib/gnulib/lib/human.c>.
This should appear in the next version of tar.


Great!

4. inconsistent placement of const qualifiers
  made more consistent.


This sort of patch shouldn't be needed.  For example, 'char const' is
equivalent to 'const char' on all Standard C compilers.  If you have a
compiler that doesn't support 'const' properly, then 'configure'
should arrange for '#define const /**/' to appear in config.h.  If
config.h is incorrect, let's fix that bug rather than munge the code.


The issue isn't the compiler, but other tools (including cscope,
seach for "const char" in text editors, etc.), and the main point
is that there ought to be some consistency within the package;
the reversed use of const seems to appear in only the one .c file.

5. non-portable tests which assume that size_t
  is signed replaced by portable tests.


I don't understand this point.  size_t is guaranteed to be unsigned,
and the code assumes this.  Is size_t signed on your host?


size_t is unsigned.  Consider the following from lib/savedir.c:

  size_t allocated = NAME_SIZE_DEFAULT;
  size_t used = 0;
[...]
        {
          size_t entry_size = strlen (entry) + 1;
          if (used + entry_size < used)
            xalloc_die ();
          if (allocated <= used + entry_size)
            {
              do
                {
                  if (2 * allocated < allocated)
                    xalloc_die ();
                  allocated *= 2;
                }
              while (allocated <= used + entry_size);

              name_space = xrealloc (name_space, allocated);
            }
          memcpy (name_space + used, entry, entry_size);
          used += entry_size;
        }

I'm uncomfortable with the "used + entry_size < used"
and "2 * allocated < allocated" conditionals, which
depend on overfow handling (not signed vs. unsigned).
There are no compiler diagnostics or errors, and it probably
works, but it still makes me cringe.  I think I changed this
when I was looking for the source of a memory leak (which
turned out to be in an external library).  I suppose you
could leave this bit unpatched.

6. another case of the iconv "bug" accounted for.
*** lib/unicodeio.c.orig        Tue Sep 25 18:52:20 2001
--- lib/unicodeio.c     Tue Jul 02 21:14:16 2002
***************
*** 209,215 ****

/* Avoid glibc-2.1 bug and Solaris 2.7 bug. */

 # if defined _LIBICONV_VERSION \
!     || !((__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) || defined __sun)

/* Get back to the initial shift state. */

       res = iconv (utf8_to_local, NULL, NULL, &outptr, &outbytesleft);
--- 209,215 ----

/* Avoid glibc-2.1 bug and Solaris 2.7 bug. */

 # if defined _LIBICONV_VERSION \
!     || !((__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) || defined __sun || 
defined _UWIN)

/* Get back to the initial shift state. */

       res = iconv (utf8_to_local, NULL, NULL, &outptr, &outbytesleft);


Doesn't UWIN use glibc?  If not, which library and version does UWIN
use?  If UWIN uses a particular C library, I'd rather have the ifdef
depend on that rather than on the more-generic _UWIN symbol.
(Presumably the bug will be fixed at some point.)


UWIN uses C code from the AST library (where the iconv "bug" is
located) as well as other libraries (including a few of the
non-broken MS ones [*]).  The iconv issue has been reported and I
expect that it will be addressed.  I have also pointed out the
need for version identification so that the conditional can
be appropiately constructed.  But it's not there yet.  Try

... || (defined (_AST_VERSION) && (_AST_VERSION <= 20020214L))

which should work when AST is used on non-UWIN systems also.

7. test for O_ECL vs. O_TRUNC for file creation
  during extraction corrected.
*** src/extract.c.orig  Mon Sep 24 14:55:17 2001
--- src/extract.c       Sat Jul 13 08:13:17 2002
***************
*** 766,772 ****

again_file:

       openflag = (O_WRONLY | O_BINARY | O_CREAT
!                 | (old_files_option == OVERWRITE_OLD_FILES
                     ? O_TRUNC
                     : O_EXCL));
       mode = current_stat.st_mode & MODE_RWX & ~ current_umask;
--- 766,772 ----

again_file:

       openflag = (O_WRONLY | O_BINARY | O_CREAT
!                 | (old_files_option != KEEP_OLD_FILES
                     ? O_TRUNC
                     : O_EXCL));
       mode = current_stat.st_mode & MODE_RWX & ~ current_umask;


I don't see the bug here.  For example, if old_files_option ==
DEFAULT_OLD_FILES, the code should use O_EXCL, because tar does not
want to overwrite existing files in that case.


tar x was failing in some cases due to file/dir ownership issues;
it created empty files with a return of -1 and errno set to EACCES.
tar --overwrite -x would work, but would of course be incompatible
with conventional (i.e. non GNU) tar x (which showed no such problem
[implemented in UWIN as a shell script which calls pax] under the
identical conditions).  The change worked with tar x (correctly
overwrote existing files) and with tar -xk (did not overwrite
existing files). Perhaps the old_files_option test needs to be more
complex.

Alternatively, errno could be checked for EEXIST (legitimate
cause for an aborted extraction with O_CREAT | O_EXCL) but without
aborting the extraction otherwise.

This also might be a UWIN bug in the open() implementation. I'm
investigating.

*** lib/exclude.h.orig  Sun Aug 26 19:07:15 2001
--- lib/exclude.h       Tue Jul 02 14:13:28 2002
***************
*** 30,43 ****
/* Patterns must match the start of file names, instead of matching
    anywhere after a '/'.  */
! #define EXCLUDE_ANCHORED (1 << 5)
/* Include instead of exclude. */
! #define EXCLUDE_INCLUDE (1 << 6)
/* '?', '*', '[', and '\\' are special in patterns. Without this
    option, these characters are ordinary and fnmatch is not used.  */
! #define EXCLUDE_WILDCARDS (1 << 7)
struct exclude;--- 30,43 ----/* Patterns must match the start of file names, instead of matching
    anywhere after a '/'.  */
! #define EXCLUDE_ANCHORED (1 << 7)
/* Include instead of exclude. */
! #define EXCLUDE_INCLUDE (1 << 8)
/* '?', '*', '[', and '\\' are special in patterns. Without this
    option, these characters are ordinary and fnmatch is not used.  */
! #define EXCLUDE_WILDCARDS (1 << 9)
struct exclude;
Why was the above patch needed?  It doesn't seem to match any of your
comments.


AST fnmatch.h provides extensions which use bits through 0x0040
(a.k.a. 1<<6).  Incidentally, AST fnmatch has a nasty memory leak
which causes --exclude-from to bomb.  I don't expect configure to
be able to catch that sort of run-time issue; I patched config.status
by hand to use the included GNU fnmatch for now.  At some point in
the near future I expect the AST fnmatch memory leak to be plugged.

*** src/create.c.orig   Wed Aug 29 17:21:02 2001
--- src/create.c        Tue Jul 02 20:25:17 2002
***************
*** 1108,1114 ****
           (entrylen = strlen (entry)) != 0;
           entry += entrylen + 1)
        {
!         if (buflen <= len + entrylen)
            {
              buflen = len + entrylen;
              namebuf = xrealloc (namebuf, buflen + 1);
--- 1108,1114 ----
           (entrylen = strlen (entry)) != 0;
           entry += entrylen + 1)
        {
!         if (buflen < len + entrylen)
            {
              buflen = len + entrylen;
              namebuf = xrealloc (namebuf, buflen + 1);


This patch seems incorrect to me.  Surely it can cause a buffer
overflow if buflen == len + entrylen, because the buffer needs to be
enlarged in that case.


How so? The buffer is allocated (and reallocated) to
bufen + 1.

From: Bruce Lilly <address@hidden>
Date: Sat, 13 Jul 2002 12:05:09 -0400

The following supplementary patch may be required on some systems:
*** lib/savedir.c.orig    Wed Jul  3 01:34:55 2002
--- lib/savedir.c       Sat Jul 13 11:56:14 2002
***************
*** 60,65 ****
--- 60,69 ----
  # define NULL 0
  #endif

+ #if HAVE_LIMITS_H || _LIBC
+ # include <limits.h>
+ #endif
+
  #include "savedir.h"
  #include "xalloc.h"


Hmm, why?  savedir.c doesn't use anything that limits.h provides.
The "|| _LIBC" shouldn't be needed in any case, as the file
isn't an internal glibc file.


This relates to the change to use INT_MAX rather than relying
on overflow.  I copied the include wrapper from another file in
the distribution.

===========
* Earlier versions of UWIN also supported gcc as the compiler, but
that had some annoying problems and AFAIK the gcc package
(http://www.wipro.com/uwin/uwin-ports.htm)
hasn't been updated for current versions of gcc or UWIN.  UWIN
supposedly can also use other native compilers (e.g. Borland), but
I have no way to test that here.

Best regards,
  Bruce Lilly

[Prev in Thread]

Current Thread

[Next in Thread]

Patches for GNU tar 1.13.25, Bruce Lilly, 2002/07/13
- Re: Patches for GNU tar 1.13.25, Bruce Lilly, 2002/07/13
- Re: Patches for GNU tar 1.13.25, Paul Eggert, 2002/07/14
  - Re: Patches for GNU tar 1.13.25, Bruce Lilly <=
    - Re: Patches for GNU tar 1.13.25, Paul Eggert, 2002/07/15
    - Re: Patches for GNU tar 1.13.25, Bruce Lilly, 2002/07/15

Prev by Date: DON'T FALL FOR THE BIG LIE.
Next by Date: 好久不見了，最近忙嗎？！
Previous by thread: Re: Patches for GNU tar 1.13.25
Next by thread: Re: Patches for GNU tar 1.13.25
Index(es):
- Date
- Thread