pspp-dev

Windows truncating data - Bug in src/libpspp/ext-array.c


From: John Darrington
Subject: Windows truncating data - Bug in src/libpspp/ext-array.c
Date: Wed, 13 Jun 2012 06:53:16 +0000
User-agent: Mutt/1.5.18 (2008-05-17)

Thanks to the efforts of Harry and Henry, I think we have a lead on
the problem, reported on and off on Windows, where the dataset gets
truncated and mysterious write errors on the temporary files are
reported.  It would seem that the bug has the potential to cause
problems on systems other than Windows too.

I believe the cause is the optimisation in src/libpspp/ext-array.c
which contains this code:

static bool
do_seek (const struct ext_array *ea_, off_t offset)
{
  struct ext_array *ea = CONST_CAST (struct ext_array *, ea_);

  if (!ext_array_error (ea))
    {
      if (ea->position == offset)
        return true;
      else if (fseeko (ea->file, offset, SEEK_SET) == 0)
        {
          ea->position = offset;
          return true;
        }
      else
        error (0, errno, _("seeking in temporary file"));
    }

  return false;
}

The lines:

      if (ea->position == offset)
        return true;

avoid performing a seek if the destination of the potential seek
happens to be the current position (which would be the most common case).
This is OK, except when the current operation is a write and the previous
one a read (or vice versa).

The POSIX spec says:

  When a file is opened with update mode ( '+' as the second or third character
  in the mode argument), both input and output may be performed on the
  associated stream. However, the application shall ensure that output is not
  directly followed by input without an intervening call to fflush() or to a
  file positioning function ( fseek(), fsetpos(), or rewind()), and input is
  not directly followed by output without an intervening call to a file
  positioning function, unless the input operation encounters end-of-file.

[http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html]


By avoiding the seek, we are violating this condition.

The Microsoft documentation basically says the same:

  When the "r+", "w+", or "a+" access type is specified, both reading and 
  writing are allowed (the file is said to be open for "update"). However, when 
  you switch between reading and writing, there must be an intervening fflush, 
  fsetpos, fseek, or rewind operation. The current position can be specified for
  the fsetpos or fseek operation, if desired.

[http://msdn.microsoft.com/en-us/library/yeby3zcb%28v=vs.80%29.aspx]
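
To make the rule in the two quotes concrete, here is a minimal sketch in
plain C of a stream that switches from writing to reading and back, with a
positioning call at every switch.  The function name switch_demo is made up
for illustration and is not part of PSPP.  Note that a seek to the current
position, fseek (f, 0, SEEK_CUR), is enough to satisfy the requirement.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not PSPP code.  Writes "abc" to a scratch file,
   reads one byte, overwrites the next byte with 'X', then reads the
   whole file back.  Returns 1 if the file then contains "aXc", 0 if it
   contains something else, and -1 on an I/O error. */
static int
switch_demo (void)
{
  FILE *f = tmpfile ();          /* opened in update mode */
  char c;
  char buf[4] = "";
  int ok;

  if (f == NULL)
    return -1;

  ok = fwrite ("abc", 1, 3, f) == 3
       && fseek (f, 0, SEEK_SET) == 0    /* output -> input: reposition */
       && fread (&c, 1, 1, f) == 1
       && fseek (f, 0, SEEK_CUR) == 0    /* input -> output: no-op seek */
       && fwrite ("X", 1, 1, f) == 1
       && fseek (f, 0, SEEK_SET) == 0    /* output -> input: reposition */
       && fread (buf, 1, 3, f) == 3;
  fclose (f);

  if (!ok)
    return -1;
  return strcmp (buf, "aXc") == 0 ? 1 : 0;
}
```

Removing the fseek (f, 0, SEEK_CUR) line gives the same pattern that the
shortcut in do_seek produces: a read directly followed by a write.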


Interestingly, until quite recently, the GNU libc documentation, after
mentioning that this requirement exists in the ANSI standard, then had the
sentence:

  The GNU C library does not have this limitation; you can do arbitrary reading 
  and writing operations on a stream in whatever order.

But this sentence has recently been deleted, and the bug report which
gave rise to its deletion suggests that it was, and had for a long time
been, erroneous:

[http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html]



So it would seem that it is unsafe (even on GNU/Linux) not to seek (or flush) 
before switching between reading and writing.

I sent Harry a patch which basically disabled this optimisation completely, and
Henry's report suggested that this fixed the problem, but caused the operation
to take a lot longer to run (not surprising).  I suggest that the correct fix
should involve a flag in the ext_array struct which records the direction of
the most recent operation (read or write) and ensures that the seek is always
done if the direction has changed.


J'


-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.

