Re: fix for reading funny compressed data, for review
From: Ben Pfaff
Subject: Re: fix for reading funny compressed data, for review
Date: Thu, 15 Oct 2009 21:12:54 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Thanks. I pushed it out.
John Darrington <address@hidden> writes:
> It seems reasonable to me.
>
> J'
>
> On Wed, Oct 14, 2009 at 09:44:52PM -0700, Ben Pfaff wrote:
> I'd like to push this to the stable branch. Comments
> appreciated.
>
> commit e624e2da6ea68d22e6d4fba4eaa96d37d07a6730
> Author: Ben Pfaff <address@hidden>
> Date: Wed Oct 14 21:20:44 2009 -0700
>
> sys-file-reader: Tolerate nonsensical opcodes in compressed data.
>
> Compressed data in .sav files uses a set of 256 opcodes, some of
> which make sense only for numeric data and others of which make
> sense only for string data. However, Jereme Thomas <address@hidden>
> has provided one file, written by SPSS 14, that uses an opcode that
> seems to make sense only for numeric data in a string field. So this
> commit adds support for these opcodes, although it still warns about
> the ones other than the exact one found in the file provided by
> Jereme.
>
> diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
> index 70fa385..b1be385 100644
> --- a/doc/dev/system-file-format.texi
> +++ b/doc/dev/system-file-format.texi
> @@ -884,6 +884,9 @@ value @var{code} - @var{bias}, where
> variable @code{bias} from the file header. For example,
> code 105 with bias 100.0 (the normal value) indicates a numeric variable
> of value 5.
> +One file written by SPSS 14 has been seen that contained such a code
> +in a @emph{string} field, with the value 0 (after the bias is
> +subtracted), as a way of encoding null bytes.
>
> @item 252
> End of file. This code may or may not appear at the end of the data
> diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c
> index fe7b533..8d973e4 100644
> --- a/src/data/sys-file-reader.c
> +++ b/src/data/sys-file-reader.c
> @@ -86,6 +86,7 @@ struct sfm_reader
> double bias; /* Compression bias, usually 100.0. */
> uint8_t opcodes[8]; /* Current block of opcodes. */
> size_t opcode_idx; /* Next opcode to interpret, 8 if none left. */
> + bool corruption_warning; /* Warned about possible corruption? */
> };
>
> static const struct casereader_class sys_file_casereader_class;
> @@ -192,6 +193,7 @@ sfm_open_reader (struct file_handle *fh, struct dictionary **dict,
> r->oct_cnt = 0;
> r->has_long_var_names = false;
> r->opcode_idx = sizeof r->opcodes;
> + r->corruption_warning = false;
>
> /* TRANSLATORS: this fragment will be interpolated into
> messages in fh_lock() that identify types of files. */
> @@ -1374,7 +1376,14 @@ read_compressed_number (struct sfm_reader *r, double *d)
> break;
>
> case 254:
> - sys_error (r, _("Compressed data is corrupt."));
> + float_convert (r->float_format, "        ", FLOAT_NATIVE_DOUBLE, d);
> + if (!r->corruption_warning)
> + {
> + r->corruption_warning = true;
> + sys_warn (r, _("Possible compressed data corruption: "
> + "compressed spaces appear in numeric field."));
> + }
> + break;
>
> case 255:
> *d = SYSMIS;
> @@ -1395,7 +1404,8 @@ read_compressed_number (struct sfm_reader *r, double *d)
> static bool
> read_compressed_string (struct sfm_reader *r, char *dst)
> {
> - switch (read_opcode (r))
> + int opcode = read_opcode (r);
> + switch (opcode)
> {
> case -1:
> case 252:
> @@ -1410,7 +1420,25 @@ read_compressed_string (struct sfm_reader *r, char *dst)
> break;
>
> default:
> - sys_error (r, _("Compressed data is corrupt."));
> + {
> + double value = opcode - r->bias;
> + float_convert (FLOAT_NATIVE_DOUBLE, &value, r->float_format, dst);
> + if (value == 0.0)
> + {
> + /* This has actually been seen "in the wild". The submitter
> + of the file reported that the contents decoded as spaces,
> + but they were at the end of the field, so it's possible
> + that the null bytes just acted as null terminators. */
> + }
> + else if (!r->corruption_warning)
> + {
> + r->corruption_warning = true;
> + sys_warn (r, _("Possible compressed data corruption: "
> + "string contains compressed integer (opcode %d)"),
> + opcode);
> + }
> + }
> + break;
> }
>
> return true;
>
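
For anyone reading this in the archive without the tree at hand, the
new default case in read_compressed_string() can be sketched in
isolation roughly as follows. This is only an illustration, not the
committed code: struct reader_sketch, its n_warnings counter, and
decode_numeric_opcode_in_string() are hypothetical names, the warning
goes to stderr instead of sys_warn(), and the sketch assumes the file's
float format is the native 8-byte IEEE 754 double, so a plain memcpy()
stands in for float_convert().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The sketch assumes 8-byte doubles, matching the 8-byte chunks that
   compressed .sav data is divided into. */
_Static_assert (sizeof (double) == 8, "sketch assumes 8-byte double");

/* Hypothetical stand-in for struct sfm_reader, reduced to the fields
   this sketch needs. */
struct reader_sketch
  {
    double bias;                /* Compression bias, usually 100.0. */
    bool corruption_warning;    /* Warned about possible corruption? */
    int n_warnings;             /* Assumption: counter added for testing. */
  };

/* Fills the 8-byte chunk DST for a numeric-style OPCODE that turned up
   in a string field.  The decoded value is OPCODE minus the bias,
   stored as a native double.  A value of 0.0 yields eight null bytes
   and is accepted silently; any other value triggers a warning, at
   most once per reader. */
static void
decode_numeric_opcode_in_string (struct reader_sketch *r, int opcode,
                                 char dst[8])
{
  double value = opcode - r->bias;
  memcpy (dst, &value, sizeof value);

  if (value != 0.0 && !r->corruption_warning)
    {
      r->corruption_warning = true;
      r->n_warnings++;
      fprintf (stderr, "Possible compressed data corruption: "
               "string contains compressed integer (opcode %d)\n",
               opcode);
    }
}
```

With the usual bias of 100.0, opcode 100 decodes to 0.0, whose IEEE 754
representation is all zero bits, which is how the eight null bytes
arise. Calling the function with opcode 105 and then 107 on the same
reader warns only for the first of the two, since the flag stays set
afterward.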
--
Peter Seebach on managing engineers:
"It's like herding cats, only most of the engineers are already
sick of laser pointers."