Re: error opening a pspp file
From: Ben Pfaff
Subject: Re: error opening a pspp file
Date: Mon, 5 Feb 2018 08:44:13 -0800
User-agent: Mutt/1.5.23 (2014-03-12)
On Mon, Feb 05, 2018 at 10:41:04AM +0100, John Darrington wrote:
> On Sun, Feb 04, 2018 at 01:57:53PM -0800, Ben Pfaff wrote:
> Well, in the end the file shows two things:
>
> (1) PSPP should not write a file that it cannot read later: if
> you look at the raw file, it contains question marks. This
> means that PSPP output routines should be more careful about
> insisting on writing the file in a character set that is
> acceptable for later reading.
>
> (2) PSPP should be able to read files that do contain bad
> variable names, probably by replacing unacceptable bytes by some
> kind of placeholder like X.
>
> Here is my guess at what happened:
>
> The charset of the machine used to create the file is something other than
> UTF-8, and either it has been (inadvertently) set to something incapable of
> encoding the umlauts he was trying to use, or the iconv library on that
> machine is broken.
>
> So the try_recode routine in libpspp/i18n.c failed, and the fallback char,
> which is '?' (see line 1167), was substituted.
>
> What I don't understand is how the user could not have noticed something
> amiss at the time of data entry. The variable names should have been
> rejected when entered, or at least looked very weird.
PSPP generally maintains variable names, etc. internally in UTF-8, and I
wonder whether we're not checking that against the charset at all the
appropriate times. I could especially see that happening in the GUI,
where there's an extra layer of indirection.
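
To make the suspected failure mode concrete, here is a minimal Python sketch, not PSPP code (PSPP is written in C and recodes through iconv). The function names `write_name` and `is_encodable` are hypothetical. It shows how a name held in UTF-8 turns into question marks when it is written in a charset that cannot represent it, and how a check at entry time could catch that first.

```python
def write_name(name: str, charset: str) -> bytes:
    """Encode a name for output, substituting '?' for anything the
    target charset cannot represent (Python's "replace" handler)."""
    return name.encode(charset, errors="replace")


def is_encodable(name: str, charset: str) -> bool:
    """Return True if every character of name survives encoding in charset."""
    try:
        name.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


name = "Größe"                                # variable name with an umlaut and sharp s
print(write_name(name, "ascii"))              # b'Gr??e'
print(is_encodable(name, "ascii"))            # False
print(is_encodable(name, "iso-8859-1"))       # True
```

Running a check like `is_encodable` when a name is entered, rather than only at write time, would let the name be rejected before a lossy file is produced.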
But in any case I think the reader and writer should be more robust.
I'll work on that.
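
For the reader side, point (2) in the quoted message suggests replacing unacceptable bytes with a placeholder such as X. A rough Python sketch of that idea follows. It is an illustration only: `sanitize_name` is a hypothetical helper, and treating only undecodable bytes and '?' as unacceptable is an assumption, since PSPP's actual identifier rules are more detailed.

```python
def sanitize_name(raw: bytes, charset: str = "utf-8", placeholder: str = "X") -> str:
    """Decode a variable name leniently instead of rejecting the file.

    Assumption: undecodable bytes and literal '?' count as unacceptable;
    each is replaced by the placeholder.
    """
    # Undecodable bytes become U+FFFD rather than raising an error.
    text = raw.decode(charset, errors="replace")
    return "".join(placeholder if ch in ("\ufffd", "?") else ch for ch in text)


print(sanitize_name(b"Gr??e"))      # GrXXe
print(sanitize_name(b"Gr\xffe"))    # GrXe
```

A real reader would also need to make sure that two different damaged names do not collapse to the same sanitized name, for example by appending a numeric suffix.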