Re: PSPP-BUG: Failure to handle an antique SPSS file containing umlauts
From: Ben Pfaff
Subject: Re: PSPP-BUG: Failure to handle an antique SPSS file containing umlauts in variable names
Date: Tue, 18 Feb 2014 09:33:16 -0800
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Feb 18, 2014 at 03:17:26PM +0000, Müller, Andre wrote:
> I have tried the SYSFILE INFO and it works quite well.
> For now, I have piped some examples of uncommon codepages through it,
> and it does well for SHIFT_JIS and IBM850 (or similar), for example.
>
> The broken files I have, which actually contain entries in more than
> one codepage, are not a valid test, but even so, I found at least
> some of the codepages they contain among the suggestions. That's nice.
I agree that mixed-codepage files are not a valid test ;-)
> Another rather unfair test case is a failure to identify a source file
> in DIN_66003 coding, but that really is to be expected -- DIN_66003 is
> a 7-bit-safe codepage for German, where äöüÄÖÜß§ take the place
> of US-ASCII's {|}[\]~@, respectively. An evil solution for problems
> long gone. I think it's sane not to try to handle 7-bit non-ASCII
> codings, so that's just to let you know. Really, I cannot think of any
> way of handling them short of looking at oddities in character counts
> or success rates with matches against dictionaries.
The code that I wrote doesn't really identify encodings at all.
Instead, it just tries to recode all the strings in the file from each
of several possible encodings to UTF-8. That means that it's easy to
add more encodings, including DIN_66003. The encodings that I chose are
fairly arbitrary: I took them from the list at
http://encoding.spec.whatwg.org/. I can add DIN_66003; no problem. Are
there other encodings I should add?
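The trial-recoding approach described above can be sketched as follows. This is an illustrative Python sketch, not PSPP's actual code. The function `recode_candidates` and the candidate list are hypothetical, and the list uses Python codec names for a few encodings that roughly correspond to entries on the WHATWG list. No encoding is "identified": every candidate that can decode all of the strings is kept, and the caller chooses among the survivors.

```python
from typing import Dict, List, Sequence

# Hypothetical candidate list (Python codec names), loosely modeled on a
# few entries from the WHATWG encoding list; adding one is a one-line change.
CANDIDATE_ENCODINGS = ("utf_8", "cp1252", "iso8859_2", "shift_jis", "koi8_r", "cp866")


def recode_candidates(
    strings: Sequence[bytes],
    encodings: Sequence[str] = CANDIDATE_ENCODINGS,
) -> Dict[str, List[bytes]]:
    """Try each encoding on every string and keep those that decode them all.

    Returns a mapping from encoding name to the strings recoded as UTF-8.
    """
    results: Dict[str, List[bytes]] = {}
    for encoding in encodings:
        try:
            recoded = [s.decode(encoding).encode("utf-8") for s in strings]
        except UnicodeDecodeError:
            continue  # at least one string is invalid here; drop this candidate
        results[encoding] = recoded
    return results


if __name__ == "__main__":
    names = [b"M\xfcller", b"Gr\xf6\xdfe"]  # Latin-1-style bytes
    for enc, recoded in recode_candidates(names).items():
        print(enc, [r.decode("utf-8") for r in recoded])
```

Single-byte codepages such as cp866 or koi8_r assign a character to every byte, so several candidates usually survive. This is why the result is a list of alternatives for a person to inspect, not a single answer. It is also why a 7-bit code like DIN_66003 is easy to add as one more candidate but hard to rule in or out automatically.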