Re: [bug #48040] GLM produces wrong output

pspp-dev


From:	John Darrington
Subject:	Re: [bug #48040] GLM produces wrong output
Date:	Sat, 18 Jun 2016 09:37:11 +0200
User-agent:	Mutt/1.5.23 (2014-03-12)

Hi Alan,

Can you run the syntax files in the attached tarball through SPSS and
post the results?

Thanks,

John

On Sun, May 29, 2016 at 03:43:36PM -0500, Alan Mead wrote:
     On 5/29/2016 1:22 AM, John Darrington wrote:
     > Thanks Alan,
     >
     > You are right - this is entirely due to missing values.  I'm somewhat
     > relieved that it is not something more fundamental.
     >
     > But the problem I see now is that SPSS does not document how it treats
     > missing values.
     
     I agree.  BTW, I used SPSS 23.  I may have said I used SPSS 24 but I
     just checked and it was 23.
     
     > Perhaps you could do some experiments.  For example, are missing values
     > in the factor variables treated as a separate factor value, or is the
     > case simply dropped?
     
     What further experiments do you propose?
     
     I removed the missing values (in PSPP), saved the file using PSPP
     (the attached personality2.sav), closed PSPP, reopened PSPP, opened the
     new file, and ran the PSPP commands below.
     
     recode agree_score (0=SYSMIS) (else=copy) into A_withmissing.
     recode extra_score (0=SYSMIS) (else=copy) into E_withmissing.
     recode caution_score (0=SYSMIS) (else=copy) into C_withmissing.
     recode caution_score (lo thru 35=1) (36 thru hi=2) (else=SYSMIS) into
     FactorC.
     recode extra_score (lo thru 31=1) (32 thru hi=2) (else=SYSMIS) into
     FactorE.
     execute.
     
     FREQ / agree_score extra_score caution_score FactorC FactorE
     A_withmissing C_withmissing E_withmissing .
     
     * GLM output from below seems correct.
     GLM agree_score BY  FactorC FactorE.
     
     recode caution_score (0=SYSMIS) (1 thru 35=1) (36 thru hi=2)
     (else=SYSMIS) into FactorC_withmissing.
     recode extra_score (0=SYSMIS) (1 thru 31=1) (32 thru hi=2) (else=SYSMIS)
     into FactorE_withmissing.
     freq / FactorC_withmissing FactorE_withmissing .
     
     * GLM output from below seems WRONG.
     GLM agree_score BY  FactorC_withmissing FactorE_withmissing.
     
     * GLM output from below seems WRONG, but less blatantly; df is wrong for
     the factor with missing data.
     GLM agree_score BY  FactorC FactorE_withmissing.
     
     * GLM output from below seems correct.
     GLM A_withmissing BY  FactorC FactorE.
     
     * GLM output from below seems WRONG.
     GLM A_withmissing BY  FactorC_withmissing FactorE_withmissing.
     
     * GLM output from below seems WRONG.
     GLM A_withmissing BY  FactorC_withmissing FactorE.
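One detail in the syntax above may matter when comparing against SPSS: in the first FactorC and FactorE recodes, "lo thru 35" (and "lo thru 31") also match a score of 0. So if the raw scores still contain zeros, the runs marked "seems correct" would place the incomplete cases in level 1 rather than dropping them. The Python sketch below mimics the two caution_score recodes for a single value, assuming RECODE tries its mappings left to right and the first match wins; the function names are hypothetical and None stands in for SYSMIS.

```python
from typing import Optional

SYSMIS = None  # stand-in for PSPP's system-missing value (assumption for this sketch)


def recode_factor_c(score: Optional[float]) -> Optional[int]:
    """Mimics: recode caution_score (lo thru 35=1) (36 thru hi=2) (else=SYSMIS)."""
    if score is SYSMIS:
        return SYSMIS  # a missing input stays missing
    if score <= 35:
        return 1  # "lo thru 35" also matches a score of 0
    if score >= 36:
        return 2
    return SYSMIS  # values between 35 and 36 fall through to ELSE


def recode_factor_c_withmissing(score: Optional[float]) -> Optional[int]:
    """Mimics: recode caution_score (0=SYSMIS) (1 thru 35=1) (36 thru hi=2) (else=SYSMIS)."""
    if score is SYSMIS or score == 0:
        return SYSMIS  # the zero-means-missing mapping is tried first
    if 1 <= score <= 35:
        return 1
    if score >= 36:
        return 2
    return SYSMIS  # e.g. negative scores or values between the ranges


scores = [0, 20, 35, 36, 50]
print([recode_factor_c(s) for s in scores])              # [1, 1, 1, 2, 2]
print([recode_factor_c_withmissing(s) for s in scores])  # [None, 1, 1, 2, 2]
```

The two functions mirror the two RECODE commands one-for-one, so the only difference between them is the leading (0=SYSMIS) mapping.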
     
     When we run the SPSS 23 SAV file through PSPP GLM with missing values in
     the dependent variable (only), we get weird results such as negative sums
     of squares.  That apparently doesn't happen when PSPP generates the
     missing data (for the dependent variable), suggesting that, as you
     suspected, there are differences between the way SPSS 23 and PSPP create
     SAV files.  Reverse-engineering the SPSS file format seems like the kind
     of thing that Ben has looked into in the past?
     
     But there are still missing-data issues that seem to have nothing to do
     with how the SAV file was created.  GLM may treat missing values correctly
     in the dependent variable, but it appears not to do so for the independent
     variables, and when both independent variables have missing data it
     seems to produce spectacularly bad output.
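If GLM is meant to use listwise deletion (plausibly what SPSS does, though as noted it is undocumented), a missing factor value should remove the case rather than create an extra level, so a two-level factor keeps 1 degree of freedom. The Python sketch below contrasts that with counting SYSMIS as a level of its own. It only illustrates the expected arithmetic under that assumption and does not describe what glm.c actually does; the data and function names are hypothetical, and None stands in for SYSMIS.

```python
from typing import List, Optional, Sequence

Case = Sequence[Optional[float]]  # (dependent, factor C, factor E)


def listwise(cases: List[Case]) -> List[Case]:
    """Keep only cases with no missing value in any analysed variable."""
    return [c for c in cases if all(v is not None for v in c)]


def factor_df(cases: List[Case], column: int) -> int:
    """Main-effect df for a factor: distinct observed levels minus one."""
    return len({c[column] for c in cases}) - 1


data: List[Case] = [
    (30.0, 1, 1), (32.0, 1, 2), (28.0, 2, 1), (35.0, 2, 2),
    (None, None, None),  # incomplete survey: everything missing
    (None, 1, 2),        # only the dependent variable is missing
    (31.0, None, 1),     # only factor C is missing
]

complete = listwise(data)
print(len(complete))           # 4
print(factor_df(complete, 1))  # 1: levels {1, 2}
print(factor_df(data, 1))      # 2: missing counted as a third level {1, 2, None}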
     
     I didn't generate different kinds of missing data; these missing values
     fall almost entirely on the same cases for each variable.  The value of
     zero isn't a possible value for any of the Likert variables and
     represents missing data (probably someone completed only a small
     fragment of the full survey), so I recoded zero into missing.  There
     were 22 zeros for Agreeableness but only 21 for Extraversion and Caution
     (conscientiousness), so I think that for 21 cases all three variables
     were missing and one case was missing only Agreeableness.  I'm sure there
     are many datasets where missing status is relatively uncorrelated across
     variables.  I didn't try to re-create such a file, but you could easily
     do so by censoring the file randomly or manually.
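These counts also fix how many cases a listwise-deleting run of GLM A_withmissing BY FactorC_withmissing FactorE_withmissing should lose. Assuming, as Alan suspects, that the 21 Extraversion and Caution zeros fall on the same cases as 21 of the 22 Agreeableness zeros, the union of the missing sets is 22 cases. The case IDs in this Python sketch are hypothetical.

```python
# Hypothetical case IDs: 21 incomplete cases are missing all three scores,
# and one extra case (ID 22) is missing only agreeableness.
missing_a = set(range(1, 23))  # 22 cases with agree_score == 0
missing_e = set(range(1, 22))  # 21 cases with extra_score == 0
missing_c = set(range(1, 22))  # 21 cases with caution_score == 0

# Listwise deletion drops a case if ANY analysed variable is missing,
# so the number dropped is the size of the union of the missing sets.
dropped = missing_a | missing_e | missing_c
print(len(dropped))                              # 22
print(len(missing_a - (missing_e | missing_c)))  # 1: missing only agreeableness
```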
     > And what about the dependent variables?  If there are, say, 2 dependent
     > variables and one is missing, what happens then?  Is the case dropped for
     > both analyses or just the one that is missing?
     
     Are you asking about the behavior of SPSS?  I believe SPSS offers
     listwise and pairwise deletion and that pairwise is the default.  So, if
     there were two dependent variables
     
     Or if you were asking about PSPP, I was just looking at glm.c and I got
     the impression that it cannot handle two dependent variables yet?
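For the two-dependent-variable question, the deletion schemes Alan mentions differ in which cases each analysis sees. This Python sketch is illustrative only: it does not show what SPSS or PSPP GLM actually does (Alan's belief that pairwise is the SPSS default is unverified here), and the variable names and data are hypothetical, with None standing in for SYSMIS.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

rows: List[Row] = [
    {"dv1": 30.0, "dv2": 12.0, "factor": 1.0},
    {"dv1": 28.0, "dv2": None, "factor": 2.0},  # dv2 missing
    {"dv1": None, "dv2": 15.0, "factor": 1.0},  # dv1 missing
    {"dv1": 33.0, "dv2": 14.0, "factor": 2.0},
]


def usable(row: Row, variables: List[str]) -> bool:
    """True if none of the listed variables is missing in this row."""
    return all(row[v] is not None for v in variables)


def listwise_n(rows: List[Row], dvs: List[str], factors: List[str]) -> int:
    """Cases kept for every analysis: complete on ALL dependents and factors."""
    return sum(usable(r, dvs + factors) for r in rows)


def pairwise_n(rows: List[Row], dv: str, factors: List[str]) -> int:
    """Cases kept when analysing one dependent: complete on that DV and the factors."""
    return sum(usable(r, [dv] + factors) for r in rows)


print(listwise_n(rows, ["dv1", "dv2"], ["factor"]))  # 2
print(pairwise_n(rows, "dv1", ["factor"]))           # 3
print(pairwise_n(rows, "dv2", ["factor"]))           # 3
```

Under listwise deletion both analyses see the same 2 cases; under the per-analysis scheme each one keeps the case that is missing only the other dependent variable.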
     
     -Alan
     
     -- 
     
     Alan D. Mead, Ph.D.
     President, Talent Algorithms Inc.
     
     science + technology = better workers
     
     +815.588.3846 (Office)
     +267.334.4143 (Mobile)
     
     http://www.alanmead.org
     
     I've... seen things you people wouldn't believe...
     functions on fire in a copy of Orion.
     I watched C-Sharp glitter in the dark near a programmable gate.
     All those moments will be lost in time, like Ruby... on... Rails...
     Time for Pi.
     
               --"The Register" user Alister, applying the famous 
                 "Blade Runner" speech to software development
     



-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.

glm-experiments.tar.gz
Description: application/gzip



Current Thread


Re: [bug #48040] GLM produces wrong output, John Darrington, 2016/06/03
- Re: [bug #48040] GLM produces wrong output, John Darrington <=
  - Re: [bug #48040] GLM produces wrong output, Alan Mead, 2016/06/20
    - Re: [bug #48040] GLM produces wrong output, John Darrington, 2016/06/21
