Re: [bug #48040] GLM produces wrong output


From: Alan Mead
Subject: Re: [bug #48040] GLM produces wrong output
Date: Mon, 20 Jun 2016 17:09:48 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

John,

Sorry for the delay.  Attached are the outputs in TEXT and PDF formats.

-Alan

On 6/18/2016 2:37 AM, John Darrington wrote:
> Hi Alan,
>
> Can you run the syntax files in the attached tarball through SPSS and
> post the results?
>
> Thanks,
>
> John
>
> On Sun, May 29, 2016 at 03:43:36PM -0500, Alan Mead wrote:
>      On 5/29/2016 1:22 AM, John Darrington wrote:
>      > Thanks Alan,
>      >
>      > You are right - this is entirely due to missing values.  I'm somewhat
>      > relieved that it is not something more fundamental.
>      >
>      > But the problem I see now is that SPSS does not document how it treats
>      > missing values.
>      
>      I agree.  BTW, I used SPSS 23.  I may have said I used SPSS 24 but I
>      just checked and it was 23.
>      
>      > Perhaps you could do some experiments.  For example, do missing values
>      > in the factor variables get treated as a separate factor value, or does
>      > the case simply get dropped?
>      
>      What further experiments do you propose?
>      
>      I removed the missing values (in PSPP) and saved the file using PSPP
>      (the attached personality2.sav), closed PSPP, reopened PSPP, opened the
>      new file, and ran the PSPP commands below.
>      
>      recode agree_score (0=SYSMIS) (else=copy) into A_withmissing.
>      recode extra_score (0=SYSMIS) (else=copy) into E_withmissing.
>      recode caution_score (0=SYSMIS) (else=copy) into C_withmissing.
>      recode caution_score (lo thru 35=1) (36 thru hi=2) (else=SYSMIS) into
>      FactorC.
>      recode extra_score (lo thru 31=1) (32 thru hi=2) (else=SYSMIS) into
>      FactorE.
>      execute.
>      
>      FREQ / agree_score extra_score caution_score FactorC FactorE
>      A_withmissing C_withmissing E_withmissing .
>      
>      * GLM output from below seems correct.
>      GLM agree_score BY  FactorC FactorE.
>      
>      recode caution_score (0=SYSMIS) (1 thru 35=1) (36 thru hi=2)
>      (else=SYSMIS) into FactorC_withmissing.
>      recode extra_score (0=SYSMIS) (1 thru 31=1) (32 thru hi=2) (else=SYSMIS)
>      into FactorE_withmissing.
>      freq / FactorC_withmissing FactorE_withmissing .
>      
>      * GLM output from below seems WRONG.
>      GLM agree_score BY  FactorC_withmissing FactorE_withmissing.
>      
>      * GLM output from below seems WRONG, but less blatantly; df is wrong for
>      the factor with missing data.
>      GLM agree_score BY  FactorC FactorE_withmissing.
>      
>      * GLM output from below seems correct.
>      GLM A_withmissing BY  FactorC FactorE.
>      
>      * GLM output from below seems WRONG.
>      GLM A_withmissing BY  FactorC_withmissing FactorE_withmissing.
>      
>      * GLM output from below seems WRONG.
>      GLM A_withmissing BY  FactorC_withmissing FactorE.
>      
>      When we run the SPSS 23 SAV file through PSPP GLM with missing values in
>      the dependent variable (only), we get weird results like negative SS. 
>      That apparently doesn't happen when PSPP generates the missing data (for
>      the dependent variable), suggesting that there are differences as you
>      suggest between the way SPSS 23 creates a SAV file and how PSPP does. 
>      It seems like reverse-engineering the SPSS files has been the kind of
>      thing that Ben has looked into in the past?
>      
>      But there are still missing data issues that seem to have nothing to do
>      with how the SAV file was created.  GLM may treat missing correctly in
>      the dependent variable, but it appears not to do so for the independent
>      variables and especially when both independent variables have missing
>      data it seems to produce spectacularly bad output. 
>      
>      I didn't generate different kinds of missing data, but these missing
>      values are almost all the same case for each variable.  The value of
>      zero isn't a possible value for any of the Likert variables and
>      represents missing data (probably that someone completed a small
>      fragment of the full survey).  So I recoded zero into missing. There
>      were 22 zeros for Agreeableness but only 21 for Extraversion and Caution
>      (conscientiousness).  So, I think for 21 cases, all three variables were
>      missing and one case was only missing agreeableness.  I'm sure there are
>      many datasets where missing status is relatively uncorrelated. I didn't
>      try to re-create such a file but you could easily do so by
>      randomly/manually censoring the file.
>      
>      > And what about the dependent variables? If there are, say, 2 dependent
>      > variables and one is missing, what happens then?  Is the case dropped
>      > for both analyses or just the one that is missing?
>      
>      Are you asking about the behavior of SPSS?  I believe SPSS offers
>      listwise and pairwise deletion and that pairwise is the default.  So, if
>      there were two dependent variables
>      
>      Or if you were asking about PSPP, I was just looking at glm.c and I got
>      the impression that it cannot handle two dependent variables yet?
>      
>      -Alan
>      
>      -- 
>      
>      Alan D. Mead, Ph.D.
>      President, Talent Algorithms Inc.
>      
>      science + technology = better workers
>      
>      +815.588.3846 (Office)
>      +267.334.4143 (Mobile)
>      
>      http://www.alanmead.org
>      
>      I've... seen things you people wouldn't believe...
>      functions on fire in a copy of Orion.
>      I watched C-Sharp glitter in the dark near a programmable gate.
>      All those moments will be lost in time, like Ruby... on... Rails... Time
>      for Pi.
>      
>                --"The Register" user Alister, applying the famous 
>                  "Blade Runner" speech to software development
>      
>
>
>
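The question quoted above (is a missing factor value treated as its own level, or is the case dropped?) can be illustrated with a minimal Python sketch. This is not PSPP or SPSS code and makes no claim about which rule either program follows; the function name `factor_df`, the `MISSING` placeholder, and the nine-case factor are all hypothetical. It only shows that the two treatments give different case counts and different factor degrees of freedom, which is the kind of df discrepancy noted in the GLM comments.

```python
MISSING = None  # hypothetical stand-in for a system-missing value (SYSMIS)


def factor_df(levels, missing_as_level):
    """Return (cases used, factor df) for a single factor.

    missing_as_level=False drops cases whose factor value is missing.
    missing_as_level=True keeps them and counts "missing" as one extra level.
    """
    if missing_as_level:
        used = list(levels)
    else:
        used = [v for v in levels if v is not MISSING]
    # Main-effect df for one factor is (number of distinct levels - 1).
    return len(used), len(set(used)) - 1


# Hypothetical factor: two observed levels plus three missing cases.
factor = [1, 1, 2, 2, 1, 2, MISSING, MISSING, MISSING]

print(factor_df(factor, missing_as_level=False))  # (6, 1)
print(factor_df(factor, missing_as_level=True))   # (9, 2)
```

Comparing the factor df that SPSS reports against these two possibilities is one way to tell which treatment it applies to a factor with missing values.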


Attachment: glm_experiments-output.pdf
Description: Adobe PDF document

Attachment: glm_experiments-output.txt
Description: Text document
