Re: [bug #48040] GLM produces wrong output
From: Alan Mead
Subject: Re: [bug #48040] GLM produces wrong output
Date: Mon, 20 Jun 2016 17:09:48 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

John,
Sorry for the delay. Attached are the outputs in TEXT and PDF formats.
-Alan
On 6/18/2016 2:37 AM, John Darrington wrote:
> Hi Alan,
>
> Can you run the syntax files in the attached tarball through SPSS and
> post the results?
>
> Thanks,
>
> John
>
> On Sun, May 29, 2016 at 03:43:36PM -0500, Alan Mead wrote:
> On 5/29/2016 1:22 AM, John Darrington wrote:
> > Thanks Alan,
> >
> > You are right - this is entirely due to missing values. I'm somewhat
> > relieved that it is not something more fundamental.
> >
> > But the problem I see now is that SPSS does not document how it treats
> > missings.
>
> I agree. BTW, I used SPSS 23. I may have said I used SPSS 24 but I
> just checked and it was 23.
>
> > Perhaps you could do some experiments. For example, do missing values
> > in the factor variables get treated as a separate factor value or does
> > the case get simply dropped?
>
> What further experiments do you propose?
>
> I removed the missing values (in PSPP) and saved the file using PSPP
> (the attached personality2.sav), closed PSPP, reopened PSPP, opened the
> new file, and ran the below PSPP commands.
>
> recode agree_score (0=SYSMIS) (else=copy) into A_withmissing.
> recode extra_score (0=SYSMIS) (else=copy) into E_withmissing.
> recode caution_score (0=SYSMIS) (else=copy) into C_withmissing.
> recode caution_score (lo thru 35=1) (36 thru hi=2) (else=SYSMIS) into
> FactorC.
> recode extra_score (lo thru 31=1) (32 thru hi=2) (else=SYSMIS) into
> FactorE.
> execute.
>
> FREQ / agree_score extra_score caution_score FactorC FactorE
> A_withmissing C_withmissing E_withmissing .
>
> * GLM output from below seems correct.
> GLM agree_score BY FactorC FactorE.
>
> recode caution_score (0=SYSMIS) (1 thru 35=1) (36 thru hi=2)
> (else=SYSMIS) into FactorC_withmissing.
> recode extra_score (0=SYSMIS) (1 thru 31=1) (32 thru hi=2) (else=SYSMIS)
> into FactorE_withmissing.
> freq / FactorC_withmissing FactorE_withmissing .
>
> * GLM output from below seems WRONG.
> GLM agree_score BY FactorC_withmissing FactorE_withmissing.
>
> * GLM output from below seems WRONG, but less blatantly; df is wrong for
> the factor with missing data.
> GLM agree_score BY FactorC FactorE_withmissing.
>
> * GLM output from below seems correct.
> GLM A_withmissing BY FactorC FactorE.
>
> * GLM output from below seems WRONG.
> GLM A_withmissing BY FactorC_withmissing FactorE_withmissing.
>
> * GLM output from below seems WRONG.
> GLM A_withmissing BY FactorC_withmissing FactorE.
>
> When we run the SPSS 23 SAV file through PSPP GLM with missing values in
> the dependent variable (only), we get weird results like negative SS.
> That apparently doesn't happen when PSPP generates the missing data (for
> the dependent variable), which suggests, as you say, that SPSS 23 and
> PSPP differ in how they write a SAV file. Isn't reverse-engineering the
> SPSS file format the kind of thing that Ben has looked into in the past?
>
> But there are still missing-data issues that seem to have nothing to do
> with how the SAV file was created. GLM may treat missing values
> correctly in the dependent variable, but it appears not to do so for the
> independent variables. When both independent variables have missing
> data, the output is spectacularly bad.
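
To make the two candidate treatments of a missing factor value concrete, here is a minimal Python sketch (not PSPP or SPSS code, and not a statement of what either program actually does). The toy rows and the helper names `listwise_complete` and `main_effect_df` are hypothetical. It only shows why the degrees of freedom for a two-level factor would come out as 1 if incomplete cases are dropped, and as 2 if "missing" is counted as an extra level.

```python
MISSING = None  # stand-in for a system-missing value


def listwise_complete(rows):
    """Keep only the cases in which no variable is missing."""
    return [row for row in rows if all(v is not MISSING for v in row)]


def main_effect_df(factor_values, missing_as_level):
    """Main-effect df = number of factor levels minus one."""
    levels = {v for v in factor_values if v is not MISSING}
    if missing_as_level and any(v is MISSING for v in factor_values):
        levels.add("missing")  # treat missing as one more level
    return len(levels) - 1


# Hypothetical toy cases: (agree_score, FactorC, FactorE)
rows = [
    (30, 1, 1), (34, 1, 2), (28, 2, 1), (36, 2, 2),
    (31, 1, MISSING), (MISSING, MISSING, MISSING),
]

complete = listwise_complete(rows)

# Treatment 1: drop incomplete cases, then count levels.
df_if_dropped = main_effect_df([r[2] for r in complete], missing_as_level=False)

# Treatment 2: keep all cases and count "missing" as a level.
df_if_level = main_effect_df([r[2] for r in rows], missing_as_level=True)

print(len(complete), df_if_dropped, df_if_level)
```

If the df reported for a factor with missing data is one larger than expected, that would be consistent with the second treatment, though other causes are possible.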
>
> I didn't generate different kinds of missing data; the missing values
> fall in almost exactly the same cases for each variable. The value of
> zero isn't a possible value for any of the Likert variables and
> represents missing data (probably that someone completed a small
> fragment of the full survey). So I recoded zero into missing. There
> were 22 zeros for Agreeableness but only 21 for Extraversion and Caution
> (conscientiousness). So, I think for 21 cases, all three variables were
> missing and one case was only missing agreeableness. I'm sure there are
> many datasets where missing status is relatively uncorrelated. I didn't
> try to re-create such a file but you could easily do so by
> randomly/manually censoring the file.
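
As a rough illustration of random censoring, the Python sketch below blanks each value independently with a fixed probability, so missingness in one variable is unrelated to missingness in the others. The function name `censor_independently`, the rate, the seed, and the toy scores are all assumptions for illustration; none of them come from the attached data.

```python
import random


def censor_independently(rows, rate, seed=0):
    """Replace each value with None with probability `rate`,
    independently for every cell, using a seeded generator."""
    rng = random.Random(seed)
    return [
        tuple(None if rng.random() < rate else value for value in row)
        for row in rows
    ]


# Hypothetical toy cases: (agree_score, caution_score, extra_score)
rows = [(30, 35, 31), (34, 40, 28), (28, 33, 36), (36, 38, 30)]

censored = censor_independently(rows, rate=0.25, seed=1)
for original, result in zip(rows, censored):
    print(original, "->", result)
```

Fixing the seed keeps the censored file reproducible, which matters when comparing output between SPSS and PSPP on the same cases.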
>
> > And what about the dependent variables? If there are say 2 dependent
> > variables and one is missing what happens then? Is the case dropped
> > for both analyses or just the one that is missing?
>
> Are you asking about the behavior of SPSS? I believe SPSS offers
> listwise and pairwise deletion and that pairwise is the default. So, if
> there were two dependent variables
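
The difference between the two deletion schemes can be sketched in a few lines of Python. This is only an illustration on hypothetical toy data and makes no claim about which scheme SPSS or PSPP uses. The helper `is_complete` is made up for the example; "per-analysis" here means that each dependent variable drops only the cases missing for that analysis (what this message calls pairwise).

```python
# Hypothetical toy cases: (dv1, dv2, factor); None = missing
rows = [
    (10, 5, 1),
    (12, None, 1),
    (9, 7, 2),
    (None, 6, 2),
    (11, 8, 2),
]


def is_complete(row, indexes):
    """True if none of the selected variables is missing in this case."""
    return all(row[i] is not None for i in indexes)


# Per-analysis deletion: each analysis uses its own set of complete cases.
n_dv1 = sum(is_complete(r, (0, 2)) for r in rows)
n_dv2 = sum(is_complete(r, (1, 2)) for r in rows)

# Listwise deletion: a case missing on either DV is dropped from both analyses.
n_listwise = sum(is_complete(r, (0, 1, 2)) for r in rows)

print(n_dv1, n_dv2, n_listwise)
```

Comparing the N reported for each dependent variable against these two counts is one way to tell the schemes apart in real output.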
>
> Or if you were asking about PSPP, I was just looking at glm.c and I got
> the impression that it cannot handle two dependent variables yet?
>
> -Alan
>
> --
>
> Alan D. Mead, Ph.D.
> President, Talent Algorithms Inc.
>
> science + technology = better workers
>
> +815.588.3846 (Office)
> +267.334.4143 (Mobile)
>
> http://www.alanmead.org
>
> I've... seen things you people wouldn't believe...
> functions on fire in a copy of Orion.
> I watched C-Sharp glitter in the dark near a programmable gate.
> All those moments will be lost in time, like Ruby... on... Rails... Time
> for Pi.
>
> --"The Register" user Alister, applying the famous
> "Blade Runner" speech to software development
>
>
>
>
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org
glm_experiments-output.pdf
Description: Adobe PDF document
glm_experiments-output.txt
Description: Text document