Re: [bug #48040] GLM produces wrong output
From: John Darrington
Subject: Re: [bug #48040] GLM produces wrong output
Date: Sat, 18 Jun 2016 09:37:11 +0200
User-agent: Mutt/1.5.23 (2014-03-12)
Hi Alan,
Can you run the syntax files in the attached tarball through SPSS and
post the results?
Thanks,
John
On Sun, May 29, 2016 at 03:43:36PM -0500, Alan Mead wrote:
On 5/29/2016 1:22 AM, John Darrington wrote:
> Thanks Alan,
>
> You are right - this is entirely due to missing values. I'm somewhat relieved
> that it is not something more fundamental.
>
> But the problem I see now is that SPSS does not document how it treats missings.
I agree. BTW, I used SPSS 23. I may have said I used SPSS 24 but I
just checked and it was 23.
> Perhaps you could do some experiments. For example, do missing values in the factor variables
> get treated as a separate factor value or does the case get simply dropped?
What further experiments do you propose?
I removed the missing values (in PSPP) and saved the file using PSPP
(the attached personality2.sav), closed PSPP, reopened PSPP, opened the
new file, and ran the PSPP commands below.
recode agree_score (0=SYSMIS) (else=copy) into A_withmissing.
recode extra_score (0=SYSMIS) (else=copy) into E_withmissing.
recode caution_score (0=SYSMIS) (else=copy) into C_withmissing.
recode caution_score (lo thru 35=1) (36 thru hi=2) (else=SYSMIS) into
FactorC.
recode extra_score (lo thru 31=1) (32 thru hi=2) (else=SYSMIS) into
FactorE.
execute.
FREQ / agree_score extra_score caution_score FactorC FactorE
A_withmissing C_withmissing E_withmissing .
* GLM output from below seems correct.
GLM agree_score BY FactorC FactorE.
recode caution_score (0=SYSMIS) (1 thru 35=1) (36 thru hi=2)
(else=SYSMIS) into FactorC_withmissing.
recode extra_score (0=SYSMIS) (1 thru 31=1) (32 thru hi=2) (else=SYSMIS)
into FactorE_withmissing.
freq / FactorC_withmissing FactorE_withmissing .
* GLM output from below seems WRONG.
GLM agree_score BY FactorC_withmissing FactorE_withmissing.
* GLM output from below seems WRONG, but less blatantly; df is wrong for
the factor with missing data.
GLM agree_score BY FactorC FactorE_withmissing.
* GLM output from below seems correct.
GLM A_withmissing BY FactorC FactorE.
* GLM output from below seems WRONG.
GLM A_withmissing BY FactorC_withmissing FactorE_withmissing.
* GLM output from below seems WRONG.
GLM A_withmissing BY FactorC_withmissing FactorE.
When we run the SPSS 23 SAV file through PSPP GLM with missing values in
the dependent variable (only), we get weird results like negative sums of
squares (SS). That apparently doesn't happen when PSPP generates the
missing data (for the dependent variable), which supports your suggestion
that SPSS 23 and PSPP create SAV files differently.
It seems like reverse-engineering the SPSS files has been the kind of
thing that Ben has looked into in the past?
But there are still missing data issues that seem to have nothing to do
with how the SAV file was created. GLM may treat missing values correctly
in the dependent variable, but it appears not to do so for the independent
variables. When both independent variables have missing data, the output
is spectacularly bad.
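One way to read the "df is wrong" symptom is in terms of the two treatments asked about earlier: either cases with a missing factor value are dropped, or the missing value is counted as an extra factor level. The following is only a minimal Python sketch on made-up data (the function names and the `factor_e` list are hypothetical, and this is not how PSPP or SPSS is actually implemented). It shows that the main-effect df for a factor differs by one between the two treatments, so a df that is one too high would be consistent with the second.

```python
# Toy illustration only: None stands in for a system-missing factor value.
# Function names and data are hypothetical, not taken from PSPP or SPSS.

def factor_df_dropping_missing(values):
    """Main-effect df when cases with a missing factor value are dropped."""
    levels = {v for v in values if v is not None}
    return len(levels) - 1


def factor_df_missing_as_level(values):
    """Main-effect df when the missing value is counted as its own level."""
    return len(set(values)) - 1


# Hypothetical two-level factor with two missing cases.
factor_e = [1, 2, 2, 1, None, 2, 1, None]

print(factor_df_dropping_missing(factor_e))  # 1
print(factor_df_missing_as_level(factor_e))  # 2
```

Comparing the df in the GLM output against these two numbers is a cheap first check before looking at the sums of squares.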
I didn't generate different kinds of missing data, but these missing
values fall in almost exactly the same cases for every variable. The value
of zero isn't a possible value for any of the Likert variables and
represents missing data (probably that someone completed a small
fragment of the full survey). So I recoded zero into missing. There
were 22 zeros for Agreeableness but only 21 for Extraversion and Caution
(conscientiousness). So, I think for 21 cases, all three variables were
missing and one case was only missing agreeableness. I'm sure there are
many datasets where missing status is relatively uncorrelated. I didn't
try to re-create such a file but you could easily do so by
randomly/manually censoring the file.
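The guess above (21 cases missing all three scores, one case missing only agreeableness) is consistent with the counts of 22, 21 and 21, but the counts alone don't prove it. Tabulating the missing pattern per case would settle it. Below is a minimal Python sketch of that tabulation. The data are entirely made up to match the guess (the non-zero scores and the five complete cases are hypothetical), and the variable names are shortened for illustration.

```python
from collections import Counter

# Hypothetical short names for the three score variables.
VARIABLES = ("agree", "extra", "caution")


def missing_pattern(case):
    """Return the tuple of variable names whose score is 0 (i.e. missing)."""
    return tuple(name for name in VARIABLES if case[name] == 0)


def tabulate_patterns(cases):
    """Count how many cases share each missing-data pattern."""
    return Counter(missing_pattern(case) for case in cases)


# Made-up data built to match the guess; the real file may differ.
cases = (
    [{"agree": 0, "extra": 0, "caution": 0}] * 21
    + [{"agree": 0, "extra": 30, "caution": 36}]
    + [{"agree": 33, "extra": 31, "caution": 35}] * 5
)

patterns = tabulate_patterns(cases)
for pattern, count in sorted(patterns.items()):
    print(pattern or "(complete)", count)

# Cases lost under listwise deletion: every case with any missing value.
dropped = sum(n for pattern, n in patterns.items() if pattern)
print("dropped listwise:", dropped)
```

If the real file shows any other pattern (for example a case missing only Extraversion), the guess is wrong and the missing values are less correlated than assumed.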
> And what about the dependent variables? If there are, say, 2 dependent variables and one
> is missing, what happens then? Is the case dropped for both analyses or just the one that is missing?
Are you asking about the behavior of SPSS? I believe SPSS offers
listwise and pairwise deletion and that pairwise is the default. So, if
there were two dependent variables
Or if you were asking about PSPP, I was just looking at glm.c and I got
the impression that it cannot handle two dependent variables yet?
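To pin down what the question about two dependent variables is asking, here is a minimal Python sketch of the two candidate rules on made-up data. The function name, the `y1`/`y2` variables and the three cases are all hypothetical, and the sketch says nothing about which rule SPSS or PSPP actually applies. It only shows how the number of cases used per analysis would differ.

```python
def usable_cases(cases, dependents, listwise):
    """Return {dependent: number of cases used} under either deletion rule.

    listwise=True  drops a case from every analysis if any dependent is missing.
    listwise=False drops a case only from the analysis whose dependent is missing.
    """
    counts = {}
    for dep in dependents:
        if listwise:
            kept = [c for c in cases if all(c[d] is not None for d in dependents)]
        else:
            kept = [c for c in cases if c[dep] is not None]
        counts[dep] = len(kept)
    return counts


# Hypothetical data: the second case is missing y1 only.
cases = [
    {"y1": 3.0, "y2": 4.0},
    {"y1": None, "y2": 2.0},
    {"y1": 5.0, "y2": 1.0},
]

print(usable_cases(cases, ["y1", "y2"], listwise=True))   # {'y1': 2, 'y2': 2}
print(usable_cases(cases, ["y1", "y2"], listwise=False))  # {'y1': 2, 'y2': 3}
```

Comparing the N reported for each dependent variable against these two outcomes would show which rule a given program follows.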
-Alan
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org
I've... seen things you people wouldn't believe...
functions on fire in a copy of Orion.
I watched C-Sharp glitter in the dark near a programmable gate.
All those moments will be lost in time, like Ruby... on... Rails... Time
for Pi.
--"The Register" user Alister, applying the famous
"Blade Runner" speech to software development
--
Avoid eavesdropping. Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
glm-experiments.tar.gz
Description: application/gzip