pspp-users
Re: regression, and missing data


From: Renan Levine
Subject: Re: regression, and missing data
Date: Tue, 06 Mar 2012 00:06:44 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

Hi-

This appears to be a bug in the PSPP regression routine that shows up when the data contain a large number of missing values!

I recently noticed some small discrepancies in simple bivariate regression results among IBM SPSS, STATA and PSPP. Until Prof. Shackman's email, I hadn't realized that the discrepancies only occur when there are many missing values. I was just confused...

Sadly, I also find problems when running linear regressions using PSPP on data with missing values. I wish I knew what was causing the problem.

So, using Dropbox, I wanted to make available some data that seem to illustrate the issue.

Using psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on LinuxMint LXDE, PSPP produces the same descriptive statistics as SPSS and STATA on the same dataset, but it does not produce identical b coefficients when running bivariate or multivariate regressions.

I created the following public opinion survey data files, each consisting of three variables from the 2004 Canadian Election Study, which I recoded, declaring certain values to be missing:
http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many observations with missing values.
http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three variables, but I dropped all of the cases with missing values.
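
The comparison between the two files rests on one expectation: if a regression routine drops incomplete cases (listwise deletion, the default in SPSS REGRESSION and in STATA's regress), then fitting the file with missing values should give exactly the same coefficients as fitting the file where those cases were removed by hand. Below is a minimal Python sketch of that check for a bivariate regression. The data values and the function names complete_cases and ols_bivariate are made up for illustration; they are not taken from the CES files, and this is not PSPP's implementation.

```python
def complete_cases(xs, ys):
    """Listwise deletion: keep only pairs where both values are present."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def ols_bivariate(pairs):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical values; None marks a missing value.
x_with_missing = [1.0, 2.0, None, 4.0, 5.0, 6.0]
y_with_missing = [2.0, None, 3.5, 4.5, 4.0, 7.0]

# The same data with the incomplete cases dropped by hand.
x_clean = [1.0, 4.0, 5.0, 6.0]
y_clean = [2.0, 4.5, 4.0, 7.0]

fit_missing = ols_bivariate(complete_cases(x_with_missing, y_with_missing))
fit_clean = ols_bivariate(list(zip(x_clean, y_clean)))

# Under listwise deletion the two fits must agree.
print(fit_missing == fit_clean)
```

If the two fits differ for a given package, the routine is using the incomplete cases somewhere in its calculation, which is the kind of discrepancy the two .sav files are meant to expose.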

This is the syntax file used to run descriptive statistics and three regression analyses.
http://dl.dropbox.com/u/35198072/regression-tests.sps

PSPP generates these regression results and descriptive statistics with missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp1.html
PSPP generates these regression results and descriptive statistics using the data without any missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp2.html

Here is the STATA output on the same data (.log is a text file - email me if you have a problem opening it). The first three regressions should match the output in regression-test-pspp1.html. They are close, but not close enough... The bottom three regressions use the data with no missing values and these DO match PSPP's output (in regression-test-pspp2.html).
http://dl.dropbox.com/u/35198072/regression-test-stata.log

I also ran the data in SPSS and found results consistent with STATA. There did not seem to be any problems with Pearson's Chi-Square or Kendall's Tau-B when running a crosstab on the data with the missing values.

I am sorry I don't know what has gone wrong, so I am making these data available in the hope that someone can figure out where the mistake is. I caution other users running regressions in PSPP.

Yours,
Renan

On 04-Mar-12 11:37 PM, Gene Shackman wrote:
Hi

I'm using the Windows version, psppire.exe 0.7.8-g997322, that I downloaded from
I'm using Windows Vista, Home version.

My question is about linear regression. If I use data that has no missing values, then PSPP regression seems to work fine. I compared the results with other packages and got the same results; see http://gsociology.icaap.org/methods/comparing_freestaprograms.html
However, if I use data that does have missing values, I get results that are different from other programs. See the results from other programs here:
http://gsociology.icaap.org/methods/comparing_freestaprograms_missing.html
That page also lists the data set I'm using. The results I get from PSPP are attached below. (If you format this in Courier, it lines up properly.)

So, two questions:
1. How does PSPP deal with missing values? By the way, I tried coding blanks as missing and also tried replacing all the missing values with -99999 and telling PSPP those were missing values, and got exactly the same results.

2. There don't appear to be any options on how regression is done, like forward, backward, forced, etc. I didn't see anything in the documentation about it either. Is it just doing straight forced regression? Will there be any options on how to do regression?
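
To illustrate why the two codings in question 1 would be expected to give the same results: once a code such as -99999 is declared missing, it should select the same complete cases as blanks do. The Python sketch below shows only that expectation. The values are invented, only the variable names gini and North come from the syntax below, and the helper names declare_missing and complete_rows are hypothetical. It says nothing about how PSPP handles missing values internally.

```python
SENTINEL = -99999  # the missing-value code mentioned in question 1


def declare_missing(values, missing_codes):
    """Map every declared missing code to None; leave other values alone."""
    return [None if v is None or v in missing_codes else v for v in values]


def complete_rows(columns):
    """Listwise deletion across several variables: keep fully observed rows."""
    return [row for row in zip(*columns) if all(v is not None for v in row)]


# Hypothetical values. Version A uses blanks (None), version B uses the code.
gini_blank = [30.5, None, 41.2, 55.0]
north_blank = [10.0, 20.0, None, 40.0]
gini_coded = [30.5, SENTINEL, 41.2, 55.0]
north_coded = [10.0, 20.0, SENTINEL, 40.0]

rows_blank = complete_rows([gini_blank, north_blank])
rows_coded = complete_rows([
    declare_missing(gini_coded, {SENTINEL}),
    declare_missing(north_coded, {SENTINEL}),
])

# If the code is NOT declared missing, every row is kept and the
# sentinel enters the analysis as if it were a real value.
rows_undeclared = complete_rows([gini_coded, north_coded])

print(rows_blank == rows_coded, len(rows_undeclared))
```

Both codings leave the same two complete rows, so identical output under the two codings indicates the declaration is being honored; it does not by itself show what the regression routine then does with those cases.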

Thanks very much.

REGRESSION
    /VARIABLES= c_arable climate North phone_kpop
    /DEPENDENT=     gini
    /STATISTICS=COEFF R ANOVA.

Model Summary
#====#========#=================#==========================#
#  R #R Square|Adjusted R Square|Std. Error of the Estimate#
##===#========#=================#==========================#
#|.60#     .36|              .35|                      8.65#
##===#========#=================#==========================#

ANOVA
#===========#==============#===#===========#=====#============#
#           #Sum of Squares| df|Mean Square|  F  |Significance#
##==========#==============#===#===========#=====#============#
#|Regression#       4548.35|  4|    1137.09|15.19|         .00#
#|Residual  #       7933.89|106|      74.85|     |            #
#|Total     #      12482.24|110|           |     |            #
##==========#==============#===#===========#=====#============#

Coefficients
#===========#=====#==========#====#=====#============#
#           #  B  |Std. Error|Beta|  t  |Significance#
##==========#=====#==========#====#=====#============#
#|(Constant)#47.95|      2.06| .00|23.22|         .00#
#| c_arable # -.12|       .05|-.20|-2.28|         .02#
#|  climate #-1.24|      1.04|-.11|-1.20|         .23#
#|   North  # -.14|       .03|-.43|-4.96|         .00#
#|phone_kpop#  .00|       .00|-.07| -.81|         .42#
##==========#=====#==========#====#=====#============#


 

Gene



Gene Shackman, Ph.D.
The Global Social Change Research Project
http://gsociology.icaap.org
Free Resources for Methods in Evaluation and Social Research
http://gsociology.icaap.org/methods
----------
Applied Sociologist
----------



_______________________________________________
Pspp-users mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/pspp-users

-- 
Renan Levine
Department of Political Science
University of Toronto - Scarborough
address@hidden
http://individual.utoronto.ca/renan
(416) 208-2651
