pspp-users

Re: regression, and missing data


From: John Darrington
Subject: Re: regression, and missing data
Date: Tue, 6 Mar 2012 11:46:27 +0000
User-agent: Mutt/1.5.18 (2008-05-17)

I would be very grateful if you would open a new bug report at
http://savannah.gnu.org/bugs/?group=pspp and copy in the information below 
(and any other which you think is relevant).

It sounds as if this is a rather serious problem, so please mark it as 
Severity: Major

Thanks for reporting this problem.

John

On Tue, Mar 06, 2012 at 12:06:44AM -0500, Renan Levine wrote:
> Hi-
>
> This appears to be a bug in the PSPP regression routine when the data
> have a large number of missing values!
>
> I recently noticed some small discrepancies between simple bivariate  
> regression results between IBM SPSS, STATA and PSPP. Until Prof.  
> Shackman's email, I hadn't realized that the discrepancies only occur  
> when there are many missing values. I was just confused...
>
> Sadly, I also find problems when running linear regressions using PSPP  
> on data with missing values. I wish I knew what was causing the problem.
>
> So I have used Dropbox to make available some data that seems to
> illustrate the issue.
>
> Using psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on  
> LinuxMint LXDE, PSPP calculates descriptive statistics just like SPSS  
> and STATA on the same dataset, but does not calculate identical b  
> coefficients when running bivariate or multivariate regressions.
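Descriptive statistics are normally computed per variable from all of that variable's valid values, whereas SPSS and Stata by default fit a regression only on cases that are complete for every variable in the model (listwise deletion). If a regression routine mixed case sets instead, for example by building its covariance matrix pairwise, the descriptives could agree across packages while the b coefficients did not. The Python sketch below uses made-up data and hypothetical function names to show how far the two approaches can diverge for a bivariate slope; it illustrates the mechanism only and is not a diagnosis of what PSPP 0.7.8 actually does.

```python
# Illustrative sketch with made-up data; the function names are hypothetical.
# None marks a missing value.

def mean(values):
    return sum(values) / len(values)

def complete_pairs(x, y):
    """Keep only cases where both x and y are present."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

def slope_listwise(x, y):
    """Bivariate OLS slope from complete cases only (listwise deletion)."""
    pairs = complete_pairs(x, y)
    mx = mean([a for a, _ in pairs])
    my = mean([b for _, b in pairs])
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    return sxy / sxx

def slope_pairwise(x, y):
    """Slope from a pairwise-deletion covariance matrix: cov(x, y) uses the
    complete pairs, but var(x) uses every non-missing x value."""
    pairs = complete_pairs(x, y)
    mx = mean([a for a, _ in pairs])
    my = mean([b for _, b in pairs])
    cov_xy = sum((a - mx) * (b - my) for a, b in pairs) / (len(pairs) - 1)
    xs = [a for a in x if a is not None]
    mx_all = mean(xs)
    var_x = sum((a - mx_all) ** 2 for a in xs) / (len(xs) - 1)
    return cov_xy / var_x

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, None, None]  # y is missing for the last two cases

print(slope_listwise(x, y))   # 2.0
print(slope_pairwise(x, y))   # about 0.952
```

In this toy data y is exactly 2x wherever it is observed, so listwise deletion recovers a slope of 2, while the pairwise version is pulled down because var(x) also includes the two cases that have no y.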
>
> I created the following public opinion survey data files, consisting of
> three variables from the 2004 Canadian Election Study
> <http://www.queensu.ca/cora/ces.html>, which I recoded, declaring certain
> values to be missing:
> http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many
> observations with missing values.
> http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three
> variables, but I dropped all of the cases with missing values.
>
> This is the syntax file used to run descriptive statistics and three  
> regression analyses.
> http://dl.dropbox.com/u/35198072/regression-tests.sps
>
> PSPP generates these regression results and descriptive statistics with  
> missing values:
> http://dl.dropbox.com/u/35198072/regression-test-pspp1.html
> PSPP generates these regression results and descriptive statistics using  
> the data without any missing values:
> http://dl.dropbox.com/u/35198072/regression-test-pspp2.html
>
> Here is the STATA output on the same data (.log is a text file; email
> me if you have a problem opening it). The first three regressions should
> match the output in regression-test-pspp1.html.
> They are close, but not close enough... The bottom three regressions use  
> the data with no missing values and these DO match PSPP's output (in  
> regression-test-pspp2.html).
> http://dl.dropbox.com/u/35198072/regression-test-stata.log
>
> I also ran the data in SPSS and found results consistent with STATA.
> There did not seem to be any problems with Pearson's Chi-Square or
> Kendall's Tau-B when running a crosstab on the data with the missing
> values.
>
> I am sorry I don't know what has gone wrong, so I am making this data
> available in the hope that someone can find the mistake. I would
> caution other users running regressions in PSPP.
>
> Yours,
> Renan
>
> On 04-Mar-12 11:37 PM, Gene Shackman wrote:
>> Hi
>>
>> I'm using the windows version, psppire.exe 0.7.8-g997322, that I  
>> downloaded from
>> http://www.gnu.org/software/pspp/get.html
>> I'm using windows vista, home version.
>>
>> My question is about linear regression. If I use data that has no  
>> missing values, then PSPP regression seems to work fine. I compared  
>> the results with other packages and got the same results, see  
>> http://gsociology.icaap.org/methods/comparing_freestaprograms.html
>> However, if I use data that does have missing values, I get results
>> that are different from other programs. See the results from other
>> programs here:
>> http://gsociology.icaap.org/methods/comparing_freestaprograms_missing.html
>> That page also lists the data set I'm using. Attached below are the
>> results I get from PSPP (if you view this in a monospaced font such as
>> Courier, it lines up correctly).
>>
>> So, two questions:
>> 1. How does PSPP deal with missing values? By the way, I tried coding
>> blanks as missing, and also tried replacing all the missing values with
>> -99999 and declaring those as missing values in PSPP, and got exactly
>> the same results.
>>
>> 2. There don't appear to be any options on how regression is done,  
>> like forward, backward, forced, etc. I didn't see anything in the  
>> documentation about it either. Is it just doing straight forced  
>> regression? Will there be any options on how to do regression?
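On question 1: getting exactly the same results from blanks and from an explicit -99999 code is what one would expect if both kinds of value are excluded before the model is fitted. In PSPP a blank numeric cell is system-missing, while a value declared with MISSING VALUES (such as -99999) is user-missing. The Python sketch below, with hypothetical function names and toy data, assumes both kinds are dropped listwise and shows that the two codings leave the same complete cases; it says nothing about whether PSPP 0.7.8 then used exactly those cases in REGRESSION.

```python
# Hypothetical sketch: None stands for a blank (system-missing) cell, and
# user_missing holds codes declared missing, such as -99999.

def is_missing(value, user_missing=frozenset()):
    """True for a blank cell or for a declared user-missing code."""
    return value is None or value in user_missing

def complete_cases(rows, user_missing=frozenset()):
    """Listwise deletion: keep rows with no missing value in any column."""
    return [row for row in rows
            if not any(is_missing(v, user_missing) for v in row)]

# The same four cases, coded two different ways.
blank_coded = [(1.0, 2.0), (None, 3.0), (2.0, None), (4.0, 8.0)]
code_coded = [(1.0, 2.0), (-99999, 3.0), (2.0, -99999), (4.0, 8.0)]

a = complete_cases(blank_coded)
b = complete_cases(code_coded, user_missing={-99999})
print(a == b, a)  # True [(1.0, 2.0), (4.0, 8.0)]
```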
>>
>> Thanks very much.
>>
>> REGRESSION
>>     /VARIABLES= c_arable climate North phone_kpop
>>     /DEPENDENT=     gini
>>     /STATISTICS=COEFF R ANOVA.
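The command above lists four predictors and no selection method, and the Coefficients table that follows reports all four, which appears consistent with forced entry: every listed predictor enters the model in a single step (SPSS calls this METHOD=ENTER). As a hedged sketch of what such a fit computes, the hypothetical Python function below solves the ordinary least squares normal equations with all predictors included at once. Forward or backward selection would instead add or remove predictors one at a time based on significance tests; this sketch does not attempt that.

```python
def ols_forced_entry(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk with every predictor entered at once
    (forced entry) by solving the normal equations (X'X) b = X'y.
    X is a list of rows of predictor values; returns [b0, b1, ..., bk].
    Raises ZeroDivisionError if X'X is singular."""
    # Prepend a column of ones for the intercept.
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# Toy data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1, 3, -2, 0, -4]
print(ols_forced_entry(X, y))  # approximately [1.0, 2.0, -3.0]
```

The normal equations are the simplest route to write out; statistical packages usually prefer a QR or Cholesky factorization for better numerical stability.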
>>
>> Model Summary
>> #====#========#=================#==========================#
>> #  R #R Square|Adjusted R Square|Std. Error of the Estimate#
>> ##===#========#=================#==========================#
>> #|.60#     .36|              .35|                      8.65#
>> ##===#========#=================#==========================#
>>
>> ANOVA
>> #===========#==============#===#===========#=====#============#
>> #           #Sum of Squares| df|Mean Square|  F  |Significance#
>> ##==========#==============#===#===========#=====#============#
>> #|Regression#       4548.35|  4|    1137.09|15.19|         .00#
>> #|Residual  #       7933.89|106|      74.85|     |            #
>> #|Total     #      12482.24|110|           |     |            #
>> ##==========#==============#===#===========#=====#============#
>>
>> Coefficients
>> #===========#=====#==========#====#=====#============#
>> #           #  B  |Std. Error|Beta|  t  |Significance#
>> ##==========#=====#==========#====#=====#============#
>> #|(Constant)#47.95|      2.06| .00|23.22|         .00#
>> #| c_arable # -.12|       .05|-.20|-2.28|         .02#
>> #|  climate #-1.24|      1.04|-.11|-1.20|         .23#
>> #|   North  # -.14|       .03|-.43|-4.96|         .00#
>> #|phone_kpop#  .00|       .00|-.07| -.81|         .42#
>> ##==========#=====#==========#====#=====#============#
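The Model Summary and ANOVA tables are linked by standard formulas (R Square = SS_regression / SS_total, Adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1), F = MS_regression / MS_residual, Std. Error of the Estimate = sqrt(MS_residual)), so one quick sanity check on output like this is to recompute them from the printed sums of squares. The Total df of 110 also implies that 111 cases were used after missing values were excluded, which can be compared with the number of observations other packages report. A minimal Python sketch of that check, using only the values printed above:

```python
import math

# Values copied from the PSPP ANOVA table above.
ss_regression = 4548.35
ss_residual = 7933.89
ss_total = 12482.24
df_regression = 4      # number of predictors, k
df_residual = 106      # n - k - 1
df_total = 110         # n - 1, so n = 111 cases were used

r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
se_estimate = math.sqrt(ms_residual)

for name, value in [("R", r), ("R Square", r_squared),
                    ("Adjusted R Square", adj_r_squared),
                    ("F", f_stat), ("Std. Error of the Estimate", se_estimate)]:
    print(f"{name}: {value:.2f}")
```

With these inputs, R, R Square, F and the standard error of the estimate come out at .60, .36, 15.19 and 8.65, matching the tables; the usual adjusted R Square formula gives about .34 rather than the printed .35, which may be worth comparing against the other packages.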
>>
>>
>>
>>
>> Gene
>>
>>
>>
>> Gene Shackman, Ph.D.
>> The Global Social Change Research Project
>> http://gsociology.icaap.org
>> Free Resources for Methods in Evaluation and Social Research
>> http://gsociology.icaap.org/methods
>> ----------
>> Applied Sociologist
>> ----------
>>
>>
>>
>> _______________________________________________
>> Pspp-users mailing list
>> address@hidden
>> https://lists.gnu.org/mailman/listinfo/pspp-users
>
> -- 
> Renan Levine
> Department of Political Science
> University of Toronto - Scarborough
> address@hidden
> http://individual.utoronto.ca/renan
> (416) 208-2651
>



-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.


