PSPP-BUG: [bug #55825] Feature Request: Improved Linear Regression Diagn
From: Matt
Subject: PSPP-BUG: [bug #55825] Feature Request: Improved Linear Regression Diagnostics
Date: Mon, 4 Mar 2019 09:27:25 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36
URL:
<https://savannah.gnu.org/bugs/?55825>
Summary: Feature Request: Improved Linear Regression
Diagnostics
Project: PSPP
Submitted by: mattakatdat
Submitted on: Mon 04 Mar 2019 02:27:23 PM UTC
Category: Other
Severity: 5 - Average
Status: None
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Release: None
Effort: 0.00
_______________________________________________________
Details:
PSPP is amazing, and I would love to use it in my college classrooms. However,
right now it lacks the features needed to conduct regression analyses
responsibly, that is, by checking the model's assumptions. There are dozens of
such checks, of course, but I think that four in particular would go a long
way toward putting PSPP on the map for introductory statistics courses looking
for a free GUI-based teaching tool. I am not a programmer, but I would like to
suggest what would be most useful for PSPP in teaching a "second class" on
regression:
1) Assumption 1: Negligible multicollinearity: The ability to estimate
Variance Inflation Factors (VIFs) would provide a tool for detecting
multicollinearity.
References: Theil's Principles of Econometrics; basic computations can be
found here: https://newonlinecourses.science.psu.edu/stat501/node/347/
2) Assumption 2: Outliers are handled and non-influential: The ability to
calculate Cook's Distance (aka the Cook's D statistic) would help identify
influential outliers in models with several predictors.
Reference: Cook, R. Dennis (March 1979). "Influential Observations in Linear
Regression". Journal of the American Statistical Association. American
Statistical Association. 74 (365): 169–174.
3) Assumption 3: Linearity: Component-plus-residual plots can visually
identify non-linear associations in many cases.
Overview:
https://www.stat.washington.edu/pds/stat423/Documents/LectureNotes/notes.423.ch12.pdf
4) Assumption 4: Homoskedasticity: Residual-vs-fitted plots can reveal
heteroskedasticity. PSPP all but supports these already, since it can output
residuals and predicted values. It would just be a matter of temporarily
taking those values, standardizing them, and scatter plotting the two.
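
To make items 1 and 2 concrete, here is a rough sketch of the arithmetic in
plain Python. It is an illustration only, not PSPP code: every function name
below is hypothetical, and a real implementation would use a numerically
sturdier solver. It assumes the textbook definitions VIF_j = 1 / (1 - R_j^2),
where R_j^2 comes from regressing predictor j on the other predictors, and
D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 for Cook's Distance, with p
counting the intercept.

```python
# Illustrative sketch only: all helper names are hypothetical, not PSPP code.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with
    partial pivoting. Assumes a is non-singular (no error handling)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, y):
    """Fit y on the predictor rows xs (an intercept column is added).
    Returns fitted values, the design matrix X, and X'X."""
    X = [[1.0] + list(row) for row in xs]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    return fitted, X, xtx

def r_squared(y, fitted):
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def vif(xs):
    """One VIF per predictor column: regress it on the remaining columns."""
    out = []
    for j in range(len(xs[0])):
        target = [row[j] for row in xs]
        others = [[v for k, v in enumerate(row) if k != j] for row in xs]
        fitted, _, _ = ols(others, target)
        out.append(1.0 / (1.0 - r_squared(target, fitted)))
    return out

def cooks_distance(xs, y):
    """Cook's Distance for every observation of the regression of y on xs."""
    fitted, X, xtx = ols(xs, y)
    n, p = len(X), len(X[0])
    resid = [a - b for a, b in zip(y, fitted)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    out = []
    for row, e in zip(X, resid):
        # Leverage h_ii = x_i' (X'X)^-1 x_i
        h = sum(a * b for a, b in zip(row, solve(xtx, row)))
        out.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return out
```

For example, vif([[1, 1], [2, 2], [3, 3], [4, 5]]) gives one value per
predictor column, and cooks_distance([[0], [1], [2], [3]], [0, 1, 2, 10])
gives one value per observation, the largest belonging to the last point.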
There are, of course, other assumptions and tests, but this is a good start. I
am happy to test these features if they are implemented.
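
The two plots in items 3 and 4 need very little computation once residuals,
predicted values, and coefficients are available. The sketch below, again in
plain Python with hypothetical function names, shows the points each plot
would draw. It assumes that "standardizing" means plain z-scores (subtract
the mean, divide by the sample standard deviation); statistical packages
sometimes scale residuals by the standard error of the estimate instead. It
also assumes the usual definition of a partial residual for predictor j:
e_i + b_j * x_ij.

```python
from statistics import mean, stdev

# Illustrative sketch only: function names are hypothetical, not PSPP code.

def zscores(values):
    """Standardize values as plain z-scores (assumption: sample std. dev.)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def residual_vs_fitted_points(predicted, residuals):
    """(x, y) pairs for item 4: standardized predicted values on the x-axis,
    standardized residuals on the y-axis."""
    return list(zip(zscores(predicted), zscores(residuals)))

def component_plus_residual_points(x_j, b_j, residuals):
    """(x, y) pairs for item 3: predictor j on the x-axis, and its partial
    residual b_j * x + e on the y-axis."""
    return [(x, b_j * x + e) for x, e in zip(x_j, residuals)]
```

Scatter plotting either list of pairs gives the corresponding diagnostic:
a fan or funnel shape in the first suggests heteroskedasticity, and a curved
pattern in the second suggests a non-linear association with that predictor.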
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?55825>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/