Re: Boxplot whisker length
From: John Darrington
Subject: Re: Boxplot whisker length
Date: Sat, 3 Jan 2015 08:29:26 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Dec 31, 2014 at 10:28:10AM +0100, John Darrington wrote:
On Tue, Dec 30, 2014 at 04:58:48PM -0600, Alan Mead wrote:
or GNU/Linux.
Regarding the actual algorithm, the boxplot I get from SPSS is
attached as "boxplot2.png". I think it's a lot more reasonable
(albeit uglier).
The main difference is the SPSS boxplot had short whiskers while
PSPP's boxplot whiskers seem to include the entire range of the data
(including the outlier). In the physio dataset, apparently there are
some outliers like 30 mm for a human height. That's the kind of thing
that boxplots are supposed to help you find. Maybe that's a bug in
PSPP that the whisker length is just wrong? Otherwise I think it would
make more sense to limit the whiskers to some reasonable value like
1.5 times the inter-quartile range (or to the highest and lowest
values that are within 1.5 times the inter-quartile range).
Here is what SPSS has to say about boxplots:
The boundaries of the box are Tukey's hinges. The length of the box is
the interquartile range based on Tukey's hinges. That is,
IQR = Q_3 - Q_1
Define
STEP = 1.5 IQR
A case is an outlier if
Q_3 + STEP < y < Q_3 + 2 * STEP
or
Q_1 - 2 * STEP < y < Q_1 - STEP
A case is an extreme if
y >= Q_3 + 2 * STEP
or
y <= Q_1 - 2 * STEP
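To make the rules above concrete, here is a minimal Python sketch of the
classification. The function name classify is made up for illustration,
and it assumes the lower outlier band mirrors the upper one, that is
Q_1 - 2 * STEP < y < Q_1 - STEP. It is not taken from SPSS or PSPP.

```python
def classify(y, q1, q3):
    """Label a value 'normal', 'outlier' or 'extreme'.

    q1 and q3 are the lower and upper Tukey hinges.
    Assumption: the lower outlier band mirrors the upper one.
    """
    step = 1.5 * (q3 - q1)  # STEP = 1.5 IQR

    # Extremes lie at or beyond two steps from the hinges.
    if y >= q3 + 2 * step or y <= q1 - 2 * step:
        return "extreme"

    # Outliers lie strictly beyond one step, but short of two steps.
    if y > q3 + step or y < q1 - step:
        return "outlier"

    return "normal"
```

For example, with hinges q1 = 10 and q3 = 20, STEP is 15, so values
strictly between 35 and 50 are outliers and values of 50 or more are
extremes.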
Note that it doesn't actually say where the whiskers should be. However,
it seems that PSPP is placing the lower whisker at the lowest value y
of the dataset for which
y < Q1 - STEP
and the upper whisker at the highest value y for which
y < Q3 + STEP
I vaguely remember reading this recommendation in the literature.
If someone can reference any better recommendations, then we can consider
implementing that instead.
Most other implementations seem to have the whiskers extend to the most extreme
points of the dataset, which are not themselves outliers.
So I pushed a change so that boxplots in PSPP do that too.
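As a rough illustration of that rule (whiskers at the most extreme data
points that are not themselves outliers), here is a small Python sketch.
It is not the PSPP code; the names tukey_hinges and whiskers are
hypothetical, and it assumes the common definition of Tukey's hinges as
the medians of the lower and upper halves of the sorted data, with the
overall median counted in both halves when the number of cases is odd.

```python
from statistics import median


def tukey_hinges(data):
    """Return (lower hinge, upper hinge) of the data."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    return median(xs[:half]), median(xs[n - half:])


def whiskers(data):
    """Return (lower, upper) whisker positions.

    Each whisker ends at the most extreme value that lies within
    one STEP (1.5 IQR) of the corresponding hinge.
    """
    q1, q3 = tukey_hinges(data)
    step = 1.5 * (q3 - q1)
    inside = [y for y in data if q1 - step <= y <= q3 + step]
    return min(inside), max(inside)
```

With the made-up data [1, 2, 3, 4, 5, 6, 7, 8, 9, 100], the hinges are 3
and 8, STEP is 7.5, and whiskers() returns (1, 9), so the value 100 is
left outside the whiskers to be drawn as a separate point.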
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.