octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Least Absolute Deviation solution of a linear system


From: James R. Van Zandt
Subject: Least Absolute Deviation solution of a linear system
Date: Thu, 22 Jan 2009 21:17:49 -0500

Octave has a function ols which find the least squares solution of an
overdetermined linear system.  I propose the following function, which
finds the "least abolute deviation" solution - the one that minimizes
the sum of the absolute values of the deviations.  It is more work to
find (requiring solution of a linear program), but is more robust to
outliers.  See below for references.

The LAD solution is the analog of the median.  You will recall that
the median of an odd number of values is well defined, but for an even
number of values we normally add a condition so we get the center of
the interval between the centermost values.  This implementation of
the LAD does not have this kind of rule.  If there is more than one
solution that minimizes the sum of absolute values of the deviations,
it doesn't attempt to return the "middle" one.

            - Jim Van Zandt



## Copyright (C) 2008
##               James R. Van Zandt
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} address@hidden,@var{R}]=} olad (@var{Y}, @var{A})
## @iftex
## @tex
## If $Y$ is a $t \times 1$ matrix, return the Least Absolute Deviation
## solution of an over-determined linear system
## $Y = A x + e$ such that $\sum |e|$ is minimized, where
## $A$ is a $t \times k$ matrix,
## $x$ is a $k \times 1$ matrix, and
## $e$ is a $t \times 1$ matrix.
##
## Each row of $Y$ and $A$ is an observation and each column of
## $A$ a variable.  
##
## If $Y$ is a $t \times p$ matrix, then return a $k \times p$ matrix
## where each column is the Least Absolute Deviation solution of the 
## linear system using the corresponding column of $Y$.
## @end tex
## @end iftex
## @ifinfo
## If @math{Y} is a @math{t} by 1 matrix, return the Least Absolute Deviation
## solution of an over-determined linear system
## @math{Y = A x + e} such that
## @math{sum(abs(e))} is minimized, where
## @math{A} is a @math{t} by @math{k} matrix,
## @math{x} is a @math{k} by @math{1} matrix, and
## @math{e} is a @math{t} by @math{1} matrix.
##
## Each row of @var{Y} and @var{A} is an observation and each column of
## @var{A} a variable.  
##
## If Y is a @var{t} by @var{p} matrix, then return a @var{k} by @var{p}
## matrix where each column is the Least Absolute Deviation solution of the 
## linear system using the corresponding column of @var{Y}.
## @end ifinfo
##
## The return variables are as follows:
##
## @table @var
## @item chi
## The LAD estimator for @var{x}.
##
## @item R
## The matrix of residuals, @address@hidden = @address@hidden@var{chi}}.
## @end table
##
## Note that the mldivide solution CHI=A\Y minimizes the sum of the squares
## of the errors, instead of their absolute values.  The LAD solution is
## more robust to outliers.  It is the natural analog of the sample median.
##
## References:
##
## @itemize @bullet
## @item M. DasGupta and S.K. Mishra, "Least Absolute Deviation
## Estimation of Linear Econometric Models: A Literature Review", 2004,
## http://ssrn.com/abstract=552502.
##
## @item G. Bassett Jr. and R. Koenker, "Asymptotic Theory of Least
## Absolute Error Regression", Journal of American Statistical
## Association, 73 (618-622), 1978.
##
## @item John D'Errico, "Optimization Tips and Tricks",
## @url{http://www.mathworks.com/matlabcentral/fileexchange/8553},
## 26 Sep 2005 (Updated 07 Dec 2006).
## @end itemize
## @end deftypefn

## Author: Jim Van Zandt <address@hidden>
## Created: December 2008
## Keywords: least squares, outliers, robust regression, breakdown point, least 
absolute deviation

function [CHI, R] = olad (Y, A)

  if (nargin != 2)
    print_usage ();
  endif

  [ra, ca] = size (A);
  [ry, cy] = size (Y);
  if (ra != ry)
    error ("olad: incorrect matrix dimensions");
  endif

  f=[zeros(ca,1); ones(2*ra,1)];
  Aeq=[A eye(ra) -eye(ra)];
  CHI=zeros(ca,cy);
  R=zeros(ry,cy);
  for j=1:cy
    p=glpk(f,Aeq,Y(:,j));
    CHI(:,j)=p(1:ca);
    R(:,j)=p(ca+1:ca+ry)-p(ca+ry+1:end);
  endfor
endfunction

%!test
%! assert(olad([1;2;3;4;5],ones(5,1)),3,1e-10); # =median in 1D if nrows odd
%! assert(olad([-99;2;3;4;5],ones(5,1)),3,1e-10); # outlier matters not
%! assert(olad([1;2;3;4;99],ones(5,1)),3,1e-10); # outlier matters not
%! assert(olad([1 2 3 4 5;2 4 6 8 10]',ones(5,1)),[3 6],1e-10);

%!test
%! Y = rand(11,1);
%! assert(olad(Y,ones(11,1)),median(Y),1e-10); # =median in 1D if nrows odd

%!test
%! A3=[sin(.5) cos(.5);
%!     sin(.6) cos(.6);
%!     sin(.7) cos(.7);
%!     sin(.8) cos(.8);
%!     sin(.9) cos(.9)];
%! x3=olad([3;3;3;3;3],A3); # in 2D: a point where 5 lines (nearly) cross
%! assert(x3,A3\[3;3;3;3;3],.02);       # without outliers, near OLS result 
%! assert(olad([-99;3;3;3;3],A3),x3,.3); # effect of an outlier is small
%! assert(olad([99;3;3;3;3],A3),x3,1e-10); # ...or negligible

%!error olad ();

%!error olad (1, 2, 3);

%!demo
%! x=sort(rand(100,1)); y=1+2*x+rand(size(x))-.5; 
%! n=length(x);
%! coef=olad(y,[ones(n,1) x]);             % least absolute deviation fit
%! t=[ones(n,1) x]\y;                      % least squares linear fit
%! 
%! x(end+1)=3.5; y(end+1)=1;
%! n=length(x);
%! coef2=olad(y,[ones(n,1) x]);            % least absolute deviation fit
%! t2=[ones(n,1) x]\y;                     % least squares linear fit
%! 
%! subplot(1,2,1);
%! plot(x,y,'o', ...
%!      x,t(1)+t(2)*x,';no outlier;', ...
%!      x,t2(1)+t2(2)*x,';with outlier;');
%! title('least squares solutions');
%! xlabel x
%! ylabel y
%! axis([0 4 0 4]);
%! subplot(1,2,2);
%! plot(x,y,'o', ...
%!      x,coef(1)+coef(2)*x,';no outliner;', ...
%!      x,coef2(1)+coef2(2)*x,';with outlier;');
%! title('Least Absolute Deviation solutions');
%! xlabel x
%! ylabel y
%! axis([0 4 0 4]);


diff -r 7d0492aa522d doc/interpreter/optim.txi
--- a/doc/interpreter/optim.txi Sat Dec 06 07:48:35 2008 +0100
+++ b/doc/interpreter/optim.txi Thu Jan 22 21:00:36 2009 -0500
@@ -21,14 +21,15 @@
 
 Octave comes with support for solving various kinds of optimization
 problems. Specifically Octave can solve problems in Linear Programming,
-Quadratic Programming, Nonlinear Programming, and Linear Least Squares
-Minimization.
+Quadratic Programming, Nonlinear Programming, Linear Least Squares
+Minimization, and Least Absolute Deviation solution of linear systems.
 
 @menu
 * Linear Programming::       
 * Quadratic Programming::       
 * Nonlinear Programming::       
 * Linear Least Squares::        
+* Least Absolute Deviation::        
 @end menu
 
 @c @cindex linear programming
@@ -140,3 +141,15 @@
 @DOCSTRING(lsqnonneg)
 
 @DOCSTRING(optimset)
+
address@hidden Least Absolute Deviation
address@hidden Least Absolute Deviation
+
+Least squares optimization is sensitive to outliers.  Even one outlying
+data point can have an arbitrarily large effect on the result.  If we
+minimize the sum of the absolute value of the error instead of its
+square, the result is much more robust against outliers.  Octave
+supports LAD optimization for overdetermined linear sytems.
+
address@hidden(olad)
+
diff -r 7d0492aa522d scripts/statistics/base/Makefile.in
--- a/scripts/statistics/base/Makefile.in       Sat Dec 06 07:48:35 2008 +0100
+++ b/scripts/statistics/base/Makefile.in       Thu Jan 22 21:01:57 2009 -0500
@@ -34,7 +34,7 @@
 
 SOURCES = __quantile__.m center.m cloglog.m cor.m corrcoef.m cov.m \
   cut.m gls.m iqr.m kendall.m kurtosis.m logit.m mahalanobis.m mean.m \
-  meansq.m median.m mode.m moment.m ols.m ppplot.m prctile.m probit.m \
+  meansq.m median.m mode.m moment.m olad.m ols.m ppplot.m prctile.m probit.m \
   qqplot.m quantile.m range.m ranks.m run_count.m skewness.m spearman.m \
   statistics.m std.m studentize.m table.m values.m var.m
 


reply via email to

[Prev in Thread] Current Thread [Next in Thread]