[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Help-gsl] Multidimensional linear fit (and principal component anal
From: |
Philipp Klaus Krause |
Subject: |
Re: [Help-gsl] Multidimensional linear fit (and principal component analysis, covariance matrix) |
Date: |
Sun, 14 Dec 2008 14:16:07 +0100 |
User-agent: |
Mozilla-Thunderbird 2.0.0.17 (X11/20081018) |
Liam Healy schrieb:
> http://www.gnu.org/software/gsl/manual/html_node/Multi_002dparameter-fitting.html
>
> On Sat, Dec 13, 2008 at 3:53 PM, Philipp Klaus Krause <address@hidden> wrote:
>> I want to do a least squares fit of a line in 3 or 4-dimensional space
>> to 16 data points.
>> I looked at the manual, it seems gsl provides least squares linear fits
>> only for onedimensional stuff.
>
> ??
> http://www.gnu.org/software/gsl/manual/html_node/Multi_002dparameter-fitting.html
>
Somehow I can't see how I could use that for my problem. The y there is
still a vector of doubles, while I have points in three- or
fourdimensional space. There is that matrix X which allows fitting to
any number of functions, while I just want a line.
Thus I've tried implementing the PCA using gsl. It works fine, but
currently takes a bit more than 7 seconds to call pca() a million times
on my system. Since I'm a newbie to gsl I suppose there is a way to
improve performance. What would you suggest? Dimension will typically be
3 or 4.
struct odata
{
int dimension;
gsl_matrix *covariance_matrix;
gsl_eigen_symmv_workspace *workspace;
gsl_vector *eigenvalues;
gsl_matrix *eigenvectors;
gsl_vector *mean;
gsl_matrix *mean_substracted_points;
};
static void create_pca(struct odata *data, const int dimension)
{
#warning TODO: Check for lack of memory in this function.
data->dimension = dimension;
data->covariance_matrix = gsl_matrix_alloc(dimension, dimension);
data->eigenvalues = gsl_vector_alloc(dimension);
data->eigenvectors = gsl_matrix_alloc(dimension, dimension);
data->mean = gsl_vector_alloc(dimension);
data->mean_substracted_points = gsl_matrix_alloc(dimension, 16);
data->workspace = gsl_eigen_symmv_alloc(dimension);
}
static void destroy_pca(struct odata *data)
{
gsl_eigen_symmv_free(data->workspace);
gsl_matrix_free(data->mean_substracted_points);
gsl_vector_free(data->mean);
gsl_matrix_free(data->eigenvectors);
gsl_vector_free(data->eigenvalues);
gsl_matrix_free(data->covariance_matrix);
}
// Do PCA, principal component can be found in first column of
data->eigenvectors.
static void pca(const double *points, struct odata *data)
{
int i;
for(i = 0; i < data->dimension; i++)
gsl_vector_set(data->mean, i, gsl_stats_mean(points + i,
data->dimension, 16));
// Get mean-substracted data into matrix mean_substracted_points.
for(i = 0; i < 16; i++)
{
gsl_vector_const_view point_view =
gsl_vector_const_view_array(points
+ i * data->dimension, data->dimension);
gsl_vector_view mean_substracted_point_view =
gsl_matrix_column(data->mean_substracted_points, i);
gsl_vector_memcpy(&mean_substracted_point_view.vector,
&point_view.vector);
gsl_vector_sub(&mean_substracted_point_view.vector, data->mean);
}
// Compute Covariance matrix
gsl_blas_dgemm(CblasNoTrans, CblasTrans, 1.0 / 16.0,
data->mean_substracted_points, data->mean_substracted_points, 0.0,
data->covariance_matrix);
// Get eigenvectors, sort by eigenvalue.
gsl_eigen_symmv(data->covariance_matrix, data->eigenvalues,
data->eigenvectors, data->workspace);
gsl_eigen_symmv_sort(data->eigenvalues, data->eigenvectors,
GSL_EIGEN_SORT_ABS_DESC);
}
Philipp