Re: Are these bugs in cluster?
From: Ben Pfaff
Subject: Re: Are these bugs in cluster?
Date: Sat, 30 May 2015 16:24:47 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

On Sat, May 30, 2015 at 09:13:16AM +0200, John Darrington wrote:
> On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
> > John suggested that I post to pspp-dev. I'm adding code to the k-means
> > (i.e., quick-cluster.c) procedure to show cluster membership.
> >
> > CLUSTER works perfectly on a trivial two-dimensional problem but it
> > fails miserably on some real data. For example, in one analysis
> > requesting 3 clusters on 98 cases, it found that everyone was in cluster
> > 3 and zero people were in clusters 1 & 2. I think part of it is that
> > the starting values seem to be a pattern of 1's and zero's, even though
> > the comments describe selecting random individuals as starting values.
> >
> > My question is about accessing the data. I copied other code to use a
> > "casereader" to iterate over the rows of data. Below are the relevant
> > parts of the code I've added that seems to display cluster membership.
> > If I want to randomly select cases as starting values, is there a way to
> > retrieve random records directly?
> >
>
> Ben is the casereader expert! Maybe he can comment? But I think you might
> be able to use the function casereader_select (defined in casereader-select.c)
>
> casereader_select (subreader, random_number - 1, random_number + 1, 1);
>
> You would have to ensure that random_number was within the range of subreader.

That seems reasonable to me.

If clustering actually wants a shuffled version of the complete data set
(I don't know whether it does), then more efficient algorithms are
probably available.
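
As a rough illustration of one such algorithm: if k-means only needs k
randomly chosen cases as starting centers rather than a full shuffle,
reservoir sampling can pick them in a single pass over the data. The
sketch below is standalone C, not PSPP code. The names random_below and
sample_case_indexes are hypothetical, it works on case indexes rather
than on a casereader, and it uses rand() only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns a uniform random integer in [0, N).
   Uses rejection sampling so the result has no modulo bias.
   Assumes N is no larger than RAND_MAX + 1. */
static size_t
random_below (size_t n)
{
  unsigned long long range = (unsigned long long) RAND_MAX + 1;
  unsigned long long limit, r;

  assert (n > 0 && n <= range);
  limit = range - range % n;
  do
    r = (unsigned long long) rand ();
  while (r >= limit);
  return (size_t) (r % n);
}

/* Reservoir sampling (Algorithm R).  Visits case indexes
   0...N_CASES-1 once, in order, and stores up to K distinct,
   uniformly chosen indexes in CHOSEN.  Returns the number of
   indexes stored, which is the smaller of K and N_CASES. */
static size_t
sample_case_indexes (size_t n_cases, size_t k, size_t *chosen)
{
  size_t n_chosen = 0;
  size_t i;

  for (i = 0; i < n_cases; i++)
    {
      if (n_chosen < k)
        chosen[n_chosen++] = i;   /* Fill the reservoir first. */
      else
        {
          /* Keep case I with probability K / (I + 1), replacing
             a randomly selected earlier choice. */
          size_t j = random_below (i + 1);
          if (j < k)
            chosen[j] = i;
        }
    }
  return n_chosen;
}
```

For the example in this thread that would be something like
sample_case_indexes (98, 3, chosen), after seeding with srand().
In a real procedure the loop would presumably run over the cases as
they are read, keeping the selected cases themselves instead of
their indexes; that wiring is not shown here.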