From: Alan Mead
Subject: K-means cluster center order
Date: Sat, 30 May 2015 17:38:26 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

I've uploaded a patch (against quick-cluster.c in 0.8.4) that adds support for the /PRINT=CLUSTER subcommand for k-means clustering, to show the cluster membership for each case:

https://savannah.gnu.org/bugs/index.php?41019

But this patch has a remaining bug. The cluster centers are stored in some indirect fashion that I cannot understand. In the patch, I report the cluster number returned by kmeans_get_nearest_group(), but these numbers are systematically different from the cluster numbers shown in the output. That is, the centers are stored internally in arbitrary order (as they are discovered, I'd guess) and are renumbered for reporting. I cannot replicate that output numbering.

For example, in the attached output, the true centers were (10,10), (-10,-10), and (-10,10), and 20 cases were generated for each cluster, in that order. The CLUSTER command reports 1 = (-10.23, -10.01), 2 = (-10.19, 10.18), and 3 = (10.27, 9.82), so the first 20 cases should be members of cluster 3, the next 20 of cluster 1, and the last 20 of cluster 2. But using the results from kmeans_get_nearest_group(), the clusters are reported as 1, then 3, then 2.

I don't understand how to fix this. I think I need to use kmeans->group_order, which is a "gsl_permutation", but this is beyond my familiarity with C and GSL. It's also possible that kmeans_order_groups() (which is called at the beginning of quick_cluster_show_results()) is not working properly.

Any advice?

-Alan

--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a peer-reviewed electronic journal designed to advance the science and practice of computerized adaptive testing: http://www.iacat.org/jcat
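
To make the renumbering question concrete, here is a minimal sketch of inverting an ordering. It is not PSPP code. It assumes, without confirmation from the source, that the permutation lists, for each 0-based display position, the internal index of the center printed there. The helper name invert_group_order and the plain size_t array standing in for the gsl_permutation are hypothetical. With GSL itself, gsl_permutation_get() reads one element and gsl_permutation_inverse() computes the inverse, but whether kmeans->group_order is oriented this way is a guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP or GSL.
   ASSUMPTION: order[pos] is the internal index of the center that is
   printed at 0-based display position pos.
   On return, reported[internal] is the 1-based number under which the
   internally stored group `internal` appears in the output. */
void
invert_group_order (const size_t *order, size_t n_groups, size_t *reported)
{
  for (size_t pos = 0; pos < n_groups; pos++)
    {
      /* Each entry must name a valid internal group. */
      assert (order[pos] < n_groups);
      reported[order[pos]] = pos + 1;
    }
}
```

With the example above, an order array of {2, 1, 0} would turn internal groups 0, 1, 2 into reported clusters 3, 2, 1, which matches the mismatch described (1, 3, 2 from kmeans_get_nearest_group() where 3, 1, 2 was expected). If the permutation is stored in the opposite direction, a direct lookup would be needed instead of the inverse.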
patch_for_cluster_print.patch
Description: Text document
qc.pdf
Description: Adobe PDF document
qc.sps
Description: application/spss-sps
qc1.data
Description: Text document