Cluster Analysis

From Gaskination's StatWiki

I've created several videos on cluster analysis, although I'm definitely not an expert on the topic.

A note on selecting the right number of clusters:

In their book "Multivariate data analysis", Hair et al. (2010) state that "no standard objective selection procedure exists" (p. 514) for deciding how many clusters should be extracted. They repeat the point on page 516: "No single objective procedure is available to determine the correct number of clusters; rather the researcher must evaluate alternative cluster solutions on the following considerations..." They then list four considerations:

  1. Avoid extremely small clusters
  2. Try to maximize heterogeneity between clusters
  3. "All clusters should be significantly different across the set of clustering variables"
  4. Clusters should be theoretically valid and useful

If you follow the approaches in the videos, I think #3 here is our best argument. We're using an ANOVA with Bonferroni post-hoc pairwise comparisons to assess whether all clusters are significantly different across the set of clustering variables. Here is the citation for Hair:

  • Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
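
For anyone who wants to try the checks behind considerations #1 and #3 in code, here is a rough sketch in Python. It is not the procedure from the videos, just one way to approximate the same idea, assuming NumPy and SciPy are installed. The function names (`cluster_sizes`, `compare_clusters`), the `alpha` cutoff, and the synthetic data are all made up for illustration. Note one simplification: the Bonferroni step here multiplies the p-values of ordinary pairwise t-tests by the number of pairs, so results may differ slightly from a stats package whose post-hoc test pools the error variance across all groups.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def cluster_sizes(labels):
    """Count cases per cluster (consideration #1: watch for tiny clusters)."""
    labels = np.asarray(labels)
    return {c: int(np.sum(labels == c)) for c in np.unique(labels).tolist()}


def compare_clusters(X, labels, alpha=0.05):
    """For each clustering variable, run a one-way ANOVA across clusters,
    then pairwise t-tests with Bonferroni-adjusted p-values
    (consideration #3). `alpha` is an assumed significance cutoff."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels).tolist()
    pairs = list(combinations(clusters, 2))
    results = []
    for j in range(X.shape[1]):
        # Split variable j into one array of values per cluster.
        groups = {c: X[labels == c, j] for c in clusters}
        anova = stats.f_oneway(*groups.values())
        pairwise = {}
        for a, b in pairs:
            p = float(stats.ttest_ind(groups[a], groups[b]).pvalue)
            # Bonferroni: multiply by the number of comparisons, cap at 1.
            pairwise[(a, b)] = min(1.0, p * len(pairs))
        results.append({
            "variable": j,
            "F": float(anova.statistic),
            "p_anova": float(anova.pvalue),
            "pairwise_p_adj": pairwise,
            "all_pairs_differ": all(p < alpha for p in pairwise.values()),
        })
    return results


# Synthetic example: three well-separated clusters on two variables.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
labels = np.repeat([0, 1, 2], 30)
X = centers[labels] + rng.normal(scale=1.0, size=(90, 2))

sizes = cluster_sizes(labels)
results = compare_clusters(X, labels)
for r in results:
    print(r["variable"], r["all_pairs_differ"])
```

In practice `labels` would be the cluster membership saved from your cluster solution and `X` the clustering variables. If `all_pairs_differ` is false for some variable, that solution does not fully satisfy consideration #3 on that variable, which is a reason to compare it against a solution with a different number of clusters.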