KM Algorithm Settings
The k-Means (KM) algorithm supports these settings:
Number of Clusters is the maximum number of leaf clusters generated by the algorithm. The default is 10. k-Means usually produces exactly the number of clusters specified, unless the data contains fewer distinct data points than that number.
Growth Factor is a number greater than 1 and less than or equal to 5. This value specifies the growth factor for memory allocated to hold cluster data; the default is 2.
Convergence Tolerance must be between 0.001 (slow build) and 0.1 (fast build); the default is 0.01. The tolerance controls the convergence of the algorithm. The smaller the value, the closer the result is to the optimal solution, at the cost of longer run times. This parameter interacts with the Number of Iterations setting.
Distance Function specifies how the algorithm calculates distance. The default distance function is Euclidean; other distance functions are Cosine and Fast Cosine.
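As a rough illustration of how the Euclidean and Cosine choices differ, the following Python sketch computes both distances for plain numeric vectors. The function names are hypothetical, and treating a zero-length vector as maximally distant is a convention chosen for this sketch; the product's exact formulas, and the Fast Cosine variant, are not described here.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors.

    Assumption for this sketch: a zero-length vector has no direction,
    so it is treated as maximally distant (1.0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to magnitude, while cosine distance depends only on direction: under this sketch, the vectors (1, 0) and (2, 0) are 1 apart in Euclidean terms but have a cosine distance of 0.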
Number of Iterations must be between 2 and 20; the default is 3. This value is the maximum number of iterations for the k-Means algorithm. In general, more iterations result in a slower build. However, the algorithm may run for the maximum number of iterations, or it may converge earlier. Convergence is determined by whether the Convergence Tolerance setting is satisfied.
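The interaction between the iteration limit and the convergence tolerance can be sketched as a loop that stops at whichever comes first. This is a minimal Python illustration, not the product's implementation: the helper names are hypothetical, and the stopping test (relative improvement in total squared error no larger than the tolerance) is an assumption, since the exact convergence measure is not stated here.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean)."""
    labels = []
    for p in points:
        dists = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def total_squared_error(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )


def run_kmeans(points, centroids, max_iterations=3, tolerance=0.01):
    """Iterate until the assumed tolerance test passes or the limit is reached.

    Returns (centroids, iterations_used, converged).
    """
    previous = None
    for iteration in range(1, max_iterations + 1):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
        error = total_squared_error(points, labels, centroids)
        # Assumed convergence test: relative improvement <= tolerance.
        if previous is not None and previous - error <= tolerance * previous:
            return centroids, iteration, True
        previous = error
    return centroids, max_iterations, False
```

With a looser tolerance the test passes sooner and fewer iterations run; with a tighter tolerance the loop is more likely to be cut off by the iteration limit instead.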
Min Percent Attribute Support is a number greater than or equal to 0.0 and less than or equal to 1.0. This value is used to filter out rule predicates that do not meet the support threshold; setting this value too high can result in very short or even empty rules.
The default value is 0.1. The default value allows you to highlight the more important predicates instead of producing a long list of predicates that have very low support.
In extreme cases, for very sparse data, all attribute predicates may be filtered out so that no rule is produced. If no rule is produced, you can lower the support threshold and rebuild the model to make the algorithm produce rules even if the predicate support is very low.
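The effect of the support threshold on a rule can be sketched in Python as a simple filter. The function name, the predicate strings, and the support figures below are all hypothetical, and the sketch assumes that a predicate meets the threshold when its support is greater than or equal to it.

```python
def filter_predicates(predicate_support, min_support=0.1):
    """Keep only predicates whose support meets the threshold.

    predicate_support maps a predicate description to its support,
    expressed as a fraction between 0.0 and 1.0 (illustrative input format).
    """
    if not 0.0 <= min_support <= 1.0:
        raise ValueError("min_support must be between 0.0 and 1.0")
    return {
        predicate: support
        for predicate, support in predicate_support.items()
        if support >= min_support  # assumed: "meets" means >= threshold
    }


# Made-up supports for one cluster's candidate predicates.
supports = {
    "AGE in [20, 30)": 0.62,
    "INCOME in [40, 60)": 0.35,
    "REGION = 'WEST'": 0.08,
}

default_rule = filter_predicates(supports)         # drops the 0.08 predicate
strict_rule = filter_predicates(supports, 0.9)     # nothing survives: empty rule
relaxed_rule = filter_predicates(supports, 0.05)   # lowering the threshold keeps all three
```

The strict case mirrors the situation described above: when every predicate falls below the threshold, no rule is produced until the threshold is lowered.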
Number of Histogram Bins is a positive integer; the default value is 10. This value specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins, with the exception of attributes with a single value, which have only one bin.
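Equi-width binning with globally computed boundaries can be sketched in Python as follows. The function names are hypothetical, and details such as placing the maximum value in the last bin are assumptions made for this illustration.

```python
def equi_width_edges(values, num_bins=10):
    """Compute bin edges once, from the min and max of the whole training column.

    A column with a single distinct value gets exactly one bin.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi]
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins)] + [hi]


def bin_index(value, edges):
    """Return the zero-based bin for a value, clamping to the first/last bin."""
    num_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    if lo == hi:
        return 0
    index = int((value - lo) / (hi - lo) * num_bins)
    # Assumption: the maximum value (and anything beyond) falls in the last bin.
    return min(max(index, 0), num_bins - 1)
```

Because the edges come from the entire training set rather than from each cluster, the histograms of different clusters share the same boundaries and can be compared bin by bin.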
Split Criterion is either Variance or Size. The default is Variance. The split criterion is related to the initialization of the k-Means clusters. The algorithm builds a binary tree and adds one new cluster at a time. Size results in placing the new cluster in the area where the largest current cluster is located. Variance places the new cluster in the area of the most spread out cluster.
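The difference between the two split criteria can be sketched in Python as a choice of scoring function when picking which existing cluster receives the next new cluster. The names are hypothetical, and measuring "most spread out" as the mean squared distance to the cluster centroid is an assumption made for this illustration.

```python
def cluster_variance(points):
    """Mean squared distance from the points to their centroid (assumed spread measure)."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum(
        sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points
    ) / n


def choose_cluster_to_split(clusters, criterion="variance"):
    """Return the name of the cluster where the next new cluster would be placed.

    clusters maps a cluster name to its list of points (illustrative structure).
    """
    if criterion == "size":
        score = len                # largest current cluster
    elif criterion == "variance":
        score = cluster_variance   # most spread-out cluster
    else:
        raise ValueError("criterion must be 'size' or 'variance'")
    return max(clusters, key=lambda name: score(clusters[name]))


# Made-up data: A is larger but compact, B is small but widely spread.
clusters = {
    "A": [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)],
    "B": [(0, 0), (10, 10)],
}
by_size = choose_cluster_to_split(clusters, "size")          # picks "A"
by_variance = choose_cluster_to_split(clusters, "variance")  # picks "B"
```

In this example the two criteria disagree, which shows why the choice matters: Size favors dense, heavily populated regions, while Variance favors regions where the points are far apart.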