k-Means Algorithm
Oracle Data Mining implements an enhanced version of the k-means algorithm with the following features:
The algorithm builds models hierarchically. It builds a model top down using binary splits, followed by a refinement of all nodes at the end. In this sense, the algorithm is similar to the bisecting k-means algorithm. The centroids of the inner nodes in the hierarchy are updated to reflect changes as the tree evolves. The whole tree is returned.
The algorithm grows the tree one node at a time (unbalanced approach). Based on a user setting, the node with the largest variance is split to increase the size of the tree until the desired number of clusters is reached. The maximum number of clusters is specified as a build setting.
The algorithm provides probabilistic scoring and assignment of data to clusters.
The algorithm returns, for each cluster, a centroid (cluster prototype), histograms (one for each attribute), and a rule describing the hyperbox that encloses the majority of the data assigned to the cluster. The centroid reports the mode for categorical attributes or the mean and variance for numerical attributes.
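The top-down growth described in the features above can be illustrated with a minimal Python sketch. It is not Oracle's implementation: it assumes numerical attributes only, measures variance as the mean squared distance to the centroid, seeds each binary split deterministically from two far-apart points, and omits the final refinement of all nodes. The names `Node`, `two_means`, and `grow_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

Point = List[float]

def mean(points):
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance(points, centroid):
    # Assumed definition: mean squared distance to the centroid.
    return sum(sq_dist(p, centroid) for p in points) / len(points)

@dataclass
class Node:
    points: List[Point]
    centroid: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self):
        return self.left is None

def make_node(points):
    return Node(points, mean(points))

def leaves(node):
    if node.is_leaf:
        return [node]
    return leaves(node.left) + leaves(node.right)

def two_means(points, iters=20):
    # Binary split with plain 2-means. Seeds (an assumption of this sketch):
    # the point farthest from the mean, then the point farthest from that one.
    c = mean(points)
    a = max(points, key=lambda p: sq_dist(p, c))
    b = max(points, key=lambda p: sq_dist(p, a))
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, a) <= sq_dist(p, b)]
        new_right = [p for p in points if sq_dist(p, a) > sq_dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last partition in which both sides were non-empty
        left, right = new_left, new_right
        new_a, new_b = mean(left), mean(right)
        if (new_a, new_b) == (a, b):
            break  # converged
        a, b = new_a, new_b
    return left, right

def grow_tree(points, max_clusters):
    # Grow one node at a time: always split the leaf with the largest variance.
    root = make_node(points)
    while len(leaves(root)) < max_clusters:
        # Only leaves holding at least two distinct points can be split.
        candidates = [n for n in leaves(root)
                      if len({tuple(p) for p in n.points}) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda n: variance(n.points, n.centroid))
        left, right = two_means(target.points)
        target.left, target.right = make_node(left), make_node(right)
    return root  # the whole tree is kept, inner nodes included

points = [[0, 0], [0, 1], [1, 0],
          [10, 10], [10, 11], [11, 10],
          [20, 0], [20, 1], [21, 0]]
root = grow_tree(points, max_clusters=3)
for leaf in leaves(root):
    print(leaf.centroid, len(leaf.points))
```

Because the tree is grown one leaf at a time, it can be unbalanced: a tight cluster may stay a leaf while a more spread-out sibling is split further.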
The clusters discovered by the enhanced k-means algorithm are used to generate a Bayesian probability model, which is then used during scoring (model apply) to assign data points to clusters. The k-means algorithm can be interpreted as a mixture model in which the mixture components are spherical multivariate normal distributions with the same variance for all components.
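The mixture-model interpretation can be made concrete with a short Python sketch of probabilistic assignment. It illustrates the general idea only, not Oracle's scoring code: the function name `cluster_probabilities`, the cluster priors, and the shared variance `sigma2` are assumptions introduced for the example. Each cluster is treated as a spherical normal component with the same variance, and Bayes' rule turns distances to the centroids into cluster membership probabilities.

```python
import math

def cluster_probabilities(x, centroids, priors, sigma2):
    """Posterior probability of each cluster for point x.

    Assumes spherical normal components with a shared variance sigma2, so
    p(k | x) is proportional to priors[k] * exp(-||x - centroid_k||^2 / (2 * sigma2)).
    The normalizing constant of the normal density is the same for every
    component and cancels out.
    """
    logs = []
    for centroid, prior in zip(centroids, priors):
        sq = sum((a - b) ** 2 for a, b in zip(x, centroid))
        logs.append(math.log(prior) - sq / (2.0 * sigma2))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

centroids = [[0.0, 0.0], [4.0, 0.0]]
priors = [0.5, 0.5]          # hypothetical: equal cluster weights
probs = cluster_probabilities([1.0, 0.0], centroids, priors, sigma2=1.0)
print(probs)
```

With equal priors and a shared variance, the most probable cluster is always the one with the nearest centroid, which is why hard k-means assignment can be read as a special case of this model.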