EM Algorithm Settings
EM supports these settings:
Number of Clusters is the maximum number of leaf clusters generated by the algorithm. EM may return fewer clusters than the number specified, depending on the data. The number of clusters returned by EM cannot be greater than the number of components, which is governed by algorithm-specific settings. Depending on these settings, there may be fewer clusters than components. If component clustering is disabled, the number of clusters equals the number of components.
The default is System Determined. To set the number of clusters yourself, click User specified and enter an integer value.
Component Clustering groups EM components into higher-level clusters. It is selected by default.
Component Cluster Threshold specifies a dissimilarity threshold that controls how EM components are grouped into clusters. Smaller values may produce more clusters that are more compact, while larger values may produce fewer clusters that are more spread out. The default value is 2.
Linkage Function specifies the linkage function used in the agglomerative clustering step. The linkage functions are:
Single uses the nearest distance within the branch. The clusters tend to be larger and have arbitrary shapes.
Single is the default.
Average uses the average distance within the branch. This reduces the chaining effect, and the clusters are more compact.
Complete uses the maximum distance within the branch. The clusters are smaller and require strong component overlap.
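To illustrate how the threshold and the linkage function interact, here is a minimal sketch of bottom-up (agglomerative) merging of components using the standard textbook single, average, and complete linkage definitions. It assumes each component is represented by a point (for example, its mean) and uses Euclidean distance as a stand-in for the dissimilarity measure; Oracle's actual component dissimilarity is not described here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def linkage_distance(a, b, points, method):
    """Distance between two groups of component indices under a linkage method."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":
        return min(dists)  # nearest pair of components
    if method == "average":
        return sum(dists) / len(dists)  # mean over all pairs
    if method == "complete":
        return max(dists)  # farthest pair of components
    raise ValueError(f"unknown linkage: {method}")


def cluster_components(points, threshold, method="single"):
    """Merge the closest pair of groups until no pair is within the threshold.

    Hypothetical helper: Euclidean distance between component points is an
    assumption standing in for the real dissimilarity measure.
    """
    groups = [[i] for i in range(len(points))]  # start: one group per component
    while len(groups) > 1:
        (ia, ib), d = min(
            ((pair, linkage_distance(groups[pair[0]], groups[pair[1]], points, method))
             for pair in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # closest remaining pair is too dissimilar: stop merging
        groups[ia] = groups[ia] + groups[ib]  # ia < ib, so deleting ib is safe
        del groups[ib]
    return sorted(sorted(g) for g in groups)


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(cluster_components(pts, 1.5, "single"))    # chains into one large group plus an outlier
print(cluster_components(pts, 1.5, "complete"))  # smaller, tighter groups
print(cluster_components(pts, 0.5, "single"))    # small threshold: every component is its own cluster
```

With these points, single linkage chains the four evenly spaced components into one cluster, while complete linkage yields smaller groups, matching the descriptions above. The number of resulting clusters can never exceed the number of components, and a smaller threshold leaves more, tighter clusters.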
Approximate Computation indicates whether the algorithm should use approximate computations to improve performance.
For EM, approximate computation is appropriate for large models with many components and for datasets with many columns. The approximate computation uses localized parameter optimization that restricts learning to parameters that are likely to have the most significant impact on the model.
Values for Approximate Computation are:
System Determined, the default
Enable
Disable
Number of Components specifies the maximum number of components in the model. The algorithm automatically determines the number of components, up to the specified maximum, based on improvements in the likelihood function or on regularization.
The number of components must be greater than or equal to the number of clusters.
The default number of components is 20.
Max Number of Iterations specifies the maximum number of iterations in the EM core algorithm. This setting applies to the input table or view as a whole; it cannot be specified per attribute.
The default is 100.
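The role of an iteration cap can be seen in a generic EM loop. The following is a minimal sketch of textbook EM for a one-dimensional Gaussian mixture, not Oracle's implementation; the function names, the `tol` early-stopping tolerance, and the variance floor are assumptions made for illustration. Each pass runs one E-step and one M-step, and the loop never runs more than `max_iter` passes.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, max_iter=100, tol=1e-6):
    """Fit a 1-D Gaussian mixture with EM, running at most max_iter iterations.

    Hypothetical sketch: tol and the 1e-6 variance floor are assumptions.
    """
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for iteration in range(1, max_iter + 1):
        # E-step: responsibility of each component for each row, plus log likelihood
        resp, ll = [], 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing
        if ll - prev_ll < tol:
            break  # converged before hitting the iteration cap
        prev_ll = ll
    return means, variances, weights, iteration


data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, iters = em_gmm_1d(data, [0.0, 5.0], max_iter=100)
print(means, iters)
```

If the log likelihood is still improving when `max_iter` is reached, the loop simply stops with the current estimates, which is why a very low cap can leave a model under-fitted.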
Log Likelihood Improvement specifies the percentage improvement in the value of the log likelihood function required to add a new component to the model.
The default value is 0.001.
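A rule like this can be pictured as growing the model one component at a time. The sketch below assumes the setting is a relative improvement (a fraction of the current log likelihood's magnitude) and that growth stops at the first component that fails to meet it; both are assumptions, as is the hypothetical `loglik_for` callback, which stands in for fitting a model with `k` components and returning its log likelihood.

```python
def choose_num_components(loglik_for, max_components=20, min_improvement=0.001):
    """Add components while each one improves the log likelihood enough.

    Hypothetical sketch: loglik_for(k) returns the log likelihood of a
    k-component model; the relative-improvement rule is an assumption.
    """
    best_k = 1
    best_ll = loglik_for(1)
    for k in range(2, max_components + 1):
        ll = loglik_for(k)
        # Log likelihoods are usually negative, so scale by the magnitude.
        improvement = (ll - best_ll) / abs(best_ll)
        if improvement < min_improvement:
            break  # the new component does not pay for itself
        best_k, best_ll = k, ll
    return best_k


# Hypothetical log likelihoods for models with 1 to 4 components.
table = {1: -1000.0, 2: -900.0, 3: -899.5, 4: -800.0}
print(choose_num_components(table.__getitem__, max_components=4))
```

Here the second component improves the log likelihood by 10%, but the third improves it by only about 0.056%, so growth stops at two components. Because this greedy rule stops at the first insufficient step, it never evaluates the fourth model; the maximum set by Number of Components caps the search either way.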
Convergence Criterion specifies the convergence criterion for EM. The options are:
System Determined, the default
Bayesian Information Criterion
Held-aside dataset
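The Bayesian Information Criterion trades goodness of fit against model size: BIC = p·ln(n) − 2·ln(L), where p is the number of free parameters, n is the number of rows, and L is the maximized likelihood, with lower values preferred. The sketch below applies the standard formula to choose among candidate mixture sizes. The parameter count for a one-dimensional Gaussian mixture (3k − 1) and the log-likelihood values are illustrative assumptions; how Oracle counts parameters or applies the criterion internally is not specified here.

```python
import math


def bic(log_likelihood, num_params, num_rows):
    """Bayesian Information Criterion: lower is better."""
    return num_params * math.log(num_rows) - 2.0 * log_likelihood


def gmm_1d_num_params(k):
    """Free parameters of a 1-D Gaussian mixture: k means, k variances, k - 1 weights."""
    return 3 * k - 1


def pick_by_bic(loglik_by_k, num_rows):
    """Return the component count whose model has the lowest BIC."""
    return min(loglik_by_k,
               key=lambda k: bic(loglik_by_k[k], gmm_1d_num_params(k), num_rows))


# Hypothetical log likelihoods for 1-, 2-, and 3-component models on 500 rows.
candidates = {1: -1000.0, 2: -900.0, 3: -899.5}
print(pick_by_bic(candidates, num_rows=500))
```

The third component raises the log likelihood by only 0.5, which does not offset the penalty for its three extra parameters, so the two-component model has the lowest BIC.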
Numerical Distribution specifies the distribution for modeling numeric attributes. The options are the following distributions:
Bernoulli
Gaussian
System Determined, the default
When the Bernoulli or Gaussian distribution is chosen, all numerical attributes are modeled using the same distribution. When the distribution is system-determined, individual attributes may use different distributions (either Bernoulli or Gaussian), depending on the data.
Gather Class Statistics enables or disables the gathering of descriptive statistics for clusters (centroids, histograms, and rules). Disabling class statistics produces smaller models and reduces the amount of model detail that is computed.
Gather Class Statistics is enabled (selected) by default.
If you disable Gather Class Statistics, you will not be able to view models.
If you enable Gather Class Statistics, you can specify Min Percent of Attribute Rule Support.
Min Percent of Attribute Rule Support specifies the minimum percentage of the data rows assigned to a cluster in which an attribute must be present for that attribute to be included in the cluster rule. The default value is 0.1.
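The support filter can be pictured as a simple count over a cluster's rows. This minimal sketch assumes the setting is a fraction between 0 and 1 (so the default 0.1 would mean 10% of the rows), that "present" means the attribute value is not null, and that rows are plain dictionaries; the function name and data layout are hypothetical.

```python
def rule_attributes(cluster_rows, attributes, min_support=0.1):
    """Return attributes that are present (non-null) in enough of a cluster's rows.

    Hypothetical sketch: min_support is assumed to be a fraction in [0, 1].
    """
    n = len(cluster_rows)
    selected = []
    for attr in attributes:
        # A missing key or a None value both count as "not present".
        present = sum(1 for row in cluster_rows if row.get(attr) is not None)
        if n and present / n >= min_support:
            selected.append(attr)
    return selected


rows = [
    {"age": 30, "income": None},
    {"age": 41},
    {"age": None, "income": 50000},
]
print(rule_attributes(rows, ["age", "income", "zip"], min_support=0.5))
print(rule_attributes(rows, ["age", "income", "zip"], min_support=0.3))
```

Raising the threshold keeps only attributes that are populated for most of the cluster, which shortens the cluster rule; lowering it admits sparsely populated attributes.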
Data Preparation and Analysis specifies settings for data preparation and analysis. To view or change the selections, click Settings. For more information, see EM Data Preparation and Analysis Settings.
When you are done making changes, click OK.