Expectation Maximization


Note:

Expectation Maximization requires Oracle Database 12c.

Expectation Maximization (EM) is a density estimation technique. Oracle Data Mining implements EM as a distribution-based clustering algorithm that uses probability density estimation.

In density estimation, the goal is to construct a density function that captures how a given population is distributed. The density estimate is based on observed data that represents a sample of the population.

Dense areas are interpreted as components or clusters. Density-based clustering is conceptually different from distance-based clustering (such as k-Means), which emphasizes minimizing intra-cluster distances and maximizing inter-cluster distances.
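The difference between the two views can be illustrated with a small sketch. This is not Oracle Data Mining code; the cluster centers, variances, weights, and the helper names `nearest_center` and `memberships` are all hypothetical, chosen only to show how a distance rule and a density rule can disagree about the same point.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical 1-D clusters (assumed values): a tight one and a wide one.
centers = [0.0, 10.0]
variances = [1.0, 25.0]
weights = [0.5, 0.5]


def nearest_center(x):
    """Distance-based view: index of the closest cluster center."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def memberships(x):
    """Density-based view: probability that x belongs to each component."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, centers, variances)]
    total = sum(scores)
    return [s / total for s in scores]


x = 4.0
hard = nearest_center(x)  # a single cluster index
soft = memberships(x)     # one probability per component
```

With these made-up parameters, the point at 4.0 is closer to the center of the tight cluster, so the distance rule picks that cluster, while the density rule assigns the point almost entirely to the wide cluster because the tight cluster has very little probability mass that far from its center.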

The shape of the probability density function used in EM effectively predetermines the shape of the identified clusters. For example, Gaussian density functions can identify single-peak, symmetric clusters. Each such cluster is modeled by a single component. Clusters of more complex shape need to be modeled by multiple components. By default, the EM algorithm assigns model components to high-level clusters.
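As a rough illustration of how Gaussian components come to sit on single-peak clusters, the following is a minimal textbook EM loop for a one-dimensional Gaussian mixture. It is a sketch under stated assumptions, not the Oracle Data Mining implementation: the function `fit_em`, the synthetic sample, the starting values, and the fixed iteration count are all made up for the example, and the grouping of components into high-level clusters is not shown.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_em(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with plain EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # floor keeps a component from collapsing to zero width
            )
            weights[j] = nj / len(data)
    return means, variances, weights


# Synthetic sample (illustrative only): two well-separated single-peak clusters.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(8.0, 1.0) for _ in range(300)]

# Assumed starting values; real implementations choose these more carefully.
means, variances, weights = fit_em(
    sample, means=[1.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Because each cluster in this sample is symmetric with a single peak, one Gaussian component per cluster is enough. A skewed or multi-peak cluster would need several components to approximate its shape.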