Outlier

An outlier is a data value that does not come from the typical population of data; in other words, an extreme value. In a normal distribution, outliers are typically values more than 3 standard deviations from the mean. Outliers are usually replaced either with a value that is not extreme or with NULL.
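The standard-deviation rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's implementation. The function names `sd_bounds` and `is_outlier` are hypothetical. The sketch assumes the population standard deviation (`statistics.pstdev`); a given tool may use the sample standard deviation instead. A value is flagged only if it lies strictly outside the range mean ± k standard deviations.

```python
import statistics
from typing import List, Tuple


def sd_bounds(values: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) = mean - k*sd, mean + k*sd.

    Assumption (hypothetical helper): uses the population standard
    deviation; some tools use the sample standard deviation instead.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return mu - k * sigma, mu + k * sigma


def is_outlier(x: float, lower: float, upper: float) -> bool:
    """Treat a value strictly below lower or strictly above upper as an outlier."""
    return x < lower or x > upper


# Usage: twenty typical values and one extreme value.
data = [10.0] * 20 + [100.0]
lo, hi = sd_bounds(data)
flagged = [x for x in data if is_outlier(x, lo, hi)]
```

With small samples, keep in mind that a single extreme value also inflates the standard deviation. As a result, it may fall inside the 3-standard-deviation range even though it looks extreme.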

You can define outlier treatments for numerical columns only.

To define an outlier transform, select Outlier for Transform Type. Next, select Outlier Type to specify how outliers are determined. The choices are:

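For each Outlier Type, you must specify how to replace outliers. There are two possibilities:

Edge value: replace the outlier with the edge value, that is, the boundary of the non-outlier range that the outlier lies beyond.
NULL: replace the outlier with NULL.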

For example, suppose that 10 is the mean of a column's distribution and 5 is its standard deviation. Suppose also that outliers are values less than -5 (the mean minus 3 times the standard deviation) or greater than 25 (the mean plus 3 times the standard deviation). The value -10 is then an outlier; you can replace it either with NULL or with -5, the edge value.
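The worked example can be reproduced with a short Python sketch. This is illustrative only, and the function `treat_outliers` and its `replace_with` parameter are hypothetical names. The sketch assumes that Python's `None` stands in for NULL and that the mean and standard deviation are already known, as in the example. In "edge" mode, the sketch clips each outlier to the nearer boundary; in "null" mode, it replaces each outlier with `None`.

```python
from typing import List, Optional


def treat_outliers(
    values: List[float],
    mean: float,
    sd: float,
    k: float = 3.0,
    replace_with: str = "edge",  # hypothetical option: "edge" or "null"
) -> List[Optional[float]]:
    """Replace values outside [mean - k*sd, mean + k*sd] with the edge value or None."""
    if replace_with not in ("edge", "null"):
        raise ValueError("replace_with must be 'edge' or 'null'")
    lower = mean - k * sd  # 10 - 3*5 = -5 in the example
    upper = mean + k * sd  # 10 + 3*5 = 25 in the example
    result: List[Optional[float]] = []
    for x in values:
        if x < lower:
            # Below the range: clip to the lower edge, or NULL
            result.append(lower if replace_with == "edge" else None)
        elif x > upper:
            # Above the range: clip to the upper edge, or NULL
            result.append(upper if replace_with == "edge" else None)
        else:
            # Not an outlier: keep the value unchanged
            result.append(x)
    return result


print(treat_outliers([-10, 3, 30], mean=10, sd=5))                      # [-5.0, 3, 25.0]
print(treat_outliers([-10, 3, 30], mean=10, sd=5, replace_with="null"))  # [None, 3, None]
```

Replacing outliers with the edge value keeps the row usable for numeric computations. Replacing them with NULL instead defers the decision to however missing values are handled later.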