An outlier is a data value that does not come from the typical population of data; in other words, an extreme value. In a normal distribution, outliers are typically at least 3 standard deviations from the mean. Outliers are usually replaced either with a value that is not extreme or with NULL.
You can define outlier treatments for numerical columns only.
To define an outlier transform, select Outlier for Transform Type. Next select Outlier Type to specify how outliers are determined. The choices are:
Standard Deviation, the default. This selection allows you to specify the number of standard deviations that define an outlier. The default is 3 Multiples of sigma, that is, 3 standard deviations, so an outlier is a value less than mean - 3 * standard deviation or greater than mean + 3 * standard deviation.
Percent allows you to specify that outliers are the values in a bottom percent and a top percent of the data. The default is that outliers are the values in the bottom 5% or in the top 5%. You can change these values.
Value allows you to specify a Lower Value and an Upper Value, so that outliers are those values less than the Lower Value or greater than the Upper Value. If statistics are available, the defaults are -3*standard deviation for the Lower Value and 3*standard deviation for the Upper Value. If statistics are not available, the default is 0 for the Lower Value and 1 for the Upper Value. You can change these values, but the Upper Value must be greater than the Lower Value.
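The three Outlier Types can be illustrated with a minimal Python sketch of how the lower and upper bounds might be computed. This is not the tool's implementation: the function names are hypothetical, and the use of the population standard deviation and of linear interpolation for the percent cut points are assumptions, because the text does not say which methods are used.

```python
import statistics


def std_dev_bounds(values, multiples=3.0):
    """Bounds at mean -/+ multiples * standard deviation.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mean - multiples * sigma, mean + multiples * sigma


def percent_bounds(values, bottom_pct=5.0, top_pct=5.0):
    """Cut points for the bottom and top percent of the values.

    Assumption: linear interpolation between neighboring sorted values.
    """
    ordered = sorted(values)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return quantile(bottom_pct / 100), quantile(1 - top_pct / 100)


def value_bounds(lower, upper):
    """Explicit bounds; the Upper Value must exceed the Lower Value."""
    if upper <= lower:
        raise ValueError("Upper Value must be greater than Lower Value")
    return lower, upper
```

In each case a value counts as an outlier when it is less than the first returned bound or greater than the second.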
For each Outlier Type, you must specify how to replace outliers. There are two possibilities:
Nulls, the default
Edge Value
For example, suppose that 10 is the mean of a column's distribution and 5 is the standard deviation. Suppose that outliers are values that are less than -5 (the mean minus 3 times the standard deviation) or greater than 25 (the mean plus 3 times the standard deviation). The value -10 is then an outlier; you can replace it either with NULL or with -5, the edge value.
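The two replacement options can be sketched in Python using the bounds from this example (-5 and 25). The function name and the "nulls"/"edge" option strings are hypothetical, chosen only for illustration; NULL is represented by Python's None.

```python
def treat_outliers(values, lower, upper, replace_with="nulls"):
    """Replace values outside [lower, upper] with None or the nearest edge."""
    if replace_with not in ("nulls", "edge"):
        raise ValueError("replace_with must be 'nulls' or 'edge'")
    treated = []
    for v in values:
        if v is None or lower <= v <= upper:
            treated.append(v)  # not an outlier (or already NULL): keep as is
        elif replace_with == "nulls":
            treated.append(None)  # outlier replaced with NULL
        else:
            # outlier replaced with the edge value it crossed
            treated.append(lower if v < lower else upper)
    return treated


column = [-10, 0, 12, 40]
print(treat_outliers(column, -5, 25))          # [None, 0, 12, None]
print(treat_outliers(column, -5, 25, "edge"))  # [-5, 0, 12, 25]
```

With Edge Value, -10 becomes -5 and 40 becomes 25; with Nulls, both become NULL.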