Outlier

An outlier is a data value that does not belong to the typical population of data; in other words, an extreme value. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.

You specify a treatment by defining what constitutes an outlier (for example, all values in the top and bottom 5% of values) and how to replace outliers. Outliers are usually replaced with NULL or with edge values. For example, suppose that the mean of an attribute's distribution is 10 and the standard deviation is 5, and that outliers are defined as values less than -5 (the mean minus 3 times the standard deviation) or greater than 25 (the mean plus 3 times the standard deviation). In this case, you can replace the outlier -10 either with NULL or with the edge value -5.
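
The two replacement options in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of any particular product: the function name treat_outliers and its parameters k and replace_with_null are hypothetical, the mean and standard deviation are supplied by the caller rather than estimated from the data, and None stands in for NULL.

```python
from typing import List, Optional


def treat_outliers(
    values: List[float],
    mean: float,
    std: float,
    k: float = 3.0,
    replace_with_null: bool = False,
) -> List[Optional[float]]:
    """Replace values outside mean +/- k*std (hypothetical helper).

    With replace_with_null=True, outliers become None (standing in for NULL).
    Otherwise they are replaced with the nearest edge value.
    """
    lower = mean - k * std  # lower edge value
    upper = mean + k * std  # upper edge value

    treated: List[Optional[float]] = []
    for v in values:
        if v < lower:
            treated.append(None if replace_with_null else lower)
        elif v > upper:
            treated.append(None if replace_with_null else upper)
        else:
            treated.append(v)  # typical value, kept unchanged
    return treated


# Example from the text: mean 10, standard deviation 5, edges at -5 and 25.
data = [-10, 0, 10, 20, 30]
edge_treated = treat_outliers(data, mean=10, std=5)
null_treated = treat_outliers(data, mean=10, std=5, replace_with_null=True)
```

Here edge_treated replaces -10 with -5 and 30 with 25, while null_treated replaces both with None. Replacing with edge values keeps every row usable; replacing with NULL leaves the decision about those rows to whatever missing-value treatment follows.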