This dialog allows you to create and edit filter columns settings.
There are three kinds of settings:
Data Quality (percent of NULLs, percent of values that are unique, and percent of constants)
Attribute Importance (builds an Attribute Importance model to identify important attributes); off by default
Sampling (default size for random sample for calculating statistics)
You can specify the following data quality criteria:
% Nulls less than or equal indicates the largest acceptable percent of NULL values in a column of the data source. The default value is 95%. You may want to ignore columns that have a larger percentage of NULL values.
% Unique less than or equal indicates the largest acceptable percent of values that are unique in a column of the data source. The default value is 95%. If a column contains many unique values, it may not contain useful information for model building.
% Constant less than or equal indicates the largest acceptable percent of constant values in a column of the data source. If almost all the values in a column are the same, the column may not be useful for model building.
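The three criteria above can be illustrated with a minimal Python sketch. This is not how Filter Columns is implemented; it only shows one plausible way to compute the three percentages. The function names column_quality and passes_filter are hypothetical, and the exact definitions used here (unique and constant percentages computed over non-NULL values, constant taken as the share of the most frequent value) are assumptions. The 95% defaults for NULL and unique values come from the text above; no default is stated for % Constant, so the sketch requires the caller to supply one.

```python
from collections import Counter


def column_quality(values):
    """Return (pct_null, pct_unique, pct_constant) for one column.

    Assumed definitions (not taken from the product):
      pct_null     = share of None entries among all entries
      pct_unique   = distinct non-NULL values / non-NULL entries
      pct_constant = count of the most frequent non-NULL value / non-NULL entries
    """
    total = len(values)
    if total == 0:
        return 0.0, 0.0, 0.0
    non_null = [v for v in values if v is not None]
    pct_null = 100.0 * (total - len(non_null)) / total
    if not non_null:
        return pct_null, 0.0, 0.0
    counts = Counter(non_null)
    pct_unique = 100.0 * len(counts) / len(non_null)
    pct_constant = 100.0 * counts.most_common(1)[0][1] / len(non_null)
    return pct_null, pct_unique, pct_constant


def passes_filter(values, *, max_pct_constant,
                  max_pct_null=95.0, max_pct_unique=95.0):
    """True if every percentage is less than or equal to its limit."""
    pct_null, pct_unique, pct_constant = column_quality(values)
    return (pct_null <= max_pct_null
            and pct_unique <= max_pct_unique
            and pct_constant <= max_pct_constant)


# Illustrative data only: an ID-like column, a nearly constant flag,
# and an ordinary numeric column.
columns = {
    "id": list(range(100)),
    "flag": ["Y"] * 99 + ["N"],
    "age": [i % 40 for i in range(100)],
}

# 95.0 for max_pct_constant is an arbitrary value chosen for the example.
kept = [name for name, vals in columns.items()
        if passes_filter(vals, max_pct_constant=95.0)]
print(kept)
```

In this sketch the id column is rejected because every value is unique, and the flag column is rejected because almost all of its values are the same.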
By default, Filter Columns uses a sample of the data to determine data quality and attribute importance. The default Sample Size is 2,000 records. You can increase the sample size, or turn off sampling to use all of the data.
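The effect of the sampling setting can be sketched in Python as follows. The helper sample_rows is hypothetical and does not reflect how the product draws its sample; it only shows the behavior described above, assuming a simple random sample without replacement. Passing None as the sample size stands in for turning sampling off.

```python
import random


def sample_rows(rows, sample_size=2000, seed=None):
    """Return a random sample of rows, or all rows if sampling is off.

    sample_size=None means "use all of the data". If the data has fewer
    rows than sample_size, all rows are returned.
    """
    if sample_size is None or sample_size >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # seed is optional, for repeatable samples
    return rng.sample(rows, sample_size)


rows = list(range(10_000))             # stand-in for the data source
default_sample = sample_rows(rows)     # 2,000 records by default
all_rows = sample_rows(rows, sample_size=None)  # sampling turned off
print(len(default_sample), len(all_rows))
```

Statistics computed on a sample are estimates, so a column whose percentages sit close to a threshold may be kept or dropped differently than it would be with sampling turned off.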
The default values for Data Quality and Sampling are specified in preferences; see Filter Columns for details. You can change the default.
By default, Filter Columns does not calculate Attribute Importance. See Attribute Importance for directions on how to find important attributes.