GLM supports these settings for classification:
Generate Row Diagnostics: Not selected by default. To generate row diagnostics, you must select this option and also specify a Case ID.
If you do not specify a Case ID, this setting is not available.
You can view Row Diagnostics on the Diagnostics tab in the model viewer. To further analyze row diagnostics, use a Model Details node to extract the row diagnostics table.
Confidence Level: A positive number that is less than 1.0. Indicates the degree of certainty that the true probability lies within the confidence bounds computed by the model. The default confidence is 0.95.
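To illustrate how a confidence level translates into bounds around a predicted probability, here is a minimal Python sketch. It assumes a Wald-style interval computed on the logit scale and mapped back through the logistic function; the function name probability_bounds and its parameters are hypothetical, and the source does not state that the model computes its bounds this way.

```python
from math import exp
from statistics import NormalDist


def probability_bounds(linear_predictor, std_error, confidence_level=0.95):
    """Return (lower, upper) bounds for a predicted probability.

    Assumption: a Wald interval on the logit scale. Illustrative only;
    not taken from the product documentation.
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be positive and less than 1.0")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)

    def logistic(x):
        return 1.0 / (1.0 + exp(-x))

    lower = logistic(linear_predictor - z * std_error)
    upper = logistic(linear_predictor + z * std_error)
    return lower, upper
```

Under this assumption, a higher confidence level gives wider bounds for the same prediction, which is why the value must stay strictly below 1.0.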
Reference Class name: The Reference Target Class is the target value used as a reference in a binary logistic regression model. Probabilities are produced for the other (non-reference) class. By default, the algorithm chooses the value with the highest prevalence (the most cases). If there are ties, the tied values are sorted alphanumerically in ascending order. The default for Reference Class name is System Determined, that is, the algorithm determines the value.
To select a specific value, see Choose Reference Value (GLMC).
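The System Determined rule described above can be sketched in Python as follows. The function name choose_reference_class is hypothetical, and two points are assumptions rather than documented behavior: that the first value in ascending order wins a tie, and that Python's default string ordering stands in for the database's alphanumeric sort.

```python
from collections import Counter


def choose_reference_class(target_values):
    """Pick the most prevalent target value as the reference class.

    Assumption: on a tie, the first value in ascending order is chosen.
    """
    counts = Counter(str(v) for v in target_values)
    highest = max(counts.values())
    # Keep only the values that share the highest count, then sort them.
    tied = sorted(value for value, n in counts.items() if n == highest)
    return tied[0]
```

For a target column holding two "no" rows and one "yes" row, this sketch returns "no", so probabilities would be reported for "yes".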
Missing Values Treatment: The default is Mean Mode, that is, use the mean for numeric values and the mode for categorical values. You can also select Delete Row to delete any row that contains missing values. If you delete rows with missing values, you must apply the same missing values treatment (delete rows) to any data that you apply the model to.
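The two treatments can be illustrated with a small Python sketch. It assumes rows are dictionaries and that None marks a missing value; the function names are hypothetical and this is not the product's implementation.

```python
from statistics import mean, mode


def fit_fill_values(rows, numeric_cols, categorical_cols):
    """Compute replacement values from the non-missing entries of each column."""
    fills = {}
    for col in numeric_cols:
        fills[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fills[col] = mode(r[col] for r in rows if r[col] is not None)
    return fills


def apply_mean_mode(rows, fills):
    """Mean Mode: return copies of the rows with missing values replaced."""
    return [
        {col: (fills[col] if val is None and col in fills else val)
         for col, val in row.items()}
        for row in rows
    ]


def delete_rows_with_missing(rows):
    """Delete Row: drop every row that contains a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]
```

In this sketch the fill values are computed once from the training rows and then reused, so the same replacement can be applied to any later data.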
Specify Row Weights Column: The default is to not specify a row weights column. The Row Weights Column is a column in the training data that contains a weighting factor for the rows.
Row weights can be used as a compact representation of repeated rows, as in the design of experiments where a specific configuration is repeated several times.
Row weights can also be used to emphasize certain rows during model construction. For example, you can bias the model toward rows that are more recent and away from potentially obsolete data.
To specify a Row Weights column, click the check box and select the column from the list.
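The idea of row weights as a compact representation of repeated rows can be checked with a short Python sketch. It assumes that a weight multiplies each row's contribution to the binary log-likelihood, which is a common convention for weighted GLMs but is not confirmed by the source; the function name and the predict callback are hypothetical.

```python
from math import log


def weighted_log_likelihood(rows, weights, predict):
    """Sum weighted Bernoulli log-likelihood contributions.

    rows    -- list of (x, y) pairs with y equal to 0 or 1
    weights -- one non-negative weight per row
    predict -- function mapping x to the probability that y == 1
    """
    total = 0.0
    for (x, y), w in zip(rows, weights):
        p = predict(x)
        total += w * (y * log(p) + (1 - y) * log(1.0 - p))
    return total
```

Under that assumption, a row with weight 3 contributes exactly as much as the same row repeated three times with weight 1, so both data sets lead to the same fitted model.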
Ridge Regression: The default is to select ridge regression.
If you select Ridge Regression, Feature Selection is automatically de-selected.
Ridge regression is a technique that compensates for multicollinearity (multivariate regression with correlated predictors). Oracle Data Mining supports ridge regression for both regression and classification mining functions.
To specify options for ridge regression, click Option to open the Ridge Regression Option Dialog (GLMC).
When ridge regression is enabled, fewer global details are returned. For example, when ridge regression is enabled, no prediction bounds are produced.
Note: If you are connected to Oracle Database 11g Release 2 (11.2) and you get the error ORA-40024 when you build a GLM model, enable Ridge Regression and rebuild the model.
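To show why ridge regression helps with correlated predictors, here is a minimal Python sketch. It uses ordinary linear least squares with two predictors and no intercept, not the logistic model that GLM classification fits, and it is not Oracle Data Mining's implementation; the function name and the ridge_lambda parameter are hypothetical.

```python
def ridge_two_predictors(x1, x2, y, ridge_lambda):
    """Solve (X'X + lambda * I) b = X'y for two predictors, no intercept."""
    a11 = sum(v * v for v in x1) + ridge_lambda
    a22 = sum(v * v for v in x2) + ridge_lambda
    a12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * v for u, v in zip(x1, y))
    b2 = sum(u * v for u, v in zip(x2, y))

    # With perfectly collinear predictors and lambda == 0 this determinant is 0.
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ZeroDivisionError("normal equations are singular")

    # Cramer's rule for the 2x2 system.
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

When x2 is an exact multiple of x1, the system has no unique solution with ridge_lambda set to 0, while any positive ridge_lambda makes it solvable at the cost of slightly shrunken coefficients.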
Feature Selection/Generation: Requires connection to Oracle Database 12c. By default, Feature Selection/Generation is not selected. To specify Feature Selection or view or specify Feature Selection settings, click Option to launch the Feature Selection Option Dialog.
If you select Feature Selection, Ridge Regression is automatically de-selected.
Approximate Computation: Specifies whether the algorithm should use approximate computations to improve performance. For GLM, approximation is appropriate for data sets that have many rows and are densely populated (not sparse).
Values for Approximate Computation are:
System Determined (the default)
Enable
Disable