Specify Text Characteristics
If you are connected to Oracle Database 12c, the Text tab of the Edit Model Build dialog allows you to specify text characteristics.
If you specify text characteristics on the Text tab, you are not required to use Text nodes.
Note: If you are connected to Oracle Database 11g Release 2 (11.2) or earlier, the Text tab is not available. Use Text nodes instead.
For information about text processing, including definitions of terms, see Oracle Text Concepts.
Text is available for any of the following data types: CHAR, VARCHAR2, BLOB, CLOB, NCHAR, or NVARCHAR2.
To examine or specify text characteristics for data mining, either double-click the build node or right-click the node and select Edit from the context menu. Click the Text tab.
The Text tab allows you to modify the following:
Categorical cutoff value allows you to control the cutoff used to determine whether a column should be considered a Text or Categorical mining type. The cutoff value is an integer; it must be 10 or greater and less than or equal to 4000. The default value is 200.
Default Transform Type specifies the default transform type for column-level text settings. The default value is Token; you can also specify Theme.
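The range rule for the categorical cutoff value can be sketched as a small check. This is only an illustration of the limits stated above (an integer from 10 through 4000, default 200). The function and constant names are hypothetical and are not part of Oracle Data Miner, and the sketch does not model how the cutoff is applied to a column.

```python
# Limits taken from the Text tab description; the names are hypothetical.
MIN_CUTOFF = 10
MAX_CUTOFF = 4000
DEFAULT_CUTOFF = 200


def validate_categorical_cutoff(value=DEFAULT_CUTOFF):
    """Return value if it is an acceptable cutoff, otherwise raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("The categorical cutoff value must be an integer.")
    if not MIN_CUTOFF <= value <= MAX_CUTOFF:
        raise ValueError(
            f"The categorical cutoff value must be between {MIN_CUTOFF} "
            f"and {MAX_CUTOFF}, inclusive."
        )
    return value
```

Calling validate_categorical_cutoff() with no argument returns the default of 200; a value such as 9 or 4001 raises ValueError.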
If Default Transform Type is Token, the Default Settings are as follows:
Languages specifies the languages used in the documents. The default is a single language, English. To change this value, select from the drop-down list. You can select more than one language.
Stemming is not selected. To select Stemming, click the box.
Not all languages support Stemming. If the language is one or more of English, Dutch, French, German, Italian, or Spanish, stemming is automatically enabled.
If Stemming is enabled, stemmed words are returned for the supported languages; otherwise, the original words are returned.
Stoplist specifies the stoplist to use. The default is to use the Default stoplist. You can add stoplists or edit stoplists. See Stoplist Editor for more information.
If you select more than one language and the selected stoplist is Default, the default stop words for the selected languages are added to the Default stoplist (from the repository). No duplicate stop words are added.
Tokens specifies the maximum number of tokens across all documents. The default number is 3000.
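The Token defaults listed above can be summarized as a small data structure. This is a minimal sketch for reference only: the class, field, and function names are hypothetical, and the only values taken from this topic are the defaults (English, Stemming not selected, the Default stoplist, 3000 tokens) and the list of languages that support stemming.

```python
from dataclasses import dataclass

# Languages that support stemming, as listed in this topic.
STEMMING_LANGUAGES = frozenset(
    {"English", "Dutch", "French", "German", "Italian", "Spanish"}
)


@dataclass(frozen=True)
class TokenDefaults:
    """Hypothetical container for the Token default settings."""
    languages: tuple = ("English",)  # one language by default
    stemming: bool = False           # Stemming is not selected by default
    stoplist: str = "Default"        # the Default stoplist
    max_tokens: int = 3000           # maximum tokens across all documents


def languages_with_stemming(settings):
    """Return the selected languages that support stemming, in selection order."""
    return [lang for lang in settings.languages if lang in STEMMING_LANGUAGES]
```

For example, with languages=("English", "Japanese"), languages_with_stemming returns only English, because Japanese is not in the list of languages that support stemming.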
If Default Transform Type is Theme, the Default Settings are as follows:
Languages specifies the languages used in the documents. The default is a single language, English. To change this value, select from the drop-down list. You can select more than one language.
Stoplist specifies the stoplist to use. The default is to use the Default stoplist. You can add stoplists or edit stoplists. See Stoplist Editor for more information.
If you select more than one language and the selected stoplist is Default, the default stop words for the selected languages are added to the Default stoplist (from the repository). No duplicate stop words are added.
Themes specifies the maximum number of themes across all documents. The default number is 3000.
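The way default stop words for several languages are combined without duplicates can be illustrated as follows. This is a sketch of the described behavior, not the product's implementation: the function name is hypothetical, the sample word lists are invented for illustration and are not the repository's actual stop words, and preserving the selection order is an assumption.

```python
def merge_default_stop_words(selected_languages, stop_words_by_language):
    """Combine the stop words of the selected languages, skipping duplicates.

    Words keep the order in which they are first seen (an assumption;
    this topic only states that no duplicates are added).
    """
    merged = []
    seen = set()
    for language in selected_languages:
        # A language with no entry contributes no stop words.
        for word in stop_words_by_language.get(language, []):
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Invented sample data, for illustration only.
sample_stop_words = {
    "English": ["the", "a", "on"],
    "French": ["le", "a", "on"],
}
combined = merge_default_stop_words(["English", "French"], sample_stop_words)
```

With the sample data, combined contains "a" and "on" only once, even though both sample lists include them.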
Click Stoplists to invoke the Stoplist Editor. You can view, edit, and create stoplists.
You can use the same stoplist for all text columns.