Text Processing in Oracle Data Mining 12c Release 1 (12.1)

Oracle Data Mining includes significant enhancements in text processing that greatly simplify the data mining process (model build, deployment and scoring) when unstructured text data is present in the input.

Oracle Data Mining automatically interprets CLOB columns and long VARCHAR2 columns as unstructured text. Additionally, you can specify columns of short VARCHAR2, CHAR, BLOB, and BFILE as unstructured text. Unstructured text includes data items such as web pages, document libraries, PowerPoint presentations, product specifications, e-mail messages, comment fields in reports, and call center notes.

Oracle Data Mining uses Oracle Text utilities and term weighting strategies to transform unstructured text for mining. In text transformation, text terms are extracted and given numeric values in a text index. The text transformation process is configurable for the model and for individual attributes. Once transformed, the text can be mined with a data mining algorithm.
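To make the idea of extracting terms and giving them numeric values concrete, the following is a minimal conceptual sketch in Python. It is not Oracle's implementation and does not call Oracle Text: the tokenizer, the plain TF-IDF weighting, the function names tokenize and tfidf, and the sample notes are all assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms (illustrative tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF.

    This is a generic weighting scheme used as an assumption here; it is
    not claimed to match the term weighting that Oracle Data Mining applies.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty text
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical call center notes standing in for an unstructured text column.
notes = [
    "Customer reported a billing error",
    "Billing error resolved after call",
    "Customer asked about product specifications",
]
weights = tfidf(notes)
```

In this sketch, a term that appears in only one note (such as "reported") receives a higher weight than a term shared by several notes (such as "billing"). Numeric values of this kind are what allow a mining algorithm to treat text as ordinary numeric attributes.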

You can specify data preparation for text nodes when you define a model node, as described in Specify Text Characteristics.

If you connect to Oracle Database 12c Release 1 (12.1) or later, you do not always need to use the Text nodes (Apply Text, Build Text, and Text Reference).