Text Processing in Oracle Data Mining 11g Release 2 (11.2) and Earlier

Before text can be mined, it must undergo a special preprocessing step known as term extraction or feature extraction. This process breaks the text down into units (terms) that can be mined. Text terms may be keywords or other document-derived features.
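As a rough illustration of term extraction in general, the sketch below splits text into lowercase word tokens, drops common stop words, and counts the remaining terms. It is not how Oracle Text or the Build Text node works internally. The function name `extract_terms` and the small stop-word list are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; not Oracle Text's stoplist.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "be", "can", "that"}

def extract_terms(text, stop_words=STOP_WORDS):
    """Split text into lowercase word tokens and count the terms that are not stop words."""
    # Treat every run of ASCII letters or digits as one token.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

terms = extract_terms("The text is broken down into terms, and the terms can be mined.")
print(terms.most_common(3))
```

The resulting term counts are the kind of document-derived feature that a mining algorithm can consume in place of the raw text.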

Text preparation in Oracle Data Miner uses a Build Text node to transform text columns. Build Text does not support HTML or XML documents; it also does not support any binary data types.

Oracle Data Miner uses the facilities of Oracle Text to preprocess text columns.

You must preprocess text using the Text nodes: Apply Text, Build Text, and Text Reference.