Oracle Text Concepts
A theme is a topic associated with a given document. A document may have many themes. A theme does not have to appear in a document; for example, a document containing the words San Francisco may have California as one of its themes.
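As a rough illustration only, the idea that a theme can be inferred rather than found verbatim might be sketched in Python as follows. This is not how Oracle Text itself derives themes; the `TOY_KNOWLEDGE_BASE` mapping and the `themes_for` function are hypothetical names invented for this example.

```python
from typing import List

# Toy illustration of theme inference; NOT Oracle Text's actual algorithm.
# TOY_KNOWLEDGE_BASE is a hypothetical phrase-to-theme mapping.
TOY_KNOWLEDGE_BASE = {
    "san francisco": ["California"],
    "golden gate bridge": ["California", "Bridges"],
}


def themes_for(document: str) -> List[str]:
    """Return themes whose trigger phrases occur in the document text."""
    text = document.lower()
    found: List[str] = []
    for phrase, themes in TOY_KNOWLEDGE_BASE.items():
        if phrase in text:
            for theme in themes:
                if theme not in found:  # keep each theme only once
                    found.append(theme)
    return found


# The word "California" never appears in the input, yet it is reported as a theme.
print(themes_for("We flew to San Francisco last week."))
```

The point of the sketch is only that the theme is attached to the document through background knowledge, not through a literal match on the theme's name.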
A stopword is a word that is not indexed during text transformations. A stopword is usually a low-information word; in English, "a", "the", "this", and "with" are usually stopwords. A stoplist is a list of stopwords. Oracle Text supplies a stoplist for every language. By default during indexing, the system uses the Oracle Text default stoplist for your language. You can edit the default stoplist or create a new one.
Note: In Oracle Data Miner, stoplists are shared across all transformations and are not owned by a specific transformation.
A stoptheme is a theme to be skipped over during indexing. Stopthemes are specified by adding them to stoplists.
Oracle Text uses stopwords and stopthemes to indicate text that can be safely ignored during text mining.
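The effect of a stoplist can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Oracle Text implementation: the `ToyStoplist` class, its method names, and the sample entries are all hypothetical, and the stopword match is assumed to be case-insensitive.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStoplist:
    """Hypothetical stoplist holding both stopwords and stopthemes."""
    stopwords: set = field(default_factory=set)
    stopthemes: set = field(default_factory=set)

    def filter_tokens(self, tokens):
        # Drop stopwords; comparison is case-insensitive (an assumption).
        return [t for t in tokens if t.lower() not in self.stopwords]

    def filter_themes(self, themes):
        # Drop themes that are listed as stopthemes.
        return [t for t in themes if t not in self.stopthemes]


stoplist = ToyStoplist(
    stopwords={"a", "the", "this", "with"},
    stopthemes={"California"},
)

print(stoplist.filter_tokens(["The", "lexer", "works", "with", "this", "text"]))
# ['lexer', 'works', 'text']
print(stoplist.filter_themes(["California", "Travel"]))
# ['Travel']
```

Keeping stopwords and stopthemes in one object mirrors the description above, where stopthemes are specified by adding them to a stoplist.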
The Oracle Text lexer breaks source text into tokens (usually words) or themes, in accordance with a specified language. To extract tokens, the lexer uses parameters as defined by a lexer preference. These parameters include the definitions for the characters that separate tokens, such as whitespace, and whether to convert text to all uppercase. When theme indexing is enabled, the lexer analyzes text to create theme tokens.
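The two lexer parameters mentioned above, separator characters and uppercase conversion, can be illustrated with a small Python sketch. The `toy_lexer` function, its parameter names, and its default separator set are assumptions made for this example; they do not correspond to actual Oracle Text lexer preference attributes.

```python
def toy_lexer(text, separators=" \t\n.,;:!?", to_upper=True):
    """Split text into tokens on separator characters.

    separators: characters that end a token (hypothetical default set).
    to_upper:   if True, convert every token to uppercase.
    """
    tokens, current = [], []
    for ch in text:
        if ch in separators:
            if current:  # a separator closes the token being built
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush the final token if the text does not end in a separator
        tokens.append("".join(current))
    return [t.upper() for t in tokens] if to_upper else tokens


print(toy_lexer("Oracle Text breaks text into tokens."))
# ['ORACLE', 'TEXT', 'BREAKS', 'TEXT', 'INTO', 'TOKENS']
print(toy_lexer("Oracle Text breaks text into tokens.", to_upper=False))
# ['Oracle', 'Text', 'breaks', 'text', 'into', 'tokens']
```

Changing `separators` changes where tokens begin and end, which is why the separator definitions are part of the lexer preference rather than fixed.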
For detailed information about how Oracle Text functions, see the Oracle Text Reference.