- Full-Text Searching
- Creating an Index
- Indexing an Object
- Full-Text Searching
- Summary and References
Creating an Index
The first step in implementing full-text searching with Lucene is to build an index. This is straightforward: you specify a directory for the index and an analyzer. The analyzer breaks text fields into indexable tokens, and it is a core part of how Lucene processes text.
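The following plain-Java sketch, with no Lucene dependency, makes the idea of an analyzer concrete by showing one way to turn a text field into tokens. The class name ToyTokenizer is hypothetical, and so is its splitting rule: break on anything that is not a letter or digit, then lowercase. Lucene's own analyzers are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration of what an analyzer does: turn a text field into tokens.
// Hypothetical class, NOT part of Lucene's API.
public class ToyTokenizer {

    // Splits on any run of characters that are not letters or digits,
    // lowercases each piece, and drops empty strings.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [the, rain, in, spain, falls, mainly]
        System.out.println(tokenize("The rain in Spain falls mainly!"));
    }
}
```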
Several types of analyzers are provided out of the box. Table 1 shows some of the more interesting ones.
Table 1. Lucene analyzers.

| Analyzer | Description |
| --- | --- |
| StandardAnalyzer | A sophisticated general-purpose analyzer. |
| WhitespaceAnalyzer | A very simple analyzer that just separates tokens using white space. |
| StopAnalyzer | Removes common English words that are not usually useful for indexing. |
| SnowballAnalyzer | An interesting experimental analyzer that works on word roots (a search on "rain" should also return entries with "raining", "rained", and so on). |
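The difference between the simpler analyzers in Table 1 is easiest to see on a sample sentence. The JDK-only sketch below approximates the behavior of WhitespaceAnalyzer and StopAnalyzer. It does not use Lucene. The class name AnalyzerComparison is hypothetical, and the stop-word list is a small hypothetical one rather than Lucene's built-in list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy, JDK-only approximations of two analyzers from Table 1.
// These are NOT Lucene classes; they only illustrate the kind of output each produces.
public class AnalyzerComparison {

    // Hypothetical, deliberately small stop-word list for illustration.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "is", "it", "of", "the", "to");

    // Like WhitespaceAnalyzer: split on white space only, keeping case and punctuation.
    public static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: split on non-letters, lowercase, then drop stop words.
    public static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            String token = piece.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The rain in Spain is falling.";
        System.out.println(whitespace(text)); // [The, rain, in, Spain, is, falling.]
        System.out.println(stop(text));       // [rain, spain, falling]
    }
}
```

The whitespace version keeps "The" and "falling." as they appear, so a search for "the" or "falling" would not match them exactly. The stop-word version normalizes case and discards words that rarely help retrieval.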
There are even a number of language-specific analyzers, including analyzers for German, Russian, French, Dutch, and others.
It isn’t difficult to implement your own analyzer, though the standard ones often do the job well enough. For the sake of simplicity, we’ll use the StandardAnalyzer in this tutorial.
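Lucene analyzers are typically assembled from a tokenizer followed by a chain of token filters, so a custom analyzer is often just a different combination of those pieces. The plain-Java sketch below models that idea. Every name in it is hypothetical, including ToyAnalyzer, letterTokenizer, removeStopWords, and crudeStem. The crude suffix stripping only imitates the effect described for SnowballAnalyzer. It is not the Snowball stemming algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Toy model of a custom analyzer: a tokenizer followed by a chain of filters.
// All types and methods here are hypothetical; this is NOT Lucene's API.
public class ToyAnalyzer {

    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> filters;

    @SafeVarargs
    public ToyAnalyzer(Function<String, List<String>> tokenizer,
                       UnaryOperator<List<String>>... filters) {
        this.tokenizer = tokenizer;
        this.filters = Arrays.asList(filters);
    }

    // Runs the tokenizer, then applies each filter in the order given.
    public List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    // Tokenizer: split on anything that is not a letter, dropping empty pieces.
    public static List<String> letterTokenizer(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static List<String> lowerCase(List<String> tokens) {
        return tokens.stream().map(t -> t.toLowerCase(Locale.ROOT)).collect(Collectors.toList());
    }

    public static List<String> removeStopWords(List<String> tokens) {
        // Hypothetical stop-word list for illustration.
        Set<String> stop = Set.of("a", "an", "and", "in", "is", "it", "of", "the", "to");
        return tokens.stream().filter(t -> !stop.contains(t)).collect(Collectors.toList());
    }

    // Crude suffix stripping: removes "ing" or "ed" only when at least three
    // characters would remain. NOT the Snowball (Porter) algorithm.
    public static List<String> crudeStem(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.endsWith("ing") && t.length() > 5) {
                out.add(t.substring(0, t.length() - 3));
            } else if (t.endsWith("ed") && t.length() > 4) {
                out.add(t.substring(0, t.length() - 2));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyAnalyzer analyzer = new ToyAnalyzer(
                ToyAnalyzer::letterTokenizer,
                ToyAnalyzer::lowerCase,
                ToyAnalyzer::removeStopWords,
                ToyAnalyzer::crudeStem);
        // Prints [rain, rain, rain]
        System.out.println(analyzer.analyze("It rained, and the rain is raining."));
    }
}
```

Because "rained", "rain", and "raining" all reduce to the same token, an index built this way can match all three forms with a single query term.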
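Next, we need to create an IndexWriter object, which builds the index and adds new entries to it. The following creates an IndexWriter that writes to the "index" directory using the StandardAnalyzer. This constructor signature comes from the older Lucene releases this tutorial targets; newer versions take a Directory object instead of a path string.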
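```java
// "index" is the index directory; true creates a new index, overwriting any existing one
IndexWriter indexWriter = new IndexWriter("index", new StandardAnalyzer(), true);
```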
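Conceptually, an index writer builds an inverted index: a map from each token the analyzer produces to the documents that contain it. The in-memory sketch below shows that idea in plain Java. The names ToyIndexWriter, addDocument(int, String), and search are hypothetical and chosen for illustration. Lucene's real IndexWriter adds Document objects and writes a far more elaborate on-disk format.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy, in-memory inverted index: the kind of structure an index writer builds.
// Hypothetical class; NOT Lucene's API or file format.
public class ToyIndexWriter {

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Analyzes the text (split on non-letters/digits, lowercase),
    // then records the document id under every resulting term.
    public void addDocument(int docId, String text) {
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (piece.isEmpty()) {
                continue;
            }
            String term = piece.toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Looks up one term; the query is lowercased the same way as the indexed text.
    public Set<Integer> search(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }

    public static void main(String[] args) {
        ToyIndexWriter index = new ToyIndexWriter();
        index.addDocument(1, "Rain in Spain");
        index.addDocument(2, "Sun in Seville");
        System.out.println(index.search("in"));    // [1, 2]
        System.out.println(index.search("SPAIN")); // [1]
    }
}
```

Lowercasing the query term in search reflects a general rule: queries should be analyzed the same way as the indexed text, or matches will be missed.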