Indexing an Object
Now you need to index your business objects. To index an object, you use the Lucene Document class, to which you add the fields that you want indexed. A Lucene Document is basically a container for a set of indexed fields. This is best illustrated by an example:
    Document doc = new Document();
    doc.add(new Field("description", hotel.getDescription(),
                      Field.Store.YES, Field.Index.TOKENIZED));
To add a field to a document, you create a new instance of the Field class. A field is made up of a name and a value (the first two parameters in the class constructor). The value may take the form of a String, or a Reader if the object to be indexed is a file.
The two other parameters are used to determine how the field will be stored and indexed in the Lucene index:
- Storing the value. Does the value need to be stored in the index, or just indexed and discarded? Storing the value is useful if the value should be displayed in the search result list, for example. If the value must be stored, use Field.Store.YES. You can also use Field.Store.COMPRESS for large documents or binary value fields. If you don’t need to store the value, use Field.Store.NO.
- Indexing the value. Does the value need to be indexed? A database identifier, for example, may just be stored and used later for object retrieval, but not indexed. In this case, you use Field.Index.NO. In most other cases, you’ll index the value using the token analyzer associated with the index writer. To do this, you use Field.Index.TOKENIZED. The value Field.Index.UN_TOKENIZED can be used if you need to index a value without parsing it with the analyzer; in this case, the value will be used "as is."
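To make the effect of these two settings concrete, here is a toy model in plain Java. It is a sketch only: it uses no Lucene classes and does not reflect how Lucene works internally. The names ToyIndex, Store and IndexMode are hypothetical, and the "analyzer" is assumed to do nothing more than lowercase the text and split it on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: mimics the effect of the store/index settings, not Lucene internals.
class ToyIndex {
    enum Store { YES, NO }
    enum IndexMode { NO, TOKENIZED, UN_TOKENIZED }

    // Stored field values per document; the list position is the document id.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Starts a new empty document and returns its id.
    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    void addField(int docId, String name, String value, Store store, IndexMode mode) {
        if (store == Store.YES) {
            stored.get(docId).put(name, value);          // value can be read back later
        }
        if (mode == IndexMode.UN_TOKENIZED) {
            post(name, value, docId);                    // whole value is a single term, "as is"
        } else if (mode == IndexMode.TOKENIZED) {
            for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    post(name, token, docId);            // one term per word
                }
            }
        }
        // IndexMode.NO: nothing is added to the inverted index.
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
    }

    // Returns the ids of the documents whose field contains exactly this term.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new HashSet<>());
    }

    // Returns the stored value, or null if the field was not stored.
    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.newDocument();
        index.addField(doc, "id", "42", ToyIndex.Store.YES, ToyIndex.IndexMode.NO);
        index.addField(doc, "city", "New York", ToyIndex.Store.YES, ToyIndex.IndexMode.UN_TOKENIZED);
        index.addField(doc, "content", "Quiet hotel near Central Park",
                       ToyIndex.Store.NO, ToyIndex.IndexMode.TOKENIZED);

        System.out.println(index.search("content", "park"));   // match: tokenized and lowercased
        System.out.println(index.search("city", "York"));      // no match: the only term is "New York"
        System.out.println(index.storedValue(doc, "content")); // null: indexed but not stored
        System.out.println(index.storedValue(doc, "id"));      // stored, but not searchable
    }
}
```

In this model, a stored-only field such as the identifier can be read back but never matched, while an indexed-only field can be matched but never read back, which mirrors the trade-off described in the two points above.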
For our example, we just want some fairly simple full-text searching. So we add the following fields:
- The hotel identifier, so we can retrieve the object later on from the query result list.
- The hotel name, which we need to display in the query result lists.
- The hotel description, if we need to display this information in the query result lists.
- Composite text containing key fields of the Hotel object:
  - Hotel name
  - Hotel city
  - Hotel description
We want full-text indexing on this field. We don’t need to display the indexed text in the query results, so we use Field.Store.NO to save index space.
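One detail to watch when building such a composite text: if it is assembled with plain string concatenation and one of the values is null, Java inserts the literal word "null", which would then be indexed. The source does not say whether the Hotel getters can return null, so the following is only a precaution under that assumption. SearchableText and its join method are hypothetical names, not part of Lucene or of the example application.

```java
import java.util.StringJoiner;

// Hypothetical helper: builds the composite text while skipping missing values.
class SearchableText {
    // Joins the non-empty parts with single spaces, ignoring null or blank values.
    static String join(String... parts) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String part : parts) {
            if (part != null && !part.trim().isEmpty()) {
                joiner.add(part.trim());
            }
        }
        return joiner.toString();
    }
}
```

With such a helper, the composite value could be built as SearchableText.join(hotel.getName(), hotel.getCity(), hotel.getDescription()) before it is added to the document.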
Here’s the method that indexes a given hotel:
    public static void indexHotel(Hotel hotel) throws IOException {
        IndexWriter writer = (IndexWriter) getIndexWriter(false);
        Document doc = new Document();

        // Identifier: stored for later retrieval, but not searchable
        doc.add(new Field("id", hotel.getId(),
                          Field.Store.YES, Field.Index.NO));
        // Name and description: stored for display and indexed through the analyzer
        doc.add(new Field("name", hotel.getName(),
                          Field.Store.YES, Field.Index.TOKENIZED));
        // City: stored and indexed "as is", without going through the analyzer
        doc.add(new Field("city", hotel.getCity(),
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("description", hotel.getDescription(),
                          Field.Store.YES, Field.Index.TOKENIZED));

        // Composite full-text field: indexed only, not stored
        String fullSearchableText = hotel.getName() + " " + hotel.getCity()
                                    + " " + hotel.getDescription();
        doc.add(new Field("content", fullSearchableText,
                          Field.Store.NO, Field.Index.TOKENIZED));

        writer.addDocument(doc);
    }
Once the indexing is finished, you have to close the index writer, which updates and closes the associated files on the disk. Opening and closing the index writer is time-consuming, so for batch updates it's not a good idea to do it for every single operation: open the writer once, index everything, and close it at the end. For example, here’s a function that rebuilds the whole index:
    public void rebuildIndexes() throws IOException {
        //
        // Erase existing index
        //
        getIndexWriter(true);
        //
        // Index all hotel entries
        //
        Hotel[] hotels = HotelDatabase.getHotels();
        for (Hotel hotel : hotels) {
            indexHotel(hotel);
        }
        //
        // Don’t forget to close the index writer when done
        //
        closeIndexWriter();
    }
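Note that in the function above, the writer stays open if indexing one of the hotels throws an exception. One possible way to guard against this is a try/finally block around the batch. The sketch below shows the pattern with a stand-in writer so that it can run without Lucene. FakeWriter and BatchIndexer are hypothetical names; in the real application, the getIndexWriter and closeIndexWriter calls would take their place.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for an index writer: records added items and counts open/close calls.
class FakeWriter implements Closeable {
    static int opened = 0;
    static int closed = 0;
    final List<String> added = new ArrayList<>();

    FakeWriter() {
        opened++;
    }

    void add(String item) throws IOException {
        if (item == null) {
            throw new IOException("cannot index a null item");
        }
        added.add(item);
    }

    @Override
    public void close() {
        closed++;
    }
}

class BatchIndexer {
    // Opens the writer once, indexes every item, and always closes the writer,
    // even if one of the items fails. Returns the number of items indexed.
    static int indexAll(List<String> items) throws IOException {
        FakeWriter writer = new FakeWriter();
        try {
            for (String item : items) {
                writer.add(item);
            }
            return writer.added.size();
        } finally {
            writer.close();
        }
    }
}
```

The writer is still opened and closed only once per batch, so the cost argument above is unchanged; the finally block only ensures the close also happens on failure.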