- Creating an Index
- Indexing an Object
- Full-Text Searching
- Summary and References
Full-Text Searching
Now that we’ve indexed our database, we can search it. Full-text searching uses the IndexSearcher and QueryParser classes. You pass an analyzer object to the QueryParser; this must be the same analyzer that was used during indexing, otherwise the query terms may not match the terms stored in the index. You also specify the field that you want to search and the (user-provided) full-text query. Here’s the class that handles the search function:
public class SearchEngine {

    /** Creates a new instance of SearchEngine */
    public SearchEngine() {
    }

    public Hits performSearch(String queryString)
            throws IOException, ParseException {
        Analyzer analyzer = new StandardAnalyzer();
        IndexSearcher is = new IndexSearcher("index");
        QueryParser parser = new QueryParser("content", analyzer);
        Query query = parser.parse(queryString);
        Hits hits = is.search(query);
        return hits;
    }
}
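To see why the analyzer used at query time has to match the one used at indexing time, here is a minimal sketch that uses only the Java standard library, not Lucene. The class AnalyzerMismatchDemo, the two toy "analyzers" (LOWERCASING and CASE_PRESERVING), and the sample hotel names are all hypothetical and only approximate what a real analyzer does.

```java
import java.util.*;
import java.util.function.Function;

public class AnalyzerMismatchDemo {
    // Hypothetical stand-ins for analyzers: each turns text into a list of terms.
    static final Function<String, List<String>> LOWERCASING =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    static final Function<String, List<String>> CASE_PRESERVING =
            text -> Arrays.asList(text.split("\\s+"));

    // Builds a toy inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(
            List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns the ids of documents containing at least one analyzed query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            result.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Hotel Notre Dame", "Hotel Odeon");
        // The index is built with the lowercasing analyzer.
        Map<String, Set<Integer>> index = buildIndex(docs, LOWERCASING);

        // Same analyzer at query time: "notre" and "dame" match document 0.
        System.out.println(search(index, "Notre Dame", LOWERCASING));     // prints [0]
        // Different analyzer: "Notre" and "Dame" were never stored in that form.
        System.out.println(search(index, "Notre Dame", CASE_PRESERVING)); // prints []
    }
}
```

The second search returns nothing because the index only ever saw lowercased terms. A mismatch between real Lucene analyzers can fail in the same quiet way, without raising an error.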
The search() method returns a Lucene Hits object, which contains a list of Lucene Hit objects ordered by decreasing relevance. The resulting Document objects can be obtained directly, as shown here:
Hits hits = instance.performSearch("Notre Dame");
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    String hotelName = doc.get("name");
    ...
}
As in this example, once you obtain the Document object, you can use the get() method to fetch field values that have been stored during indexing.
Another possible approach is to use an Iterator, as in the following example:
public void testPerformSearch() throws Exception {
    System.out.println("performSearch");
    SearchEngine instance = new SearchEngine();
    Hits hits = instance.performSearch("Notre Dame museum");
    System.out.println("Results found: " + hits.length());
    Iterator<Hit> iter = hits.iterator();
    while (iter.hasNext()) {
        Hit hit = iter.next();
        Document doc = hit.getDocument();
        System.out.println(doc.get("name") + " " + doc.get("city")
                + " (" + hit.getScore() + ")");
    }
    System.out.println("performSearch done");
}
In this example, the Hit object is used not only to fetch the corresponding document, but also, through getScore(), to fetch the relative "score" that the document obtained in the search. The score indicates how relevant each document is compared with the others in the result set. For example, the unit test above produces the following output:
performSearch
Results found: 9
Hôtel Notre Dame Paris (0.5789772)
Hôtel Odeon Paris (0.40939873)
Hôtel Tonic Paris (0.34116563)
Hôtel Bellevue Paris (0.34116563)
Hôtel Marais Paris (0.34116563)
Hôtel Edouard VII Paris (0.16353565)
Hôtel Rivoli Paris (0.11563717)
Hôtel Trinité Paris (0.11563717)
Clarion Cloitre Saint Louis Hotel Avignon (0.11563717)
performSearch done
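To give a feel for how a relative score orders a result set like the one above, here is a deliberately simplified sketch in plain Java. It is not Lucene's scoring formula: the score here is just the fraction of distinct query terms found in a document. The class ToyScoreDemo and the hotel descriptions are hypothetical.

```java
import java.util.*;

public class ToyScoreDemo {
    // Toy score (an assumption, not Lucene's formula): the fraction of
    // distinct query terms that occur in the document text, ignoring case.
    static double score(String query, String docText) {
        Set<String> docTerms = new HashSet<>(
                Arrays.asList(docText.toLowerCase(Locale.ROOT).split("\\s+")));
        Set<String> queryTerms = new LinkedHashSet<>(
                Arrays.asList(query.toLowerCase(Locale.ROOT).split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return (double) matched / queryTerms.size();
    }

    // Returns the documents with a nonzero score, best first.
    // List.sort is stable, so equally scored documents keep their input order.
    static List<String> rank(String query, List<String> docs) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (score(query, doc) > 0) {
                hits.add(doc);
            }
        }
        hits.sort(Comparator.comparingDouble((String d) -> score(query, d)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical hotel descriptions, in no particular order.
        List<String> docs = Arrays.asList(
                "Hotel Rivoli opposite a museum",
                "Hotel Tonic",
                "Hotel Notre Dame near the museum",
                "Hotel Odeon close to Notre Dame");
        for (String doc : rank("Notre Dame museum", docs)) {
            System.out.println(doc + " (" + score("Notre Dame museum", doc) + ")");
        }
    }
}
```

With these sample descriptions, the document that matches all three query terms comes first, followed by the one that matches two terms, then the one that matches one. "Hotel Tonic" matches none and is left out. Lucene's real scores also take other factors into account, so treat this only as an intuition for reading a relative ranking.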