Using Full-Text Search Catalogs
- Accessing the Catalog
- Noise Words
- Efficiency in Searching
- Summary
Accessing the Catalog
With your catalogs fully set up and scheduled populations occurring on a regular basis (see my previous article, "SQL Server Search Configuration"), it's time to access the power of the Microsoft Search service. New SQL functions have been created to give the user quite a bit of flexibility in searching character fields.
There are four additions to SQL Server's version of SQL that allow access to full-text catalogs:
CONTAINS is used in the WHERE clause of a query to find matches to exact words and phrases, with other options such as word proximity, weighted terms, and inflection of words.
CONTAINSTABLE has the same functionality as CONTAINS but is used in the FROM clause of a query and offers the added ability to incorporate relevancy to the results.
FREETEXT is used in the WHERE clause of a query to perform matches on the meaning of the words or phrase.
FREETEXTTABLE is similar to FREETEXT except that it's used in the FROM clause and can add relevancy to the results.
CONTAINS()
Let's look first at CONTAINS(). Two arguments are passed to this function in the WHERE clause:
The first argument indicates which column(s) in the catalogued table will be searched. This can refer to a specific column or to all the columns in a table that are indexed in a catalog.
The second argument is the search string. This argument can range from very simple to extremely complex, and can include words or phrases. These can be combined using typical Boolean arguments such as AND, OR, and AND NOT. Phrases are contained in double quotation marks (").
Following are some examples using CONTAINS().
Search a specific column for two words:
SELECT product_name, isbn FROM products WHERE CONTAINS (short_description, 'catalog AND index')
Search all columns for a word and phrase:
SELECT product_name, isbn FROM products WHERE CONTAINS (*, 'catalog OR "full-text index" ')
Boolean operations on words and phrases are likely to constitute the bulk of your full-text catalog use, but other options are available. For example, the asterisk (*) wildcard character can be used to find word fragments:
SELECT product_name, isbn FROM products WHERE CONTAINS (product_name, ' "catal*" AND "full-te*"')
Keep in mind that to use the asterisk (*) wildcard, the word fragment must be in double quotes ("").
Words can be designated to be NEAR each other. This is not a very precise match, since words can be very far apart and still be returned, but the relevance will be low. The tilde (~) character can also be used as NEAR, as in the following example:
SELECT product_name, isbn FROM products WHERE CONTAINS (product_name, ' full-text NEAR catalog ~ search')
To search for forms of words, use the INFLECTIONAL option. Verb tenses, singular and plural forms of words, and so on will be returned as result sets.
CONTAINSTABLE()
To return relevance within a search, use the CONTAINSTABLE() function within the FROM clause of a SQL query. There are four arguments for this function:
The first argument specifies the table to be searched.
As with CONTAINS(), the second parameter refers to columns. You can specify a column or use the asterisk (*) wildcard to designate all columns.
The third argument is the search string and works as in the CONTAINS() function.
The fourth argument allows the option of the top_n_by_rank number of results to come back. This argument is not required, but can help limit large result sets.
Following are some examples using CONTAINSTABLE().
Search a specific column for two words, returning the top 25 results:
SELECT product_name, isbn FROM CONTAINSTABLE (products, short_description, 'catalog AND index', 25) as a, products b WHERE a.[KEY] = b.product_id AND stock_status = 'INSTOCK'
Search all columns for a word and phrase, returning the title and search rank:
SELECT product_name, isbn. a.RANK FROM CONTAINSTABLE (products, *, 'catalog AND "full-text index"') as a, products b WHERE a.[KEY] = b.product_id AND stock_status = 'INSTOCK' ORDER BY a.RANK
There are two things to note in the above examples. The CONTAINSTABLE() function acts as a table, but needs to be joined to the table it represents. Assign an alias, as in the above example, and then join on the primary key of the table. The primary key is referenced as [KEY] and the rank column is labeled [RANK]. I've joined tables using an implicit join, but this also works with the explicit join of INNER JOIN.
FREETEXT() and FREETEXTTABLE()
The FREETEXT() and FREETEXTTABLE() functions work like CONTAINS() and CONTAINSTABLE() in the ways that you call and use them, but differ in that the search is a "fuzzy" search, as opposed to the exact matching of CONTAINS(). The search string argument is also not as flexible as with CONTAINS() and CONTAINSTABLE(). Wildcards, Boolean, NEAR, and so on are not used with the FREETEXT() and FREETEXTTABLE() functions, since they would defeat the purpose of a fuzzy or inexact search based on meaning.