Tolerant Search Algorithms with PHP and PostgreSQL
Using soundex
One way to implement tolerant search algorithms is to use the soundex algorithm for PostgreSQL, which can be found in the contributed directory of the PostgreSQL source tree. soundex is a method that matches similar sounding words. The algorithms substitute every word with a sequence of characters. This way of finding similar names was first used by the United States Census in 1880, 1900, and 1910, and has turned out to be an efficient way to perform error-tolerant searching.
In this section, we will present a simple example written in PHP, but before we get to PHP, we'll create a database called data:
[hs@athlon hs]$ createdb data CREATE DATABASE
In the next step, we go to the contrib/soundex directory in our PostgreSQL source tree, and compile the soundex module using make and make install. The binaries are now installed in the right directories, and we can insert the soundex module into the database we have just created. Therefore, we can use an SQL script in the soundex directory:
[hs@athlon soundex]$ psql data < soundex.sql CREATE CREATE
If no error has occurred, two functions have been inserted into the database.
After that, we write a small SQL script containing the data we want to insert into the database:
CREATE TABLE persons(id int4, name varchar(30)); COPY persons FROM stdin; 1 Paul 2 Alex 3 Epi 4 Eppi 5 Ebi 6 Everlast \.
To insert the data, we use the following command:
[hs@athlon html]$ psql data < data.sql CREATE
Now, we can start writing a simple PHP script, which displays a small form including a text field and a submit button:
<? echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"> <html> <HEAD> <TITLE>Searching ...</TITLE> </HEAD> <body> '; echo '<form name="searching" action="search.php"> <INPUT NAME="searchstring" TYPE="text" SIZE="30"><br> <INPUT TYPE="submit" VALUE="Send request to the server"> </body></html>'; ?>
We choose the name searchstring for our text field. When clicking on the button, the user will be redirected to search.php, which can be seen in the following listing:
<? # displaying header echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"> <html> <HEAD> <TITLE>Displaying result</TITLE> </HEAD> <body> '; # connecting to the database and performing query $dbh=pg_connect("dbname=data user=hs"); $sql="SELECT * FROM persons WHERE soundex(name)=soundex('$searchstring')"; $result=pg_exec($dbh, $sql); if(!$result) { # do something bad ... } else { # retrieving result $rows = pg_numrows($result); echo '<b>Result:</b><br>'; for ($j=0; $j < $rows; $j++) { $id=pg_result($result, $j, "id"); $name=pg_result($result, $j, "name"); echo "id: $id, name: $name<br>\n"; } } echo '</body></html>'; ?>
search.php displays a header and connects to the database called data that we created before. In the next step, we define a string containing the SQL code, and we need to select the data from the database. Let's have a closer look at that SQL command.
We select all columns in the persons table, in which the soundex code of the value submitted to search.php is equal to the soundex code of the records stored in the table. The query will return all records that sound similar to the value passed to the script.
In the next step, we retrieve the result of the query, and display it on the screen.
Let's try the script by looking for Epi:
Result: id: 3, name: Epi id: 4, name: Eppi id: 5, name: Ebi
The script has returned three records sounding like Epi. If we query for Alice, the result will look like this:
Result: id: 2, name: Alex
Alex sounds similar to Alice, so the record has been retrieved from the query.