BINARY SEARCH
The sample Java program that we started with, shown on the facing page, is based on the famous, effective, and widely used binary search algorithm. This example is a prototype of the way in which we will examine new algorithms throughout the book. As with all of the programs we consider, it is both a precise definition of the method and a complete Java implementation that you can download from the booksite.
Binary search.
We will study the binary search algorithm in detail in Section 3.2, but a brief description is appropriate here. The algorithm is implemented in the static method rank(), which takes an integer key and a sorted array of int values as arguments and returns the index of the key if it is present in the array, -1 otherwise. It accomplishes this task by maintaining variables lo and hi such that the key is in a[lo..hi] if it is in the array, then entering into a loop that tests the middle entry in the interval (at index mid). If the key is equal to a[mid], the return value is mid; otherwise the method cuts the interval size about in half, looking at the left half if the key is less than a[mid] and at the right half if the key is greater than a[mid]. The process terminates when the key is found or the interval is empty. Binary search is effective because it needs to examine just a few array entries (relative to the size of the array) to find the key (or determine that it is not there).
Figure 1.8 Binary search in an ordered array
Development client.
For every algorithm implementation, we include a development client main() that you can use with sample input files provided in the book and on the booksite to learn about the algorithm and to test its performance. In this example, the client reads integers from the file named on the command line, then prints any integers on standard input that do not appear in the file. We use small test files such as those shown at right to demonstrate this behavior, and as the basis for traces and examples such as those at left above. We use large test files to model real-world applications and to test performance (see page 48).
Figure 1.9 Small test files forBinarySearch test client
Whitelisting.
When possible, our development clients are intended to mirror practical situations and demonstrate the need for the algorithm at hand. In this case, the process is known as whitelisting. Specifically, imagine a credit card company that needs to check whether customer transactions are for a valid account. To do so, it can
- Keep customers account numbers in a file, which we refer to as a whitelist.
- Produce the account number associated with each transaction in the standard input stream.
- Use the test client to put onto standard output the numbers that are not associated with any customer. Presumably the company would refuse such transactions.
It would not be unusual for a big company with millions of customers to have to process millions of transactions or more. To model this situation, we provide on the booksite the files largeW.txt (1 million integers) and largeT.txt (10 million integers).
Performance.
A working program is often not sufficient. For example, a much simpler implementation of rank(), which does not even require the array to be sorted, is to check every entry, as follows:
public static int rank(int key, int[] a) { for (int i = 0; i < a.length; i++) if (a[i] == key) return i; return -1; }
Given this simple and easy-to-understand solution, why do we use mergesort and binary search? If you work Exercise 1.1.38, you will see that your computer is too slow to run this brute-force implementation of rank() for large numbers of inputs (say, 1 million whitelist entries and 10 million transactions). Solving the whitelist problem for a large number of inputs is not feasible without efficient algorithms such as binary search and mergesort. Good performance is often of critical importance, so we lay the groundwork for studying performance in Section 1.4 and analyze the performance characteristics of all of our algorithms (including binary search, in Section 3.1 and mergesort, in Section 2.2).
In the present context, our goal in thoroughly outlining our programming model is to ensure that you can run code like BinarySearch on your computer, use it on test data like ours, and modify it to adapt to various situations (such as those described in the exercises at the end of this section), in order to best understand its applicability. The programming model that we have sketched is designed to facilitate such activities, which are crucial to our approach to studying algorithms.
Figure 1.10 Large files for BinarySearch test client