2.4 Case Study: Percolation
THE PROGRAMMING TOOLS THAT WE HAVE considered to this point allow us to attack all manner of important problems. We conclude our study of functions and modules by considering a case study of developing a program to solve an interesting scientific problem. Our purpose in doing so is to review the basic elements that we have covered, in the context of the various challenges that you might face in solving a specific problem, and to illustrate a programming style that you can apply broadly.
Our example applies a widely applicable computational technique known as Monte Carlo simulation to study a natural model known as percolation. The term “Monte Carlo simulation” is broadly used to encompass any computational technique that employs randomness to estimate an unknown quantity by performing multiple trials (known as simulations). We have used it in several other contexts already—for example, in the gambler’s ruin and coupon collector problems. Rather than develop a complete mathematical model or measure all possible outcomes of an experiment, we rely on the laws of probability.
In this case study we will learn quite a bit about percolation, a model which underlies many natural phenomena. Our focus, however, is on the process of developing modular programs to address computational tasks. We identify subtasks that can be independently addressed, striving to identify the key underlying abstractions and asking ourselves questions such as the following: Is there some specific subtask that would help solve this problem? What are the essential characteristics of this specific subtask? Might a solution that addresses these essential characteristics be useful in solving other problems? Asking such questions pays significant dividend, because they lead us to develop software that is easier to create, debug, and reuse, so that we can more quickly address the main problem of interest.
Percolation
It is not unusual for local interactions in a system to imply global properties. For example, an electrical engineer might be interested in composite systems consisting of randomly distributed insulating and metallic materials: which fraction of the materials need to be metallic so that the composite system is an electrical conductor? As another example, a geologist might be interested in a porous landscape with water on the surface (or oil below). Under which conditions will the water be able to drain through to the bottom (or the oil to gush through to the surface)? Scientists have defined an abstract process known as percolation to model such situations. It has been studied widely, and shown to be an accurate model in a dizzying variety of applications, beyond insulating materials and porous substances to the spread of forest fires and disease epidemics to evolution to the study of the Internet.
For simplicity, we begin by working in two dimensions and model the system as an n-by-n grid of sites. Each site is either blocked or open; open sites are initially empty. A full site is an open site that can be connected to an open site in the top row via a chain of neighboring (left, right, up, down) open sites. If there is a full site in the bottom row, then we say that the system percolates. In other words, a system percolates if we fill all open sites connected to the top row and that process fills some open site on the bottom row. For the insulating/metallic materials example, the open sites correspond to metallic materials, so that a system that percolates has a metallic path from top to bottom, with full sites conducting. For the porous substance example, the open sites correspond to empty space through which water might flow, so that a system that percolates lets water fill open sites, flowing from top to bottom.
In a famous scientific problem that has been heavily studied for decades, scientists are interested in the following question: if sites are independently set to be open with site vacancy probability p (and therefore blocked with probability 1–p), what is the probability that the system percolates? No mathematical solution to this problem has yet been derived. Our task is to write computer programs to help study the problem.
Basic scaffolding
To address percolation with a Java program, we face numerous decisions and challenges, and we certainly will end up with much more code than in the short programs that we have considered so far in this book. Our goal is to illustrate an incremental style of programming where we independently develop modules that address parts of the problem, building confidence with a small computational infrastructure of our own design and construction as we proceed.
The first step is to pick a representation of the data. This decision can have substantial impact on the kind of code that we write later, so it is not to be taken lightly. Indeed, it is often the case that we learn something while working with a chosen representation that causes us to scrap it and start all over using a new one.
For percolation, the path to an effective representation is clear: use an n-by-n array. Which type of data should we use for each element? One possibility is to use integers, with the convention that 0 indicates an empty site, 1 indicates a blocked site, and 2 indicates a full site. Alternatively, note that we typically describe sites in terms of questions: Is the site open or blocked? Is the site full or empty? This characteristic of the elements suggests that we might use n-by-n arrays in which element is either true or false. We refer to such two-dimensional arrays as boolean matrices. Using boolean matrices leads to code that is easier to understand than the alternative.
Boolean matrices are fundamental mathematical objects with many applications. Java does not provide direct support for operations on boolean matrices, but we can use the methods in StdArrayIO (see PROGRAM 2.2.2) to read and write them. This choice illustrates a basic principle that often comes up in programming: the effort required to build a more general tool usually pays dividends.
Eventually, we will want to work with random data, but we also want to be able to read and write to files because debugging programs with random inputs can be counterproductive. With random data, you get different input each time that you run the program; after fixing a bug, what you want to see is the same input that you just used, to check that the fix was effective. Accordingly, it is best to start with some specific cases that we understand, kept in files formatted compatible with StdArrayIO (dimensions followed by 0 and 1 values in row-major order).
When you start working on a new problem that involves several files, it is usually worthwhile to create a new folder (directory) to isolate those files from others that you may be working on. For example, we might create a folder named percolation to store all of the files for this case study. To get started, we can implement and debug the basic code for reading and writing percolation systems, create test files, check that the files are compatible with the code, and so forth, before worrying about percolation at all. This type of code, sometimes called scaffolding, is straightforward to implement, but making sure that it is solid at the outset will save us from distraction when approaching the main problem.
Now we can turn to the code for testing whether a boolean matrix represents a system that percolates. Referring to the helpful interpretation in which we can think of the task as simulating what would happen if the top were flooded with water (does it flow to the bottom or not?), our first design decision is that we will want to have a flow() method that takes as an argument a boolean matrix isOpen[][] that specifies which sites are open and returns another boolean matrix isFull[][] that specifies which sites are full. For the moment, we will not worry at all about how to implement this method; we are just deciding how to organize the computation. It is also clear that we will want client code to be able to use a percolates() method that checks whether the array returned by flow() has any full sites on the bottom.
Percolation (PROGRAM 2.4.1) summarizes these decisions. It does not perform any interesting computation, but after running and debugging this code we can start thinking about actually solving the problem. A method that performs no computation, such as flow(), is sometimes called a stub. Having this stub allows us to test and debug percolates() and main() in the context in which we will need them. We refer to code like PROGRAM 2.4.1 as scaffolding. As with scaffolding that construction workers use when erecting a building, this kind of code provides the support that we need to develop a program. By fully implementing and debugging this code (much, if not all, of which we need, anyway) at the outset, we provide a sound basis for building code to solve the problem at hand. Often, we carry the analogy one step further and remove the scaffolding (or replace it with something better) after the implementation is complete.
Program 2.4.1 Percolation scaffolding
public class Percolation
{
public static boolean[][] flow(boolean[][] isOpen)
{
int n = isOpen.length;
boolean[][] isFull = new boolean[n][n];
// The isFull[][] matrix computation goes here.
return isFull;
}
public static boolean percolates(boolean[][] isOpen)
{
boolean[][] isFull = flow(isOpen);
int n = isOpen.length;
for (int j = 0; j < n; j++)
if (isFull[n-1][j]) return true;
return false;
}
public static void main(String[] args)
{
boolean[][] isOpen = StdArrayIO.readBoolean2D();
StdArrayIO.print(flow(isOpen));
StdOut.println(percolates(isOpen));
}
}
n | system size (n-by-n)
isFull[][] | full sites
isOpen[][] | open sites
To get started with percolation, we implement and debug this code, which handles all the straightforward tasks surrounding the computation. The primary function flow() returns a boolean matrix giving the full sites (none, in the placeholder code here). The helper function percolates() checks the bottom row of the returned matrix to decide whether the system percolates. The test client main() reads a boolean matrix from standard input and prints the result of calling flow() and percolates() for that matrix.
% more testEZ.txt
5 5
0 1 1 0 1
0 0 1 1 1
1 1 0 1 1
1 0 0 0 1
0 1 1 1 1
% java Percolation < testEZ.txt
5 5
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
false
Vertical percolation
Given a boolean matrix that represents the open sites, how do we figure out whether it represents a system that percolates? As we will see later in this section, this computation turns out to be directly related to a fundamental question in computer science. For the moment, we will consider a much simpler version of the problem that we call vertical percolation.
The simplification is to restrict attention to vertical connection paths. If such a path connects top to bottom in a system, we say that the system vertically percolates along the path (and that the system itself vertically percolates). This restriction is perhaps intuitive if we are talking about sand traveling through cement, but not if we are talking about water traveling through cement or about electrical conductivity. Simple as it is, vertical percolation is a problem that is interesting in its own right because it suggests various mathematical questions. Does the restriction make a significant difference? How many vertical percolation paths do we expect?
Determining the sites that are filled by some path that is connected vertically to the top is a simple calculation. We initialize the top row of our result array from the top row of the percolation system, with full sites corresponding to open ones. Then, moving from top to bottom, we fill in each row of the array by checking the corresponding row of the percolation system. Proceeding from top to bottom, we fill in the rows of isFull[][] to mark as true all elements that correspond to sites in isOpen[][] that are vertically connected to a full site on the previous row. PROGRAM 2.4.2 is an implementation of flow() for Percolation that returns a boolean matrix of full sites (true if connected to the top via a vertical path, false otherwise).
Testing
After we become convinced that our code is behaving as planned, we want to run it on a broader variety of test cases and address some of our scientific questions. At this point, our initial scaffolding becomes less useful, as representing large boolean matrices with 0s and 1s on standard input and standard output and maintaining large numbers of test cases quickly becomes unwieldy. Instead, we want to automatically generate test cases and observe the operation of our code on them, to be sure that it is operating as we expect. Specifically, to gain confidence in our code and to develop a better understanding of percolation, our next goals are to:
Test our code for large random boolean matrices.
Estimate the probability that a system percolates for a given p.
Program 2.4.2 Vertical percolation detection
public static boolean[][] flow(boolean[][] isOpen)
{ // Compute full sites for vertical percolation.
int n = isOpen.length;
boolean[][] isFull = new boolean[n][n];
for (int j = 0; j < n; j++)
isFull[0][j] = isOpen[0][j];
for (int i = 1; i < n; i++)
for (int j = 0; j < n; j++)
isFull[i][j] = isOpen[i][j] && isFull[i-1][j];
return isFull;
}
n | system size (n-by-n)
isFull[][] | full sites
isOpen[][] | open sites
Substituting this method for the stub in PROGRAM 2.4.1 gives a solution to the vertical-only percolation problem that solves our test case as expected (see text).
% more test5.txt
5 5
0 1 1 0 1
0 0 1 1 1
1 1 0 1 1
1 0 0 0 1
0 1 1 1 1
% java Percolation < test5.txt
5 5
0 1 1 0 1
0 0 1 0 1
0 0 0 0 1
0 0 0 0 1
0 0 0 0 1
true
To accomplish these goals, we need new clients that are slightly more sophisticated than the scaffolding we used to get the program up and running. Our modular programming style is to develop such clients in independent classes without modifying our percolation code at all.
Data visualization
We can work with much bigger problem instances if we use StdDraw for output. The following static method for Percolation allows us to visualize the contents of boolean matrices as a subdivision of the StdDraw canvas into squares, one for each site:
public static void show(boolean[][] a, boolean which)
{
int n = a.length;
StdDraw.setXscale(-1, n);
StdDraw.setYscale(-1, n);
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
if (a[i][j] == which)
StdDraw.filledSquare(j, n-i-1, 0.5);
}
The second argument which specifies which squares we want to fill—those corresponding to true elements or those corresponding to false elements. This method is a bit of a diversion from the calculation, but pays dividends in its ability to help us visualize large problem instances. Using show() to draw our boolean matrices representing blocked and full sites in different colors gives a compelling visual representation of percolation.
Monte Carlo simulation
We want our code to work properly for any boolean matrix. Moreover, the scientific question of interest involves random boolean matrices. To this end, we add another static method to Percolation:
public static boolean[][] random(int n, double p)
{
boolean[][] a = new boolean[n][n];
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = StdRandom.bernoulli(p);
return a;
}
This method generates a random n-by-n boolean matrix of any given size n, each element true with probability p.
Having debugged our code on a few specific test cases, we are ready to test it on random systems. It is possible that such cases may uncover a few more bugs, so some care is in order to check results. However, having debugged our code for a small system, we can proceed with some confidence. It is easier to focus on new bugs after eliminating the obvious bugs.
WITH THESE TOOLS, A CLIENT FOR testing our percolation code on a much larger set of trials is straightforward. PercolationVisualizer (PROGRAM 2.4.3) consists of just a main() method that takes n and p from the command line and displays the result of the percolation flow calculation.
This kind of client is typical. Our eventual goal is to compute an accurate estimate of percolation probabilities, perhaps by running a large number of trials, but this simple tool gives us the opportunity to gain more familiarity with the problem by studying some large cases (while at the same time gaining confidence that our code is working properly). Before reading further, you are encouraged to download and run this code from the booksite to study the percolation process. When you run PercolationVisualizer for moderate-size n (50 to 100, say) and various p, you will immediately be drawn into using this program to try to answer some questions about percolation. Clearly, the system never percolates when p is low and always percolates when p is very high. How does it behave for intermediate values of p? How does the behavior change as n increases?
Estimating probabilities
The next step in our program development process is to write code to estimate the probability that a random system (of size n with site vacancy probability p) percolates. We refer to this quantity as the percolation probability. To estimate its value, we simply run a number of trials. The situation is no different from our study of coin flipping (see PROGRAM 2.2.6), but instead of flipping a coin, we generate a random system and check whether it percolates.
PercolationProbability (PROGRAM 2.4.4) encapsulates this computation in a method estimate(), which takes three arguments n, p, and trials and returns an estimate of the probability that an n-by-n system with site vacancy probability p percolates, obtained by generating trials random systems and calculating the fraction of them that percolate.
How many trials do we need to obtain an accurate estimate? This question is addressed by basic methods in probability and statistics, which are beyond the scope of this book, but we can get a feeling for the problem with computational experience. With just a few runs of PercolationProbability, you can learn that if the site vacancy probability is close to either 0 or 1, then we do not need many trials, but that there are values for which we need as many as 10,000 trials to be able to estimate it within two decimal places. To study the situation in more detail, we might modify PercolationProbability to produce output like Bernoulli (PROGRAM 2.2.6), plotting a histogram of the data points so that we can see the distribution of values (see EXERCISE 2.4.9).
Program 2.4.3 Visualization client
public class PercolationVisualizer
{
public static void main(String[] args)
{
int n = Integer.parseInt(args[0]);
double p = Double.parseDouble(args[1]);
StdDraw.enableDoubleBuffering();
// Draw blocked sites in black.
boolean[][] isOpen = Percolation.random(n, p);
StdDraw.setPenColor(StdDraw.BLACK);
Percolation.show(isOpen, false);
// Draw full sites in blue.
StdDraw.setPenColor(StdDraw.BOOK_BLUE);
boolean[][] isFull = Percolation.flow(isOpen);
Percolation.show(isFull, true);
StdDraw.show();
}
}
n | system size (n-by-n)
p | site vacancy probability
isOpen[][] | open sites
isFull[][] | full sites
This client takes two command-line argument n and p, generates an n-by-n random system with site vacancy probability p, determines which sites are full, and draws the result on standard drawing. The diagrams below show the results for vertical percolation.
Using PercolationProbability.estimate() represents a giant leap in the amount of computation that we are doing. All of a sudden, it makes sense to run thousands of trials. It would be unwise to try to do so without first having thoroughly debugged our percolation methods. Also, we need to begin to take the time required to complete the computation into account. The basic methodology for doing so is the topic of SECTION 4.1, but the structure of these programs is sufficiently simple that we can do a quick calculation, which we can verify by running the program. If we perform T trials, each of which involves n2 sites, then the total running time of PercolationProbability.estimate() is proportional to n2T. If we increase T by a factor of 10 (to gain more precision), the running time increases by about a factor of 10. If we increase n by a factor of 10 (to study percolation for larger systems), the running time increases by about a factor of 100.
Can we run this program to determine percolation probabilities for a system with billions of sites with several digits of precision? No computer is fast enough to use PercolationProbability.estimate() for this purpose. Moreover, in a scientific experiment on percolation, the value of n is likely to be much higher. We can hope to formulate a hypothesis from our simulation that can be tested experimentally on a much larger system, but not to precisely simulate a system that corresponds atom-for-atom with the real world. Simplification of this sort is essential in science.
You are encouraged to download PercolationProbability from the booksite to get a feel for both the percolation probabilities and the amount of time required to compute them. When you do so, you are not just learning more about percolation, but are also testing the hypothesis that the models we have just described apply to the running times of our simulations of the percolation process.
What is the probability that a system with site vacancy probability p vertically percolates? Vertical percolation is sufficiently simple that elementary probabilistic models can yield an exact formula for this quantity, which we can validate experimentally with PercolationProbability. Since our only reason for studying vertical percolation was an easy starting point around which we could develop supporting software for studying percolation methods, we leave further study of vertical percolation for an exercise (see EXERCISE 2.4.11) and turn to the main problem.
Program 2.4.4 Percolation probability estimate
public class PercolationProbability
{
public static double estimate(int n, double p, int trials)
{ // Generate trials random n-by-n systems; return empirical
// percolation probability estimate.
int count = 0;
for (int t = 0; t < trials; t++)
{ // Generate one random n-by-n boolean matrix.
boolean[][] isOpen = Percolation.random(n, p);
if (Percolation.percolates(isOpen)) count++;
}
return (double) count / trials;
}
public static void main(String[] args)
{
int n = Integer.parseInt(args[0]);
double p = Double.parseDouble(args[1]);
int trials = Integer.parseInt(args[2]);
double q = estimate(n, p, trials);
StdOut.println(q);
}
}
n | system size (n-by-n)
p | site vacancy probability
trials | number of trials
isOpen[][] | open sites
q | percolation probability
The method estimate() generates trials random n-by-n systems with site vacancy probability p and computes the fraction of them that percolate. This is a Bernoulli process, like coin flipping (see PROGRAM 2.2.6). Increasing the number of trials increases the accuracy of the estimate. If p is close to 0 or to 1, not many trials are needed to achieve an accurate estimate. The results below are for vertical percolation.
% java PercolationProbability 20 0.05 10
0.0
% java PercolationProbability 20 0.95 10
1.0
% java PercolationProbability 20 0.85 10
0.7
% java PercolationProbability 20 0.85 1000
0.564
% java PercolationProbability 40 0.85 100
0.1
Recursive solution for percolation
How do we test whether a system percolates in the general case when any path starting at the top and ending at the bottom (not just a vertical one) will do the job?
Remarkably, we can solve this problem with a compact program, based on a classic recursive scheme known as depth-first search. PROGRAM 2.4.5 is an implementation of flow() that computes the matrix isFull[][], based on a recursive four-argument version of flow() that takes as arguments the site vacancy matrix isOpen[][], the current matrix isFull[][], and a site position specified by a row index i and a column index j. The base case is a recursive call that just returns (we refer to such a call as a null call), for one of the following reasons:
Either i or j is outside the array bounds.
The site is blocked (isOpen[i][j] is false).
We have already marked the site as full (isFull[i][j] is true).
The reduction step is to mark the site as filled and issue recursive calls for the site’s four neighbors: isOpen[i+1][j], isOpen[i][j+1], isOpen[i][j-1], and isOpen[i-1][j]. The one-argument flow() calls the recursive method for every site on the top row. The recursion always terminates because each recursive call either is null or marks a new site as full. We can show by an induction-based argument (as usual for recursive programs) that a site is marked as full if and only if it is connected to one of the sites on the top row.
Tracing the operation of flow() on a tiny test case is an instructive exercise. You will see that it calls flow() for every site that can be reached via a path of open sites from the top row. This example illustrates that simple recursive programs can mask computations that otherwise are quite sophisticated. This method is a special case of the depth-first search algorithm, which has many important applications.
Program 2.4.5 Percolation detection
public static boolean[][] flow(boolean[][] isOpen)
{ // Fill every site reachable from the top row.
int n = isOpen.length;
boolean[][] isFull = new boolean[n][n];
for (int j = 0; j < n; j++)
flow(isOpen, isFull, 0, j);
return isFull;
}
public static void flow(boolean[][] isOpen,
boolean[][] isFull, int i, int j)
{ // Fill every site reachable from (i, j).
int n = isFull.length;
if (i < 0 || i >= n) return;
if (j < 0 || j >= n) return;
if (!isOpen[i][j]) return;
if (isFull[i][j]) return;
isFull[i][j] = true;
flow(isOpen, isFull, i+1, j); // Down.
flow(isOpen, isFull, i, j+1); // Right.
flow(isOpen, isFull, i, j-1); // Left.
flow(isOpen, isFull, i-1, j); // Up.
}
n | system size (n-by-n)
isOpen[][] | open sites
isFull[][] | full sites
i, j | current site row, column
Substituting these methods for the stub in PROGRAM 2.4.1 gives a depth-first-search-based solution to the percolation problem. The recursive flow() sets to true the element in isFull[][] corresponding to any site that can be reached from isOpen[i][j] via a chain of neighboring open sites. The one-argument flow() calls the recursive method for every site on the top row.
% more test8.txt
8 8
0 0 1 1 1 0 0 0
1 0 0 1 1 1 1 1
1 1 1 0 0 1 1 0
0 0 1 1 0 1 1 1
0 1 1 1 0 1 1 0
0 1 0 0 0 0 1 1
1 0 1 0 1 1 1 1
1 1 1 1 0 1 0 0
% java Percolation < test8.txt
8 8
0 0 1 1 1 0 0 0
0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 0
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 0
0 0 0 0 0 0 1 1
0 0 0 0 1 1 1 1
0 0 0 0 0 1 0 0
true
To avoid conflict with our solution for vertical percolation (PROGRAM 2.4.2), we might rename that class PercolationVertical, making another copy of Percolation (PROGRAM 2.4.1) and substituting the two flow() methods in PROGRAM 2.4.5 for the placeholder flow(). Then, we can visualize and perform experiments with this algorithm with the PercolationVisualizer and PercolationProbability tools that we have developed. If you do so, and try various values for n and p, you will quickly get a feeling for the situation: the systems always percolate when the site vacancy probability p is high and never percolate when p is low, and (particularly as n increases) there is a value of p above which the systems (almost) always percolate and below which they (almost) never percolate.
Having debugged PercolationVisualizer and PercolationProbability on the simple vertical percolation process, we can use them with more confidence to study percolation, and turn quickly to study the scientific problem of interest. Note that if we want to experiment with vertical percolation again, we would need to edit PercolationVisualizer and PercolationProbability to refer to PercolationVertical instead of Percolation, or write other clients of both PercolationVertical and Percolation that run methods in both classes to compare them.
Adaptive plot
To gain more insight into percolation, the next step in program development is to write a program that plots the percolation probability as a function of the site vacancy probability p for a given value of n. Perhaps the best way to produce such a plot is to first derive a mathematical equation for the function, and then use that equation to make the plot. For percolation, however, no one has been able to derive such an equation, so the next option is to use the Monte Carlo method: run simulations and plot the results.
Immediately, we are faced with numerous decisions. For how many values of p should we compute an estimate of the percolation probability? Which values of p should we choose? How much precision should we aim for in these calculations? These decisions constitute an experimental design problem. Much as we might like to instantly produce an accurate rendition of the curve for any given n, the computation cost can be prohibitive. For example, the first thing that comes to mind is to plot, say, 100 to 1,000 equally spaced points, using StdStats (PROGRAM 2.2.5). But, as you learned from using PercolationProbability, computing a sufficiently precise value of the percolation probability for each point might take several seconds or longer, so the whole plot might take minutes or hours or even longer. Moreover, it is clear that a lot of this computation time is completely wasted, because we know that values for small p are 0 and values for large p are 1. We might prefer to spend that time on more precise computations for intermediate p. How should we proceed?
PercolationPlot (PROGRAM 2.4.6) implements a recursive approach with the same structure as Brownian (PROGRAM 2.3.5) that is widely applicable to similar problems. The basic idea is simple: we choose the maximum distance that we wish to allow between values of the x-coordinate (which we refer to as the gap tolerance), the maximum known error that we wish to tolerate in the y-coordinate (which we refer to as the error tolerance), and the number of trials T per point that we wish to perform. The recursive method draws the plot within a given interval [x0, x1], from (x0, y0) to (x1, y1). For our problem, the plot is from (0, 0) to (1, 1). The base case (if the distance between x0 and x1 is less than the gap tolerance, or the distance between the line connecting the two endpoints and the value of the function at the midpoint is less than the error tolerance) is to simply draw a line from (x0, y0) to (x1, y1). The reduction step is to (recursively) plot the two halves of the curve, from (x0, y0) to (xm, f (xm)) and from (xm, f (xm)) to (x1, y1).
The code in PercolationPlot is relatively simple and produces a good-looking curve at relatively low cost. We can use it to study the shape of the curve for various values of n or choose smaller tolerances to be more confident that the curve is close to the actual values. Precise mathematical statements about quality of approximation can, in principle, be derived, but it is perhaps not appropriate to go into too much detail while exploring and experimenting, since our goal is simply to develop a hypothesis about percolation that can be tested by scientific experimentation.
Program 2.4.6 Adaptive plot client
public class PercolationPlot
{
public static void curve(int n,
double x0, double y0,
double x1, double y1)
{ // Perform experiments and plot results.
double gap = 0.01;
double err = 0.0025;
int trials = 10000;
double xm = (x0 + x1)/2;
double ym = (y0 + y1)/2;
double fxm = PercolationProbability.estimate(n, xm, trials);
if (x1 - x0 < gap || Math.abs(ym - fxm) < err)
{
StdDraw.line(x0, y0, x1, y1);
return;
}
curve(n, x0, y0, xm, fxm);
StdDraw.filledCircle(xm, fxm, 0.005);
curve(n, xm, fxm, x1, y1);
}
public static void main(String[] args)
{ // Plot experimental curve for n-by-n percolation system.
int n = Integer.parseInt(args[0]);
curve(n, 0.0, 0.0, 1.0, 1.0);
}
}
n | system size
x0, y0 | left endpoint
x1, y1 | right endpoint
xm, ym | midpoint
fxm | value at midpoint
gap | gap tolerance
err | error tolerance
trials | number of trials
This recursive program draws a plot of the percolation probability (experimental observations) against the site vacancy probability p (control variable) for random n-by-n systems.
Indeed, the curves produced by PercolationPlot immediately confirm the hypothesis that there is a threshold value (about 0.593): if p is greater than the threshold, then the system almost certainly percolates; if p is less than the threshold, then the system almost certainly does not percolate. As n increases, the curve approaches a step function that changes value from 0 to 1 at the threshold. This phenomenon, known as a phase transition, is found in many physical systems.
The simple form of the output of PROGRAM 2.4.6 masks the huge amount of computation behind it. For example, the curve drawn for n = 100 has 18 points, each the result of 10,000 trials, with each trial involving n2 sites. Generating and testing each site involves a few lines of code, so this plot comes at the cost of executing billions of statements. There are two lessons to be learned from this observation. First, we need to have confidence in any line of code that might be executed billions of times, so our care in developing and debugging code incrementally is justified. Second, although we might be interested in systems that are much larger, we need further study in computer science to be able to handle larger cases—that is, to develop faster algorithms and a framework for knowing their performance characteristics.
Function-call trace for PercolationPlot
PercolationPlot.curve()
PercolationProbability.estimate()
Percolation.random()
StdRandom.bernoulli()
.
. n2 times
.
StdRandom.bernoulli()
return
Percolation.percolates()
flow()
return
return
.
. T times
.
Percolation.random()
StdRandom.bernoulli()
.
. n2 times
.
StdRandom.bernoulli()
return
Percolation.percolates()
flow()
return
return
return
.
. once for each point
.
PercolationProbability.estimate()
Percolation.random()
StdRandom.bernoulli()
.
. n2 times
.
StdRandom.bernoulli()
return
Percolation.percolates()
flow()
return
return
.
.
. T times
Percolation.random()
StdRandom.bernoulli()
.
. n2 times
.
StdRandom.bernoulli()
return
Percolation.percolates()
flow()
return
return
return
return
With this reuse of all of our software, we can study all sorts of variants on the percolation problem, just by implementing different flow() methods. For example, if you leave out the last recursive call in the recursive flow() method in PROGRAM 2.4.5, it tests for a type of percolation known as directed percolation, where paths that go up are not considered. This model might be important for a situation like a liquid percolating through porous rock, where gravity might play a role, but not for a situation like electrical connectivity. If you run PercolationPlot for both methods, will you be able to discern the difference (see EXERCISE 2.4.10)?
To model physical situations such as water flowing through porous substances, we need to use three-dimensional arrays. Is there a similar threshold in the three-dimensional problem? If so, what is its value? Depth-first search is effective for studying this question, though the addition of another dimension requires that we pay even more attention to the computational cost of determining whether a system percolates (see EXERCISE 2.4.18). Scientists also study more complex lattice structures that are not well modeled by multidimensional arrays—we will see how to model such structures in SECTION 4.5.
Percolation is interesting to study via in silico experimentation because no one has been able to derive the threshold value mathematically for several natural models. The only way that scientists know the value is by using simulations like Percolation. A scientist needs to do experiments to see whether the percolation model reflects what is observed in nature, perhaps through refining the model (for example, using a different lattice structure). Percolation is an example of an increasing number of problems where computer science of the kind described here is an essential part of the scientific process.
Lessons
We might have approached the problem of studying percolation by sitting down to design and implement a single program, which probably would run to hundreds of lines, to produce the kind of plots that are drawn by PROGRAM 2.4.6. In the early days of computing, programmers had little choice but to work with such programs, and would spend enormous amounts of time isolating bugs and correcting design decisions. With modern programming tools like Java, we can do better, using the incremental modular style of programming presented in this chapter and keeping in mind some of the lessons that we have learned.
Expect bugs
Every interesting piece of code that you write is going to have at least one or two bugs, if not many more. By running small pieces of code on small test cases that you understand, you can more easily isolate any bugs and then more easily fix them when you find them. Once debugged, you can depend on using a library as a building block for any client.
Keep modules small
You can focus attention on at most a few dozen lines of code at a time, so you may as well break your code into small modules as you write it. Some classes that contain libraries of related methods may eventually grow to contain hundreds of lines of code; otherwise, we work with small files.
Limit interactions
In a well-designed modular program, most modules should depend on just a few others. In particular, a module that calls a large number of other modules needs to be divided into smaller pieces. Modules that are called by a large number of other modules (you should have only a few) need special attention, because if you do need to make changes in a module’s API, you have to reflect those changes in all its clients.
Develop code incrementally
You should run and debug each small module as you implement it. That way, you are never working with more than a few dozen lines of unreliable code at any given time. If you put all your code in one big module, it is difficult to be confident that any of it is free from bugs. Running code early also forces you to think sooner rather than later about I/O formats, the nature of problem instances, and other issues. Experience gained when thinking about such issues and debugging related code makes the code that you develop later in the process more effective.
Solve an easier problem
Some working solution is better than no solution, so it is typical to begin by putting together the simplest code that you can craft that solves a given problem, as we did with vertical percolation. This implementation is the first step in a process of continual refinements and improvements as we develop a more complete understanding of the problem by examining a broader variety of test cases and developing support software such as our PercolationVisualizer and PercolationProbability classes.
Consider a recursive solution
Recursion is an indispensable tool in modern programming that you should learn to trust. If you are not already convinced of this fact by the simplicity and elegance of Percolation and PercolationPlot, you might wish to try to develop a nonrecursive program for testing whether a system percolates and then reconsider the issue.
Build tools when appropriate
Our visualization method show() and random boolean matrix generation method random() are certainly useful for many other applications, as is the adaptive plotting method of PercolationPlot. Incorporating these methods into appropriate libraries would be simple. It is no more difficult (indeed, perhaps easier) to implement general-purpose methods like these than it would be to implement special-purpose methods for percolation.
Reuse software when possible
Our StdIn, StdRandom, and StdDraw libraries all simplified the process of developing the code in this section, and we were also immediately able to reuse programs such as PercolationVisualizer, PercolationProbability, and PercolationPlot for percolation after developing them for vertical percolation. After you have written a few programs of this kind, you might find yourself developing versions of these programs that you can reuse for other Monte Carlo simulations or other experimental data analysis problems.
The primary purpose of this case study is to convince you that modular programming will take you much further than you could get without it. Although no approach to programming is a panacea, the tools and approach that we have discussed in this section will allow you to attack complex programming tasks that might otherwise be far beyond your reach.
The success of modular programming is only a start. Modern programming systems have a vastly more flexible programming model than the class-as-a-library-of-static-methods model that we have been considering. In the next two chapters, we develop this model, along with many examples that illustrate its utility.