Understanding quap
R’s quap function occupies a position between the simpler (but often awkward) grid approximation and the more sophisticated (but often murky) MCMC. We might as well begin with the function’s name: quap is an abbreviation of quadratic approximation. (The functionality is also referred to as Laplace approximation.)
Behind the scenes, the software makes an approximation of the posterior distribution density (the product of the priors and the observations) of the parameter we want to know about; for example, a regression coefficient in a multiple regression equation. To do so, the software uses a quadratic function; hence the term quadratic approximation.
The quap function is capable of returning a variety of analyses that support Bayesian methods. However, its principal purpose is to build a posterior distribution from samples that conform to requirements that you supply. These often include the location and spread of Gaussian distributions from which priors are assembled. Another purpose that the quap function serves is to define the relationships among the variables in your analysis.
Let’s take a look at how those processes might support a quap function that supports the Bayesian version of simple (that is, single-predictor) regression analysis. We start with a little housekeeping:
library(rethinking) setwd("C:/Users/Smith/Documents") PropTaxes <- read.csv("Assessments.csv")
The quap function is part of the rethinking package, so begin by loading rethinking. You’ll need to install rethinking first, if you haven’t done so already.
You don’t need to set the working directory by means of the setwd function if your data file is already stored there; otherwise, use setwd to point R in the right direction, or copy the file into the current working directory.
The third line of R code above assumes that your data is in a csv file named Assessments.csv, so read the data into R’s workspace from that file and assign it the name PropTaxes. Keep in mind that the read.csv function results in a data frame, so you now have a data frame named PropTaxes. (Don’t forget that names in R, including file names, are case sensitive.)
Here’s the next line in the R code:
MeanValue <- mean(PropTaxes$Value)
This statement establishes a new variable named MeanValue from the variable named Value. It is the arithmetic mean of the variable for all the cases in the PropTaxes data frame. The code goes on to subtract that mean value from each observed value, changing the nature of the variable from a raw observation to a mean-corrected value. At that point, MeanValue is no longer an assessment measured in dollars but a deviation from the mean assessment, measured in dollars. There are some good analytic reasons to shift the meaning of the Value variable in this way, but the principal purpose here is to clarify the meaning of the resulting regression coefficient.
We’ll take a look at that shortly. In the meantime, notice that when the setwd function creates the new data frame from the Assessments.csv data file, it attaches the Value field as a variable. You can address that variable directly by providing the data frame’s name, followed by the dollar sign $, followed by the variable’s name. For example:
PropTaxes$Value
Housekeeping’s over, and now it’s time to build the model for the analysis. The first step is to name the model, which here will have the name AssessModel. The specifications that follow the function name in the code will be used to structure AssessModel. Those specifications are assigned to the model by means of the assignment operator, which in R is indicated by the less-than symbol followed by a dash: <-. (Sometimes, although rarely, the equal sign is used instead of <-.)
AssessModel <- quap( alist( . . .
Here, the result returned by the quap function is saved to a new object (a model) named AssessModel. The model is made in the form of a list created by the alist function. The elements that belong to the list are formulas and as such might include references to variables and parameters that aren’t yet ready for use. For example:
Tax ~ dnorm( mu , sigma ) ,
This is a model formula, and it can be used as a component of the list assembled by the alist function. A list created by alist has some important differences from a list that results from the c or the list function; for example, elements of the list are not necessarily evaluated immediately. In the prior example, the value of Tax can be read as dependent on the purpose of the dnorm function and the parameters mu and sigma. If we don’t yet know what values to use for mu and sigma, we can’t yet evaluate dnorm or its results. But no worries: we’ll get around to evaluating them shortly.
So, that’s the first component of the list. Here’s where we left off:
Tax ~ dnorm( mu , sigma ) ,
That tilde operator is used frequently in quap formulas, and its effect can depend on the context. Here, it means roughly that Tax will be distributed as a normal curve with mu and sigma as its parameters. In English, Tax depends on the result returned by dnorm when it gets mu and sigma as its arguments.
As I just noted, the code doesn’t have those values yet. While waiting for them, let’s get a handle on dnorm. That’s an abbreviation of density normal. It tells R to look in the normal curve and return the density (in this context, density means probability) when values have been assigned to both mu and sigma.