Lists
The list is often considered the most complex data object in R, and many R programmers go to great lengths to avoid using lists in their structures. This perceived complexity may stem from a lack of clarity over what a list “looks like.” Other structures, such as vectors and matrices, are relatively easy to visualize and are therefore easier to adopt and manage.
Despite this, lists are simple structures that can be used to perform a number of complex operations.
What Is a List?
Lists are simply containers for other objects. The objects stored in a list can be of any type (for example, “matrix” or “vector”) and any mode. Therefore, you can create a list containing the following, for example:
- A character vector
- A numeric matrix
- A logical array
- Another list
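As a minimal sketch of this idea, the following builds a single list holding all four kinds of object. The object name mixedList and its element names are illustrative assumptions, not part of the running example used in this hour.

```r
# Illustrative only: 'mixedList' and its element names are assumptions
mixedList <- list(
  chars  = c("a", "b", "c"),                         # a character vector
  nums   = matrix(1:6, nrow = 2),                    # a numeric matrix
  flags  = array(c(TRUE, FALSE), dim = c(2, 2, 2)),  # a logical array
  nested = list(x = 1, y = "two")                    # another list
)

# Each element keeps its own mode inside the list
sapply(mixedList, mode)
```

The call to sapply simply asks each element for its mode in turn, which shows that the list has not forced its contents into a common type.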
When discussing lists, some people use the analogy of a box. For example, you might do the following:
- Create an empty box.
- Put some “things” into the box.
- Look into the box to see what things are in there.
- Take things back out of the box.
In a similar way, in this section, we will look at how to do the following:
- Create an empty list.
- Put objects into the list.
- Look at the number (and names) of objects in the list.
- Extract elements from the list.
Creating an Empty List
You create a list using the list function. The simplest list you can create is an empty list, like this:
> emptyList <- list()
> emptyList
list()
Later, you will see how to add elements to this empty list.
Creating a Non-Empty List
More commonly, you’ll create a list and add initial elements to it at the same time. You achieve this by specifying a comma-separated set of objects within the list function:
> aVector <- c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2)
> aMatrix <- matrix(LETTERS[1:6], nrow = 3)
> unnamedList <- list(aVector, aMatrix)
> unnamedList
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
In this example, we created two objects (aVector and aMatrix) and then created a list (unnamedList) containing copies of these objects.
If you only need the objects within the list, you could create the objects as you specify the list, like this:
> unnamedList <- list(c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
+                     matrix(LETTERS[1:6], nrow = 3))
> unnamedList
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
Creating a List with Element Names
When you create a list, you can optionally assign names to the elements. This helps you when you’re referencing elements in the list later.
> namedList <- list(VEC = aVector, MAT = aMatrix)
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
As before, you can also create the (named) objects as you’re creating the list:
> namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
+                   MAT = matrix(LETTERS[1:6], nrow = 3))
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
Creating a List: A Summary
You have now seen a few different ways of creating a list. It is worth recapping the ways in which we created the lists with some code examples:
> # Create an empty list
> emptyList <- list()
> # 2 Ways of Creating an unnamed list containing a vector and a matrix
> unnamedList <- list(aVector, aMatrix)
> unnamedList <- list(c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
+                     matrix(LETTERS[1:6], nrow = 3))
> # 2 Ways of Creating a named list containing a vector and a matrix
> namedList <- list(VEC = aVector, MAT = aMatrix)
> namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
+                   MAT = matrix(LETTERS[1:6], nrow = 3))
In these examples, we created three lists that we will use as examples over the next few sections:
> emptyList     # An empty list
list()
> unnamedList   # A list with unnamed elements
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList     # A list with element names
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
List Attributes
As with single-mode structures, a set of functions allows you to query some of the list attributes. Specifically, you can use the length function to query the number of elements in the list, and the names function to return the element names.
The length function returns the number of elements in the list, as shown here:
> length(emptyList)
[1] 0
> length(unnamedList)
[1] 2
> length(namedList)
[1] 2
The names function returns the names of the elements in the list, or NULL if there are no elements or no element names assigned:
> names(emptyList)
NULL
> names(unnamedList)
NULL
> names(namedList)
[1] "VEC" "MAT"
With single-mode data structures, we additionally used the mode function to return the type of data they held. Because lists are multi-mode structures, there is no longer a single mode of data being stored, so the word “list” is returned:
> mode(emptyList)
[1] "list"
> mode(unnamedList)
[1] "list"
> mode(namedList)
[1] "list"
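Because mode reports only "list" for the container, it can help to look inside it. The sketch below is one possible way to do that, assuming the namedList object from earlier (recreated here so the code runs on its own). The str function is not otherwise used in this section; it comes from the utils package that ships with R and prints a compact summary of a structure.

```r
# Recreate the running example so this block is self-contained
namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
                  MAT = matrix(LETTERS[1:6], nrow = 3))

is.list(namedList)        # confirms the container is a list
sapply(namedList, mode)   # mode of each element, not of the list itself
str(namedList)            # compact, one-line-per-element summary
```

Here sapply returns one mode per element, so the numeric vector and the character matrix are reported separately even though the list as a whole has mode "list".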
Subscripting Lists
Two types of list subscripting can be performed:
- You can create a subset of the list, returning a shorter list.
- You can reference a single element within the list.
Subsetting the List
You can use square brackets to select a subset of an existing list. The return object will itself be a list.
LIST [ Input specifying the subset of list to return ]
As with vectors, you can put one of five input types in the square brackets, as shown in Table 4.1.
TABLE 4.1 Possible List Subscripting Inputs
Input | Effect
Blank | All values of the list are returned.
A vector of positive integers | Used as an index of list elements to return.
A vector of negative integers | Used as an index of list elements to omit.
A vector of logical values | Only corresponding TRUE elements are returned.
A vector of character values | Refers to the names of elements to return.
To illustrate the subsetting of lists, we will use the namedList object created earlier.
Blank Subscripts
If you use a blank subscript, the whole of the list is returned:
> namedList [ ]   # Blank subscript
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
Positive Integer Subscripts
If you use a vector of positive integers, it is used as an index of elements to return:
> subList <- namedList [ 1 ]   # Return first element
> subList                      # Print the new object
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
> length(subList)              # Number of elements in the list
[1] 1
> class(subList)               # Check the "class" of the object
[1] "list"
As you can see from this example, the return object (saved as subList here) is itself a list. You can also use the class function to check the type of object, and it confirms subList is a list object.
Negative Integer Subscripts
You can provide a vector of negative integers to specify the index of list elements to omit:
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList [ -1 ]   # Return all but the first element
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
Logical Value Subscripts
You can provide a vector of logical values to specify the list elements to return (TRUE) and omit (FALSE):
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList [ c(TRUE, FALSE) ]   # Vector of logical values
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
Character Value Subscripts
If your list has element names, you can provide a vector of character values to identify the (named) elements you wish to return:
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList [ "MAT" ]   # Vector of Character values
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
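The examples in this section each use a single value inside the square brackets, but every input type in Table 4.1 is a vector and so can select several elements at once. The following sketch illustrates this; the third element, NEW, is an assumption added only so that there is more to select from.

```r
# Running example plus an extra element ('NEW') added for illustration
namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
                  MAT = matrix(LETTERS[1:6], nrow = 3),
                  NEW = 1:5)

# names() is used to keep the printed results short
names(namedList[c(3, 1)])                # positive integers: select and reorder
names(namedList[-c(1, 2)])               # negative integers: omit the first two
names(namedList[c(TRUE, FALSE, TRUE)])   # logicals: keep where TRUE
names(namedList[c("MAT", "VEC")])        # characters: select by name
```

In every case the result is still a list, just with a different selection (or ordering) of elements.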
Reference List Elements
In the last section, you saw that you can reference a list using square brackets to “subset” the list (that is, return a list containing only a subset of the original elements). More commonly, you’ll want to reference a specific element within your list.
You can reference elements of a list in two ways:
- You can use “double” square brackets.
- If there are element names, you can use the $ symbol.
Double Square Bracket Referencing
You can directly reference an element of a list using double square brackets. Although there are a number of uses of the double square brackets, the most common use is to supply a single integer index to refer to the element to extract:
> namedList               # The original list
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList[[1]]          # The first element
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList[[2]]          # The second element
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> mode(namedList[[2]])    # The mode of the second element
[1] "character"
When you use double square brackets in this way, you are directly referencing the objects contained within the list, as supported by the result of the mode function call. This is in contrast to the use of the single square bracket earlier, where we extracted a subset of the list itself:
> namedList [1]     # Return a list containing 1 element
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList [[1]]   # Return the first element of the list (a vector)
 [1] 5 7 8 2 4 3 9 0 1 2
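The practical consequence of this difference shows up as soon as you pass the result to another function. As a small sketch (recreating namedList so the code is self-contained, and using sum purely as an example of a function that expects a vector):

```r
namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
                  MAT = matrix(LETTERS[1:6], nrow = 3))

is.list(namedList[1])      # single brackets: still a list
is.list(namedList[[1]])    # double brackets: the numeric vector itself

total <- sum(namedList[[1]])   # works, because the input is a numeric vector

# sum() cannot add up a list, so the single-bracket version fails;
# tryCatch() is used here only to capture the error instead of stopping
result <- tryCatch(sum(namedList[1]),
                   error = function(e) "sum() failed on a list")
result
```

If a calculation unexpectedly complains about a list argument, a single square bracket used where a double one was intended is a likely cause.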
Referencing Named Elements with $
If the elements of your list are named, you can use the $ symbol to directly reference them. As such, the following lines of code are equivalent ways of referencing the first (the “VEC”) element of our namedList object:
> namedList        # Print the original list
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList[[1]]   # Return the first element
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList$VEC    # Return the "VEC" element
 [1] 5 7 8 2 4 3 9 0 1 2
Double Square Brackets versus $
The $ symbol provides a more intuitive way of referencing named list elements, which is also more aesthetically pleasing than the use of double square brackets. We tend to use double square brackets when there are no element names assigned, and use $ when names exist. Here’s an example:
> unnamedList        # List with no element names
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> unnamedList[[1]]   # First element
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList          # List with element names
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> namedList$VEC      # The "VEC" element
 [1] 5 7 8 2 4 3 9 0 1 2
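Double square brackets are not restricted to integer positions: they also accept a single element name, which is useful when the name is held in a variable. The sketch below also shows one behavioral difference worth knowing, namely that $ will partially match a name whereas double square brackets match exactly by default. The variable elementName is hypothetical.

```r
namedList <- list(VEC = c(5, 7, 8, 2, 4, 3, 9, 0, 1, 2),
                  MAT = matrix(LETTERS[1:6], nrow = 3))

namedList[["VEC"]]          # same element as namedList$VEC

elementName <- "MAT"        # hypothetical variable holding an element name
namedList[[elementName]]    # looks up the element called "MAT"
namedList$elementName       # NULL: looks for an element literally named "elementName"

namedList$V                 # partial matching: returns the VEC element
namedList[["V"]]            # NULL: double brackets match names exactly by default
```

For interactive work the shorter $ form is convenient; in functions and scripts where names are computed, the double square bracket form tends to be the safer choice.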
Adding List Elements
You can add elements to a list in one of two ways:
- By directly adding an element with a specific name or in a specific position
- By combining lists together
Directly Adding a List Element
You can add a single element to a list by assigning it into a specific index or name. The syntax mirrors that of the “Double Square Brackets versus $” section earlier. For example, let’s add a single element to our empty list:
> emptyList                        # Empty list
list()
> emptyList[[1]] <- LETTERS[1:5]   # Add an element
> emptyList                        # Updated (non)empty list
[[1]]
[1] "A" "B" "C" "D" "E"
Instead of using the double square brackets, we can use the $ symbol to add a “named” element to a list:
> emptyList <- list()            # Recreate the empty list
> emptyList                      # Empty list
list()
> emptyList$ABC <- LETTERS[1:5]  # Add an element
> emptyList                      # Updated (non)empty list
$ABC
[1] "A" "B" "C" "D" "E"
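One detail to be aware of when adding by position: if you assign into an index beyond the current end of the list, R fills the intervening positions with NULL. A minimal sketch, using a hypothetical object called gapList:

```r
gapList <- list()            # hypothetical empty list
gapList[[3]] <- "third"      # assign straight into position 3

length(gapList)              # the list now has three elements
is.null(gapList[[1]])        # positions 1 and 2 were filled with NULL
gapList[[3]]                 # the value we assigned
```

Assigning to the next free position, for example with an index of length(gapList) + 1, avoids creating these empty slots.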
Combining Lists
You can grow lists by combining them together using the c function, as shown here:
> list1 <- list(A = 1, B = 2)   # Create list1
> list2 <- list(C = 3, D = 4)   # Create list2
> c(list1, list2)               # Combine the lists
$A
[1] 1
$B
[1] 2
$C
[1] 3
$D
[1] 4
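When you use c to append a vector rather than another list, it matters whether the vector is wrapped in list() first. The following sketch, which reuses list1 from the example above, shows the difference; the names wrapped and unwrapped are illustrative.

```r
list1 <- list(A = 1, B = 2)

# Wrapped in list(): the whole vector is added as ONE new element
wrapped <- c(list1, list(C = 3:5))
length(wrapped)
names(wrapped)

# Not wrapped: each value of the vector becomes its OWN element
unwrapped <- c(list1, C = 3:5)
length(unwrapped)
names(unwrapped)
```

If the intention is to store a vector as a single element of the list, wrapping it in list() before combining is the form to use.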
A Summary of List Syntax
As you have seen so far in this hour, the way we use lists varies slightly based on whether the elements of the list are named. At this point, it is worth reviewing the syntax to create and manage “unnamed” and “named” list structures.
Overview of Unnamed Lists
An overview of the key syntax covered is shown here, using a list without named elements as an example. First, let’s create a list and look at the list attributes:
> unnamedList <- list(aVector, aMatrix)   # Create the list
> unnamedList                             # Print the list
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> length(unnamedList)                     # Number of elements
[1] 2
> names(unnamedList)                      # No element names
NULL
We can subset the list or extract list elements using single/double square brackets:
> unnamedList[1]            # Subset the list
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
> unnamedList[[1]]          # Return the first element
 [1] 5 7 8 2 4 3 9 0 1 2
> unnamedList[[3]] <- 1:5   # Add a new element
> unnamedList
[[1]]
 [1] 5 7 8 2 4 3 9 0 1 2
[[2]]
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
[[3]]
[1] 1 2 3 4 5
Overview of Named Lists
Let’s look at a similar example using a list with element names. First, let’s create the list and view the list attributes:
> namedList <- list(VEC = aVector, MAT = aMatrix)   # Create the list
> namedList                                         # Print the list
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
> length(namedList)                                 # Number of elements
[1] 2
> names(namedList)                                  # Element names
[1] "VEC" "MAT"
We can subset the list using single square brackets, or reference elements directly with the $ symbol:
> namedList[1]            # Subset the list
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList$VEC           # Return the first element
 [1] 5 7 8 2 4 3 9 0 1 2
> namedList$NEW <- 1:5    # Add a new element
> namedList
$VEC
 [1] 5 7 8 2 4 3 9 0 1 2
$MAT
     [,1] [,2]
[1,] "A"  "D"
[2,] "B"  "E"
[3,] "C"  "F"
$NEW
[1] 1 2 3 4 5
Motivation for Lists
A good understanding of lists helps you to accomplish a number of useful tasks in R. To illustrate this, we will briefly look at two use cases that rely on list structures. Note that this section includes syntax that will be covered later in this book, but we include it here to illustrate “the art of the possible” at this stage.
Flexible Simulation
Consider a situation where we want to simulate a number of extreme values (for example, large financial losses by day, or particularly high values of some measure for each patient in a drug study). For each iteration, we may simulate any number of numeric values from a given distribution.
A list provides a flexible structure to hold all the simulated data. Consider the following code example:
> # Simulate number of extreme values by day from a Poisson distribution
> nExtremes <- rpois(100, 3)
> nExtremes[1:5]   # First 5 numbers
[1] 0 3 5 7 3
> # Define function that simulates "N" extreme values
> exFun <- function(N) round(rweibull(N, shape = 5, scale = 1000))
> extremeValues <- lapply(nExtremes, exFun)   # Apply the function to our simulated numbers
> extremeValues[1:5]   # First 5 simulated outputs
[[1]]
numeric(0)
[[2]]
[1] 1305  948 1077
[[3]]
[1] 676 516 865 614 970
[[4]]
[1]  618 1217  818 1173 1205 1105  519
[[5]]
[1] 1026  933  657
From this example, note that the first simulated output generated no “extreme” values, resulting in the output containing an empty numeric vector (signified by numeric(0)). The “unnamed” list structure allows us to hold, in the same structure:
- This empty vector (indicating no “extreme values” for a particular day)
- Large vectors holding a number of simulated outputs (for days where many “extreme values” were simulated)
Given that we have stored this information in a list, we can query it to summarize the typical number of extreme values per day and the typical daily total of those values (using the median in both cases):
> median(sapply(extremeValues, length))   # Median number of simulated extremes per day
[1] 3
> median(sapply(extremeValues, sum))      # Median daily total of the extreme values
[1] 2634
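Because the simulation is random, rerunning the code above will give different numbers each time. The sketch below repeats the same steps with a fixed seed so that results are repeatable, and also flattens the list with unlist to summarize the individual extreme values rather than the daily totals. The seed value and the names nPerDay, totalPerDay, and allValues are arbitrary choices made for this illustration.

```r
set.seed(42)   # arbitrary seed, used only to make the sketch repeatable

nExtremes <- rpois(100, 3)                                         # extremes per day
exFun <- function(N) round(rweibull(N, shape = 5, scale = 1000))   # as defined earlier
extremeValues <- lapply(nExtremes, exFun)                          # one vector per day

nPerDay     <- lengths(extremeValues)        # number of values held in each element
totalPerDay <- sapply(extremeValues, sum)    # daily totals (0 for days with no extremes)
allValues   <- unlist(extremeValues)         # every simulated value in a single vector

median(nPerDay)       # typical number of extremes per day
median(totalPerDay)   # typical daily total
median(allValues)     # typical size of an individual extreme value
```

The lengths function gives the same result as sapply(extremeValues, length); note that unlist discards the day-by-day grouping, so it is suitable only for summaries that do not depend on which day a value belongs to.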
Extracting Elements from Named Lists
In R, many objects are, fundamentally, lists. In particular, the results returned by many statistical functions are lists with a class attached. For example, let’s use the t.test function to perform a simple t-test. We will take the example straight from the t.test help file:
> theTest <- t.test(1:10, y = c(7:20))   # Perform a T-Test
> theTest                                # Print the output
        Welch Two Sample t-test
data:  1:10 and c(7:20)
t = -5.4349, df = 21.982, p-value = 1.855e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.052802  -4.947198
sample estimates:
mean of x mean of y
      5.5      13.5
The output is printed as a nicely formatted text summary informing us that the t-test is significant. But what if we wanted to use one of the elements of this output (for example, the p-value) in further work? Consulting the help file, we see the return value is described as follows:
Value
A list with class htest containing the following components:
- statistic The value of the t-statistic.
- parameter The degrees of freedom for the t-statistic.
- p.value The p-value for the test.
- conf.int A confidence interval for the mean appropriate to the specified alternative hypothesis.
- estimate The estimated mean or difference in means, depending on whether it was a one-sample test or a two-sample test.
- null.value The specified hypothesized value of the mean or mean difference, depending on whether it was a one-sample test or a two-sample test.
- alternative A character string describing the alternative hypothesis.
- method A character string indicating what type of t-test was performed.
- data.name A character string giving the name(s) of the data.
The key thing to note here is that the return object is “a list.” Given that the output is a list, we can query the named elements of this list and see that the result matches the description of elements in the help file:
> names(theTest)   # Names of list elements
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"
Given that this is a named list, and we know the names of the elements, we can use the $ symbol to directly reference the information we need:
> theTest$p.value   # Reference the p-value
[1] 1.855282e-05
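The same idea extends to the other components listed in the help file. As a brief sketch, the following confirms that the test result really is a list and pulls out a few more elements, using both $ and double square brackets. The variable names lowerLimit and meanX are illustrative only.

```r
theTest <- t.test(1:10, y = c(7:20))   # same call as above

is.list(theTest)           # the result is a list underneath
class(theTest)             # the class that controls how it is printed

theTest$conf.int           # the confidence interval (a numeric vector)
theTest[["estimate"]]      # the sample means, referenced with double brackets

# Elements are ordinary R objects, so they can be subscripted further
lowerLimit <- theTest$conf.int[1]               # lower confidence limit
meanX      <- theTest$estimate[["mean of x"]]   # mean of the first sample
```

Once extracted, these values can be stored, compared, or passed to other functions like any other vector.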
Using this approach, we can reference a wide range of elements from R outputs.