Transforming the Data
Now that we have our data in place we can start asking questions of it and working through the matrix transformation.
Fetching the Data
Looking at all the departures for all of 1999 would not be very interesting. Carriers have different routes and different hubs, and there are a LOT of flights. The chord diagram would be a mess. Looking at departures for a given airline makes more sense.
Let’s take a look at the origin/destination city pairs for American Airlines. The IATA code (abbreviation) for American Airlines is ‘AA’, Delta is ‘DL’, Southwest Airlines is ‘WN’, and so on. As a fun exercise later on, go back through and generate the chord diagrams for the other carriers in the data.
The following class method goes in the Departure class and begins the process of creating the matrix we need to feed the chord diagram:
def self.departure_matrix
  sql = <<-SQL.strip_heredoc
    SELECT origin, dest, count(*)
    FROM departures
    WHERE unique_carrier = 'AA'
    GROUP BY 1, 2
    ORDER BY 1, 2
  SQL
  counts = connection.execute(sql)
end
This query just gives us the counts. It does not generate a matrix for us. We need to take those counts and turn them into a matrix.
Generating the Matrix
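Fortunately for us, Ruby ships with a Matrix class that we can load with require 'matrix'. (On Ruby 3.1 and later, matrix is a bundled gem rather than a default gem, so a Bundler-managed app may need gem 'matrix' in its Gemfile.) I’m not going to write this code directly in the Departure class, though. Create a module called DepartureMatrix (app/models/departure_matrix.rb).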
require 'matrix'

module DepartureMatrix
  def airports_matrix!(counts:)
    h_matrix = counts.each_with_object({}) do |record, hash|
      hash[record["origin"]] ||= Hash.new(0)
      hash[record["origin"]][record["dest"]] = Integer(record["count"])
    end

    airports = h_matrix.keys.sort
    total = Float(h_matrix.values.flat_map(&:values).sum)

    matrix = Matrix.build(airports.count) do |row, column|
      origin = airports[row]
      dest = airports[column]
      h_matrix.fetch(origin, {}).fetch(dest, 0) / total
    end

    [airports, matrix]
  end
end
The airports_matrix! method takes a single keyword argument, counts, which is the query result we just generated, and returns a tuple (an array with two items) containing the list of airports and the matrix. We need both of these to tell D3 how to draw the chords and the labels.
There is a lot happening in this method, so let’s walk through it.
h_matrix = counts.each_with_object({}) do |record, hash|
  hash[record["origin"]] ||= Hash.new(0)
  hash[record["origin"]][record["dest"]] = Integer(record["count"])
end
Enumerable is pretty amazing. Let’s take a closer look at Enumerable#each_with_object. Here is a simplified example from the documentation:
evens = (1..10).each_with_object([]) { |item, array| array << item * 2 }
#=> [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
The each_with_object iterator takes a block that has two parameters. The first parameter is the item as we iterate through the collection. The second parameter is the object that we pass in the parentheses. In this case it is an array. In my code it is a hash.
The other tricky thing happening here is that I set a default value for each new origin hash. Any destination that is missing from an origin’s hash reads as the default value when looked up with [], which is zero in this case. (Hash#fetch ignores that default, which is why the lookup later in the method passes its own fallback of 0.)

The next thing we do is get a sorted list of airports. We can ask a hash for its keys, and what we get back is an array of keys. Pretty cool!
airports = h_matrix.keys.sort
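The default value and the key lookup can be tried in isolation. This is a small sketch with invented data; destinations_for is a hypothetical helper that exists only for this illustration.

```ruby
# Build a per-origin hash whose missing destinations read as zero.
def destinations_for(pairs)
  pairs.each_with_object(Hash.new(0)) { |(dest, count), hash| hash[dest] = count }
end

dests = destinations_for([["ORD", 3], ["LAX", 1]])

p dests["ORD"]           # a stored value
p dests["JFK"]           # missing key: falls back to the Hash.new(0) default
p dests.fetch("JFK", 0)  # fetch ignores the hash default, so it needs its own fallback
p dests.keys.sort        # keys come back as an array, which we can sort
```

Reading a missing key with [] does not add that key to the hash, so the list of keys stays limited to the airports that were actually stored.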
Next we need to calculate the grand total of all the things. This is how we will know what percent each individual count represents. We just asked the hash for its keys, and now we are asking for its values.
total = Float(h_matrix.values.flat_map(&:values).sum)
We have a nested hash, so h_matrix.values gives us an array of hashes, and calling values on each of those would give us an array of arrays. We could map over those and flatten the resulting array, but flat_map gives us a nice shortcut for that combination of actions.
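As a rough illustration of that shortcut, the sketch below compares the two approaches on an invented nested hash. NESTED and both helper names are hypothetical.

```ruby
# Hypothetical nested counts: origin => { dest => count }.
NESTED = { "DFW" => { "ORD" => 3, "LAX" => 1 }, "ORD" => { "DFW" => 2 } }.freeze

# The long way: map to an array of arrays, then flatten it.
def counts_via_map(nested)
  nested.values.map(&:values).flatten
end

# The shortcut: flat_map does both steps in one pass.
def counts_via_flat_map(nested)
  nested.values.flat_map(&:values)
end

p counts_via_flat_map(NESTED)      # => [3, 1, 2]
p counts_via_flat_map(NESTED).sum  # => 6
```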
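Do you remember integer division from school? In Ruby, dividing one integer by another throws away the fractional part, so every share of the total would come out as 0. We need the total to be a float so that the downstream division keeps its decimals.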
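A quick sketch of why the Float conversion matters; share is a hypothetical helper used only for this illustration.

```ruby
# Divide a count by a total, exactly as the matrix block does.
def share(count, total)
  count / total
end

p share(3, 6)         # => 0   (Integer / Integer truncates)
p share(3, Float(6))  # => 0.5 (Integer / Float keeps the fraction)
```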
matrix = Matrix.build(airports.count) do |row, column|
  origin = airports[row]
  dest = airports[column]
  h_matrix.fetch(origin, {}).fetch(dest, 0) / total
end
The final thing we need to do is actually generate the matrix. Here is a simple example of Matrix.build from the documentation:
m = Matrix.build(2, 4) {|row, col| col - row }
#=> Matrix[[0, 1, 2, 3], [-1, 0, 1, 2]]
The example builds a matrix with two rows and four columns. Matrix.build is a class method that takes up to two parameters for the row and column counts. If you omit the second parameter, the column count will be set to the row count. I rely on that behavior in my code to generate a square matrix.
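The single-argument form can be checked with a small sketch. index_sum_matrix is a hypothetical helper, and the row + col formula is just a stand-in for the real lookup.

```ruby
require 'matrix'

# Passing only a row count makes Matrix.build produce a square matrix.
def index_sum_matrix(size)
  Matrix.build(size) { |row, col| row + col }
end

m = index_sum_matrix(3)
p m.row_count     # => 3
p m.column_count  # => 3
p m.to_a          # => [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```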
Finalizing the Matrix
Now that we have our matrix generator we need to put it to work. Go back to the Departure class and have it extend DepartureMatrix. Now DepartureMatrix#airports_matrix! becomes a class method in Departure. We just need to call it, like so:
def self.departure_matrix
  sql = <<-SQL.strip_heredoc
    SELECT origin, dest, count(*)
    FROM departures
    WHERE unique_carrier = 'AA'
    GROUP BY 1, 2
    ORDER BY 1, 2
  SQL
  counts = connection.execute(sql)
  airports_matrix!(:counts => counts)
end
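You can run this from the console to see what your matrix looks like. You should get back a two-element array: the sorted list of airport codes, followed by a large Matrix. Call to_a on the matrix to see it as an array of arrays.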
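If you want to check the shape of the result without a database, the sketch below runs the same module against a handful of invented rows. FakeDeparture and SAMPLE_COUNTS are hypothetical stand-ins for the real model and query result, and the module body is copied from this section so the script runs outside Rails. Every airport in the sample appears as an origin, because the method takes its airport list from the origin keys.

```ruby
require 'matrix'

# Copy of the module from this section, so the sketch is self-contained.
module DepartureMatrix
  def airports_matrix!(counts:)
    h_matrix = counts.each_with_object({}) do |record, hash|
      hash[record["origin"]] ||= Hash.new(0)
      hash[record["origin"]][record["dest"]] = Integer(record["count"])
    end
    airports = h_matrix.keys.sort
    total = Float(h_matrix.values.flat_map(&:values).sum)
    matrix = Matrix.build(airports.count) do |row, column|
      h_matrix.fetch(airports[row], {}).fetch(airports[column], 0) / total
    end
    [airports, matrix]
  end
end

# Hypothetical stand-in for the Departure model; no database involved.
class FakeDeparture
  extend DepartureMatrix
end

# Invented rows (assumed: string keys and string counts, like the query result).
SAMPLE_COUNTS = [
  { "origin" => "DFW", "dest" => "LAX", "count" => "1" },
  { "origin" => "DFW", "dest" => "ORD", "count" => "3" },
  { "origin" => "LAX", "dest" => "DFW", "count" => "2" },
  { "origin" => "ORD", "dest" => "DFW", "count" => "2" }
].freeze

airports, matrix = FakeDeparture.airports_matrix!(counts: SAMPLE_COUNTS)
p airports                        # sorted airport codes, used for the labels
matrix.to_a.each { |row| p row }  # one row per origin, one column per destination
```

Each cell is that city pair's share of all departures in the sample, so the cells add up to 1.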