Disjointed City Pairs
Airlines are very good at resource optimization. An airplane goes from airport to airport, picking up and delivering passengers at each stop. I wondered, though, how often does an airplane fly an “empty leg” route? By that I mean, how often is the destination airport different from the next origination airport for a given airplane? We have the data that we need to answer that question!
Using the Lead Window Function to Find Empty Leg Flights
The idea is simple enough. We have the airplanes’ tail numbers, so we can track each plane’s movements. We can ask Postgres to tell us the next origin airport using the LEAD window function and compare that to that record’s destination airport. The query to do all of those things requires a few steps, though. I typically use Common Table Expression (CTE) when I have queries with multiple parts that build on each other.
If you wanted to just write out the query and see the data, it would look like this:
WITH base_query AS ( SELECT id, unique_carrier, flight_num, tail_num, origin, dest, to_date((((year || '-'::text) || month) || '-'::text) || day_of_month, 'YYYY-MM-DD'::text) AS dep_date, dep_time, arr_time, actual_elapsed_time, dep_delay, arr_delay, diverted FROM departures WHERE unique_carrier = 'AA' ), departures_with_lead AS ( SELECT *, LEAD(origin) OVER (PARTITION BY tail_num ORDER BY tail_num, dep_date, dep_time) AS next_origin FROM base_query ) SELECT * FROM departures_with_lead WHERE unique_carrier = 'AA' AND dest <> next_origin;
Go ahead and open your favorite SQL editor, or go into rails dbconsole, and run that query.
It’s not crazy slow, but it’s also not fast. For that reason, I will optimize and put the base query in a materialized view.
Optimizing Slow Queries with the Materialized View
A view is just a query that is stored in the database. A materialized view is where the query AND the results are stored in the database. You can index and update the data in a materialized view. We won’t need to update this data, but indexes will be handy.
We can use Rails’s migrations to do this for us. Create a migration (rails g migration create_aa_departures). Put the following code in the new migration and run the migration:
class CreateAaDepartures < ActiveRecord::Migration def up sql = <<-SQL.strip_heredoc CREATE MATERIALIZED VIEW aa_departures AS ( WITH base_query AS ( SELECT id, unique_carrier, flight_num, tail_num, origin, dest, to_date((((year || '-'::text) || month) || '-'::text) || day_of_month, 'YYYY-MM-DD'::text) AS dep_date, dep_time, arr_time, actual_elapsed_time, dep_delay, arr_delay, diverted FROM departures WHERE unique_carrier = 'AA' ) SELECT *, LEAD(origin) OVER (PARTITION BY tail_num ORDER BY tail_num, dep_date, dep_time) AS next_origin FROM base_query ); CREATE INDEX ON aa_departures USING btree (origin); CREATE INDEX ON aa_departures USING btree (dest); CREATE INDEX ON aa_departures USING btree (next_origin); SQL end def down execute("DROP MATERIALIZED VIEW aa_departures;") end end
Draw the Disjointed City Pairs Chord Diagram
The query was the hard part. Now we just need to add the plumbing in the DeparturesController to query the data and serve it to the JavaScript.
The new controller methods should look very familiar:
def disjointed; end def disjointed_matrix airports, matrix = Departure.disjointed_matrix render :json => { :airports => airports, :matrix => matrix } end
Don’t forget to add routes for the disjointed and disjointed_matrix actions. The view app/views/departures/disjointed.html.erb is simply a copy of index.html.erb with the route changed for the data call:
makeChordChart('/departures/disjointed_matrix.json');
Now for the Departure model. You’ve seen the full query, and you saw how we broke it up to create the materialized view. We put the pieces back together in Departure#disjointed_matrix.
def self.disjointed_matrix sql = <<-SQL.strip_heredoc SELECT dest, next_origin, count(*) FROM aa_departures WHERE dest <> next_origin AND next_origin IS NOT NULL GROUP BY 1, 2 ORDER BY 1, 2 SQL counts = connection.execute(sql) airports_matrix!(:counts => counts, :field1 => "dest", :field2 => "next_origin") end
You may have noticed that we now have three parameters in the DepartureMatrix#airports_matrix! call. We need to update that method. The modified code is below with the updated lines in bold.
def airports_matrix!(counts:, field1: "origin", field2: "dest") h_matrix = counts.each_with_object({}) do |record, hash| hash[record[field1]] ||= Hash.new(0) hash[record[field1]][record[field2]] = Integer(record["count"]) end airports = h_matrix.keys.sort total = Float(h_matrix.values.flat_map(&:values).sum) matrix = Matrix.build(airports.count) do |row, column| origin = airports[row] dest = airports[column] h_matrix.fetch(origin, {}).fetch(dest, 0) / total end [airports, matrix] end
The field names in the queries are different, but the matrix creation routine is the same. We can simply pass in the data’s hash key attributes, and everything still works.
You’ll need to restart the Rails app if you still had it running because you’ve added new files. Go to http://localhost:3000/departures/disjointed and you should see a chord chart that looks like Figure 6.3.
Figure 6.3 Disjointed City Pair Chord Chart