Best Practice Ruby on Rails Refactoring: Databases
With the Rails framework providing a simple ORM that abstracts many of the database details away from the developer, the database is an afterthought for many Rails developers. While the power of the framework has made this okay to a certain extent, there are important database and Rails-specific considerations that you shouldn’t overlook.
AntiPattern: Messy Migrations
Ruby on Rails database migrations were an innovative solution to a real problem faced by developers: How to script changes to the database so that they could be reliably replicated by the rest of the team on their development machines as well as deployed to the production servers at the appropriate time. Before Rails and its baked-in solution, developers often wrote ad hoc database change scripts by hand, if they used them at all.
However, as with most other improvements, database migrations are not without pain points. Over time, a project's migrations can become a tangle of code that is intimidating to work with rather than the joy it should be. By keeping the following solutions in mind, you can overcome these obstacles and ensure that your migrations never become irreconcilably messy.
Solution: Never Modify the up Method on a Committed Migration
Database migrations enable you to reliably distribute database changes to other members of your team and to ensure that the proper changes are made on your server during deployment.
If you commit a new migration to your source code repository, unless there are irreversible bugs in the migration itself, you should follow the practice of never modifying that migration. A migration that has already been run on another team member’s computer or the server will never automatically be run again. In order to run it again, a developer must go through an orchestrated dance of backing the migration down and then up again. It gets even worse if other migrations have since been committed, as that could potentially cause data loss.
Yes, if you’re certain that a migration hasn’t been run on the server, then it’s possible to communicate to the rest of the team that you’ve changed a migration and have them re-migrate their database or make the required changes manually. However, that’s not an effective use of their time, it creates headaches, and it’s error prone. It’s simply best to avoid the situation altogether and never modify the up method of a migration.
Of course, there will be times when you’ve accidentally committed a migration that has an irreversible bug in it that must be fixed. In such circumstances, you’ll have no choice but to modify the migration to fix the bug. Ideally, the times when this happens are few and far between. To reduce the chances of this happening, always run the migration and inspect the results for accuracy before committing it to your source code repository. However, you shouldn’t limit yourself to simply running the migration. Instead, you should run the migration, then run its down, and then rerun its up. Rails provides rake tasks for doing this:
    rake db:migrate
    rake db:migrate:redo
The rake db:migrate:redo command runs the down method on the last migration and then reruns the up method on that migration. This ensures that the entire migration runs in both directions and is repeatable, without error. Once you’ve run this and double-checked the results, you can commit your new migration to the repository with confidence.
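To see what the round trip buys you, here is a toy model in plain Ruby rather than real Rails code. ToySchema, GoodMigration, BrokenMigration, and redo_migration are hypothetical names invented for this sketch, and a real database does not store its schema this way. The only point is that running down and then up again exposes a down method that does not actually reverse up.

```ruby
# Toy stand-in for a database schema: table name => array of column names.
# Real Rails schemas are not stored like this; the class is for illustration only.
class ToySchema
  attr_reader :tables

  def initialize
    @tables = { users: [:id, :email] }
  end

  def add_column(table, column)
    columns = @tables.fetch(table)
    raise ArgumentError, "column #{column} already exists" if columns.include?(column)
    columns << column
  end

  def remove_column(table, column)
    removed = @tables.fetch(table).delete(column)
    raise ArgumentError, "column #{column} does not exist" if removed.nil?
  end
end

# Hypothetical migration whose down exactly reverses its up.
class GoodMigration
  def up(schema)
    schema.add_column(:users, :jobs_count)
  end

  def down(schema)
    schema.remove_column(:users, :jobs_count)
  end
end

# Hypothetical migration whose down was left empty by mistake.
class BrokenMigration
  def up(schema)
    schema.add_column(:users, :jobs_count)
  end

  def down(schema)
    # Bug: nothing is undone here.
  end
end

# Mimics the idea behind db:migrate:redo: run down, then run up again.
def redo_migration(migration, schema)
  migration.down(schema)
  migration.up(schema)
end

good_schema = ToySchema.new
GoodMigration.new.up(good_schema)
redo_migration(GoodMigration.new, good_schema) # the round trip succeeds

broken_schema = ToySchema.new
BrokenMigration.new.up(broken_schema)
begin
  redo_migration(BrokenMigration.new, broken_schema)
rescue ArgumentError => e
  # The second up fails because down never removed the column.
  puts "redo failed: #{e.message}"
end
```

Running only up would have let BrokenMigration through; the failure appears only once the migration is taken down and brought back up.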
Solution: Never Use External Code in a Migration
Database migrations are used to manage database change. When the structure of a database changes, very often the data in the database needs to change as well. When this happens, it’s fairly common to want to use models inside the migration itself, as in the following example:
    class AddJobsCountToUser < ActiveRecord::Migration
      def self.up
        add_column :users, :jobs_count, :integer, :default => 0

        # Depends on the application's User model being loadable.
        User.all.each do |user|
          user.jobs_count = user.jobs.size
          user.save
        end
      end

      def self.down
        remove_column :users, :jobs_count
      end
    end
The migration above adds a counter cache column to the users table; the column stores the number of jobs each user has posted. To populate it, the migration uses the User model to find all users and update the column on each one. There are two problems with this approach.
First, this approach performs horribly. The code above loads all the users into memory and then, for each user in turn, asks the database how many jobs that user has and writes the updated row back, so the number of queries grows with the number of users.
Second, and more importantly, this migration will fail to run if the model is ever removed from the application, becomes unavailable, or changes in some way that makes the code in this migration no longer valid. Migrations must remain runnable, in sequence, at any time, because that is how they manage change in the database. When external code is used in a migration, it ties the migration to code that is not bound by these same rules and can result in an unrunnable migration.
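This second failure mode is easy to reproduce in plain Ruby. In the sketch below, ToyMigration, LegacyAccount, and run_migration are hypothetical names: LegacyAccount stands for a model that the application once defined and has since deleted. No Rails is needed to see the result, because referencing the missing constant raises NameError before any database work could begin.

```ruby
# Hypothetical migration-like class; a real one would inherit from
# ActiveRecord::Migration. LegacyAccount is deliberately left undefined to
# stand for a model that was removed after this migration was written.
class ToyMigration
  def self.up
    LegacyAccount.all.each do |account|
      account.save
    end
  end
end

# Runs the migration and reports either :ok or the error message.
def run_migration(migration)
  migration.up
  :ok
rescue NameError => e
  # Ruby raises NameError when a referenced constant does not exist.
  e.message
end

puts run_migration(ToyMigration)
```

Anyone who builds a database from scratch by running every migration in sequence hits this error as soon as the affected migration is reached.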
Therefore, it’s always best to use straight SQL whenever possible in your migrations. If you do so, you can rewrite the preceding migration as follows:
    class AddJobsCountToUser < ActiveRecord::Migration
      def self.up
        add_column :users, :jobs_count, :integer, :default => 0

        update(<<-SQL)
          UPDATE users SET jobs_count = (
            SELECT count(*) FROM jobs
            WHERE jobs.user_id = users.id
          )
        SQL
      end

      def self.down
        remove_column :users, :jobs_count
      end
    end
When this migration is rewritten using SQL directly, it has no external dependencies beyond the exact state of the database at the time the migration should be executed.
There may be cases in which you actually do need to use a model or other Ruby code in a migration. In such cases, the goal is still to rely on no external code. Therefore, all the code that’s needed, including the model, should be defined inside the migration itself. For example, if you really want to use the User model in the preceding migration, you can rewrite it as follows:
    class AddJobsCountToUser < ActiveRecord::Migration
      # Minimal model definitions local to this migration.
      class Job < ActiveRecord::Base
      end

      class User < ActiveRecord::Base
        has_many :jobs
      end

      def self.up
        add_column :users, :jobs_count, :integer, :default => 0

        User.reset_column_information
        User.all.each do |user|
          user.jobs_count = user.jobs.size
          user.save
        end
      end

      def self.down
        remove_column :users, :jobs_count
      end
    end
Since this migration defines both the Job and User models, it no longer depends on an external definition of those models being in place. It also defines the has_many relationship between them and therefore defines everything it needs to run successfully. In addition, note the call to User.reset_column_information in the self.up method. Active Record caches the column information it reads from the database for each model. If your migration changes the schema, as this one does by adding jobs_count, calling the reset_column_information method causes Active Record to re-inspect the columns in the database so that the model sees the new column.
You can use this same technique if you must calculate the value of a column by using an algorithm defined in your application. You cannot rely on the definition of that algorithm to be the same or even be present when the migration is run. Therefore, the algorithm should be duplicated inside the migration itself.
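The following plain-Ruby sketch shows why the copy matters. Everything named here is hypothetical: App.slug_for stands for an application algorithm that was rewritten after the migration was committed, and BackfillSlugs stands for a migration that carries its own frozen copy. A real migration would inherit from ActiveRecord::Migration and write the results to a column; this sketch only returns them so that the two versions can be compared.

```ruby
# Hypothetical application code. Imagine this is today's version, rewritten
# after the migration below was committed: it now joins words with underscores.
module App
  def self.slug_for(title)
    title.downcase.strip.gsub(/[^a-z0-9]+/, "_")
  end
end

# Hypothetical migration-like class carrying a frozen copy of the algorithm
# as it existed when the migration was written (words joined with hyphens).
class BackfillSlugs
  def self.slug_for(title)
    title.downcase.strip.gsub(/[^a-z0-9]+/, "-")
  end

  # Stand-in for the data update: returns title => slug instead of
  # writing to a database.
  def self.up(titles)
    titles.map { |title| [title, slug_for(title)] }.to_h
  end
end

titles = ["Senior Rails Developer", "QA Lead"]
puts BackfillSlugs.up(titles).inspect
puts App.slug_for("QA Lead")
```

Because BackfillSlugs never calls App.slug_for, it produces the same values no matter how the application version changes later, which is the property a migration needs.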
Solution: Always Provide a down Method in Migrations
It’s very important that a migration have a reliable self.down defined that actually reverses the migration. You never know when something is going to be rolled back. It’s truly bad practice to not have this defined or to have it defined incorrectly.
Some migrations simply cannot be fully reversed. This is most often the case for migrations that change data in a destructive manner. If this is the case for a migration for which you’re writing the down method, you should do the best reversal you can do. If you are in a situation where there is a migration that under no circumstances can ever be reversed safely, you should raise an ActiveRecord::IrreversibleMigration exception, as shown here:
    def self.down
      raise ActiveRecord::IrreversibleMigration
    end
Raising this exception causes migrations to be stopped when this down method is run. This ensures that the developer running the migrations understands that there is something irreversible that has been done and that cannot be undone without manual intervention.
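The effect of raising inside down can be sketched with a toy rollback loop in plain Ruby. IrreversibleMigration below is a local stand-in for ActiveRecord::IrreversibleMigration, and rollback, AddNickname, and MergeNameColumns are hypothetical names. This is not how Rails implements rollbacks; it only illustrates the rollback halting at the irreversible step and leaving that migration applied.

```ruby
# Local stand-in for ActiveRecord::IrreversibleMigration so that the sketch
# runs without Rails.
class IrreversibleMigration < StandardError; end

# Hypothetical reversible migration.
class AddNickname
  def self.down(log)
    log << "removed nickname"
  end
end

# Hypothetical destructive migration that cannot be undone.
class MergeNameColumns
  def self.down(_log)
    raise IrreversibleMigration
  end
end

# Toy rollback: undo migrations newest first and stop at the first one that
# refuses to be reversed. Returns the migrations that are still applied.
def rollback(applied, log)
  remaining = applied.dup
  until remaining.empty?
    remaining.last.down(log)
    remaining.pop
  end
  remaining
rescue IrreversibleMigration
  log << "stopped at #{remaining.last.name}"
  remaining
end

log = []
still_applied = rollback([MergeNameColumns, AddNickname], log)
puts log.inspect
puts still_applied.inspect
```

The newer, reversible migration is undone, and the loop then stops at MergeNameColumns instead of silently continuing past a change it cannot reverse.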
Once you have the down method defined, you should run the migration in both directions to ensure proper functionality. As discussed earlier in this chapter, in the section “Solution: Never Modify the up Method on a Committed Migration,” Rails provides rake tasks for doing this:
    rake db:migrate
    rake db:migrate:redo
The rake db:migrate:redo command runs the down method on the last migration and then reruns the up method on that migration.