- The Test-First Technique
- Tests as Specifications
- Building Good Specifications
- Summary
Building Good Specifications
You could specify many different kinds of things with tests in any given software development endeavor. You could specify structures, public interfaces or private constructs, or what’s in a class. In database terms, you could specify tables, views, and stored procedures.
A test should specify behavior, but should not specify structure. The more behavior-focused a test is, the better off you will be because structures tend to change a lot more quickly than behaviors. This is true even in the database world where, frankly, the pace of change is nigh unto glacial. If you object to my use of the word quickly, you can think of it this way: Structures change a lot less slowly than behaviors in a database design.
However, tests have to couple to something in order to invoke the behaviors they define. In fact, many structure decisions are involved in making a test pass. The key is to drive those design decisions into a class of databases from the outside, not the other way around.
Specify Behavior, Not Structure
The odds that you are not a software developer are extremely low. My suspicion is that many of the readers of this book are accomplished computer programmers who also do database work and want to learn how to do what they already know how to do in a database domain. You might also be someone who works only or primarily on databases.
The chance also exists that you are an extraterrestrial archeologist sifting through the intellectual ruins of a species long-since turned to dust. If so, I hope I just sent a shiver up whatever your equivalent of a spine is. Also: Hello, and sorry we didn’t survive long enough for our paths to cross—unless you exterminated us, in which case I’m sorry our paths crossed and I hope you caught some horrible disease from us in the process.
Database programmers and application programmers are both still programmers. Both groups are responsible for writing software, which itself is an act of prescribing behaviors and relationships. In the case of object-oriented programming, what those things mean is pretty clear. At least, it is pretty clear now; it might not have been decades ago.
In the case of database development, it’s a little less intuitive what the behaviors being defined are. People often want to think of databases as collections of tables and relationships. The good thing about that is the focus on a database’s primary responsibility: storing stuff. Yet, it’s still a structure-oriented way of considering design.
A table is a bundle of related features tied to a kind of data. The two basic behaviors a table supports are data manipulation and data querying. Other structures carry with them other behaviors and certain platforms offer extra behaviors with various structures.
Those are what you should specify in tests. Don’t specify that there is a table. Specify that you can store and retrieve data of an interesting sort. Don’t specify that there is a view; specify that you can perform useful searches across bodies of data. That decision might seem meaningless now, but as the book proceeds it will become more and more valuable to you.
Drive Design In from Without, Not the Other Way Around
In the procedural days, entities were just data—purely passive things subject to the whims of whatever function might have access to them. With the advent of object-oriented design, they became reactive things that told the world what they could do and then waited for instructions. Modern development practices make classes of objects into servants, told what they should be able to do by tests and then told to do it by other objects and, ultimately, by people.
When you write a test, you want it to specify the behaviors that live in a class of databases, but it’s going to have to talk to something to do that. An implication of specifying a behavior is that you must also specify the minimal amount of public interface required to invoke that behavior. The key to discovering that is to learn it from writing tests first.
Let’s consider a problem. Imagine I need to write an application that keeps a database of streets and cross references them with other intersecting streets. I could drive the requirements from tests, specifying behaviors and required interface, or I could define my design inside out—starting with capabilities, then building an interface around it. I’ll try the latter first.
Defining the Design Inside Out
Well, the obvious thing I need is a table in which to store streets. So let’s start there (see Figure 3.1).
Figure 3.1. Simple design
Of course, streets exist in cities, so I need a cities table. Maybe later I’ll need a states table, too, but for now, I can live without it. Let’s add a cities table with a state column so I can track which street I am dealing with (see Figure 3.2).
Figure 3.2. Streets segregated by city
Some streets span many cities, such as highways and interstate freeways. So I need to account for those, too (see Figure 3.3).
Figure 3.3. A street going through multiple cities
Now there’s the fact that I need to track the intersections, so let’s add that. It seems like it should be a cross-reference table with the address on each street at which the intersection takes place. Because streets sometimes cross in multiple places, I need a primary key that is distinct from the foreign keys on that table so I can support multiple links, as shown in Figure 3.4.
Figure 3.4. Streets organized by city and cross-referenced by intersection
From there, I can start hypothesizing how the data might be used, adding views and stored procedures to support those needs. Then, I could write tests for all the behaviors I developed. Eventually, I’ll think I have enough to start writing an application.
Of course, I won’t.
For one thing, there is a distinct database for every city supported by the application. So, every application is encumbered by adding noise structures. The Cities and StreetToCityLinks tables are completely unnecessary as are the constraints surrounding them.
Also, the application doesn’t care where two streets connect, only that they connect. So, the Street1Address and Street2Address fields of the Intersections table serve no purpose but to waste the time of everyone who touches them or reads about them.
Defining the Design Outside In
What if I try going the other direction? Suppose I want to start at the outside and work my way inward. In that event, by the time I’m defining a database design, I probably would have written the user interface and application logic already.
Having done those things would provide me with context and understanding as to what was really needed. If I work exclusively with the database, then someone else would provide the context for me and I would have a very clear idea of what the requirements are.
Either way, that understanding would be something that could be translated into tests as in the following:
[Test] public void CreateAndFindStreet() { connection.ExecuteSql("INSERT INTO Streets VALUES(5, 'Fun St.')"); var id = connection.ExecuteScalar( "SELECT ID FROM Streets WHERE NAME LIKE '%Fun%'"); Assert.That(id, Is.EqualTo(5)); }
That test would drive me to build a database class as follows:
<Database> <Version Number="1"> <Script> <![CDATA[ CREATE TABLE Streets(ID INT PRIMARY KEY, NAME NVARCHAR(4000)) </Script> </Version> </Database>
Knowing that I also needed the capability to find related streets, I might write another test as follows:
[Test] public void CreateConnectedStreetsAndFindFewestIntersectionsConnected() { connection.ExecuteSql("INSERT INTO Streets VALUES(1, 'A St.')"); connection.ExecuteSql("INSERT INTO Streets VALUES(2, 'B Dr.')"); connection.ExecuteSql("INSERT INTO Streets VALUES(3, 'C Ave.')"); connection.ExecuteSql("INSERT INTO Streets VALUES(4, 'D Ln.')"); connection.ExecuteSql("INSERT INTO Streets VALUES(5, 'E Blvd.')"); connection.ExecuteSql("INSERT INTO Intersections VALUES(1)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(1, 1)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(1, 2)"); connection.ExecuteSql("INSERT INTO Intersections VALUES(2)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(2, 1)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(2, 3)"); connection.ExecuteSql("INSERT INTO Intersections VALUES(3)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(3, 3)"); connection.ExecuteSql("INSERT INTO IntersectionStreets VALUES(3, 4)"); var result = connection.ExecuteScalar( "SELECT Depth FROM Connections() WHERE StartID = 2 AND EndID = 4"); Assert.That(result, Is.EqualTo(3)); }
That test would drive me to develop the design in the next snippet:
<Database> <Version Number="1"> <Script> <![CDATA[ CREATE TABLE Streets(ID INT PRIMARY KEY, NAME NVARCHAR(4000)) CREATE TABLE Intersections([ID] INT PRIMARY KEY) CREATE TABLE IntersectionStreets( [IntersectionID] INT FOREIGN KEY REFERENCES Intersections(ID), [StreetID] INT FOREIGN KEY REFERENCES Streets(ID)) ]]> </Script> <Script> <![CDATA[ CREATE VIEW ImmediateConnections AS SELECT s.StreetID AS StartID, e.StreetID AS EndID FROM IntersectionStreets AS s INNER JOIN IntersectionStreets AS e ON s.IntersectionID = e.IntersectionID and s.StreetID <> e.StreetID ]]> </Script> <Script> <![CDATA[ CREATE FUNCTION Connections ( ) RETURNS @Result TABLE (Depth INT, StartID INT, EndID INT) AS BEGIN DECLARE @Temp TABLE (StartID INT, EndID INT) DECLARE @Depth INT SET @Depth = 0 INSERT INTO @Temp SELECT ID AS StartID, ID AS EndID FROM Streets; WHILE EXISTS (SELECT TOP 1 * FROM @Temp) BEGIN INSERT INTO @Result SELECT @Depth, StartID, EndID FROM @Temp; DELETE @Temp; INSERT INTO @Temp SELECT r.StartID, ic.EndID FROM @Result AS r INNER JOIN ImmediateConnections AS ic ON r.EndID = ic.StartID DELETE @Temp FROM @Temp AS tc INNER JOIN @Result AS r ON tc.StartID = r.StartID AND tc.EndID = r.EndID SET @Depth = @Depth + 1 END; RETURN END ]]> </Script> </Version> </Database>
Note how narrow and focused the interface for the database that was designed outside-in is compared to the one that was designed inside-out. Yet, in certain areas such as the recursive view, the behavior is much deeper than with the inside-out design. The two side-effects of driving design into a system rather than designing a system and making clients find a way to use it are that you write something that can actually be used, and you spend more of your time developing worthwhile functionality.