Building a Better Build System: An Interview with Peter Smith
Fred Brooks made the claim that no single activity takes more than perhaps a fourth of development time, so there will be "No Silver Bullets" to reduce development costs by 90%. Brooks went on to suggest that instead of "Silver Bullets" we should be looking for a collection of "Bronze Bullets."
One such bronze bullet is optimizing the build process.
As a former developer myself, when I hear that build processes can take up to ten percent of the effort, that certainly resonates with me: Build times are often too high, and build systems are too complex and too painful to change. If you imagine decreasing build costs by a factor of ten and using the saved time to actually develop software, the result might just be a significant, sustained competitive advantage. (And it certainly qualifies as one of Brooks's "Bronze Bullets.")
When it comes to research on build methods, Smith is no slouch. He approaches the subject with the rigor you would expect of a PhD-credentialed computer scientist, coupled with a realism that comes from spending the last twelve years working in industry as a technologist and doer.
With the potential for a ten percent gain in productivity, and, for that matter, an offer on the table to take the pain away from the build system, well ... let's just say I was interested in what he had to say.
I interviewed Peter about his life experience, how he got involved in build system work, and what you can do right now to improve build system performance.
Matt Heusser: Tell us a bit more about your background, Peter. What did you study in graduate school and what turned you on to build systems?
Peter Smith: Like many kids in the 1980s, I started by programming everything in the BASIC language, but soon got hooked on using assembly language to do more powerful things. When I got to university, I learned about compilers and was fascinated by how you could write code in a high-level language and have it automatically translated to machine code. This led to my graduate school research in operating systems, compilers and language design.
My experience with build systems started when I moved to industry and became the manager of a tools and release engineering team. As a result, I was responsible for a Makefile-based build system and also had involvement with an Ant-based build. Fast forward ten years, and I took on a contract job to maintain that same build system. The code base was ten years older and very much larger, and I could easily see how the build system was straining under pressure. That got me thinking about how things could be done better.
Matt: Before I knew about your book, I always thought of a build system as sort of "the cost of doing business." I have to admit, though, the experience of the painful, creaky build or the build-that-slows-down-development is all too familiar to me. Tell us how we get in this situation.
Peter: Actually, I still think a build system is part of the cost of doing business, but as with all business decisions you need to justify putting effort into it. It's easy (and cheap) to create a build system when your software is small, but when the software grows over time, with new features and new developers joining the project, the "dog house" build system you first created won't automatically turn into a "skyscraper" (to steal a great analogy).
I think we've all seen this with software, where the way you initially design the product is okay for a while, but starts to break down when things get big, complicated, and full of features. If you don't spend time refactoring or rearchitecting the code, it'll be a constant battle to get work done. The same is true for a build system.
Matt: Can you describe in more detail what that breakdown looks like? What goes wrong when the build system isn't working well?
Peter: In an ideal world, a build system would provide a single button that you push to compile your program. In reality it has to do a bunch of work to figure out which source files have changed, in which order they should be compiled, and which command-line options should be used. If any of this information is wrong, there's a good chance the build will "break" or will result in an invalid software image.
Most people have experienced build systems that don't fully compile all the source code you've changed. This forces you to manually remove object files or to change unrelated source files, just to get everything to compile correctly. In other cases, you've probably seen a build system that insists on compiling files you haven't changed, which makes everything take longer than it should. A third situation is where your build system constantly spits out compiler errors that you can't seem to fix no matter how much you tweak your source code. All of these problems distract you from getting your job done.
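To make the bookkeeping Peter describes concrete, here is a minimal Python sketch of how a tool might decide what to rebuild and in which order. It is an illustration only, not how any particular build tool works: the function name plan_rebuild, the file names, and the use of plain numbers as timestamps are all assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_rebuild(deps, mtimes):
    """Return the targets that must be rebuilt, in a safe build order.

    deps:   maps each target to the list of files it is built from
    mtimes: maps each existing file to its last-modified time;
            a target absent from mtimes is treated as not yet built
    """
    plan = []
    dirty = set()
    # static_order() yields every file after all the files it depends on.
    for node in TopologicalSorter(deps).static_order():
        inputs = deps.get(node)
        if not inputs:
            continue  # a plain source file: nothing to build
        missing = node not in mtimes
        outdated = any(
            i in dirty or mtimes.get(i, 0) > mtimes.get(node, 0)
            for i in inputs
        )
        if missing or outdated:
            dirty.add(node)
            plan.append(node)
    return plan


# Hypothetical project: two C files sharing one header.
deps = {
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
    "app": ["main.o", "util.o"],
}
mtimes = {"main.c": 100, "util.c": 100, "util.h": 100,
          "main.o": 150, "util.o": 150, "app": 160}

mtimes["util.c"] = 200           # simulate editing util.c
print(plan_rebuild(deps, mtimes))  # prints ['util.o', 'app']
```

If the dependency table is wrong, for example if util.h is left out of the list for main.o, the plan silently skips a file that should have been recompiled, which is exactly the kind of failure described above.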
Matt: You suggest a different way -- that build systems can be a strategic enabler for a team. Tell us what you mean by that, and how a team could get there.
Peter: I use the 10% rule of thumb a lot these days to justify why people should improve their build system. I ask a software developer how much time they spend waiting for builds to complete, dealing with broken builds that weren't their fault, or trying to figure out why a source file isn't recompiling. Developers don't always have a good way to estimate the time they waste, but most people agree that it's at least 10% of their working hours, and sometimes much higher. Next, ask a company executive how much they'd pay to have their developers be 10% more productive and to produce 10% more features. A good leader will understand the problems and commit the time to fix them.
Matt: Does anyone ever say "but my system is different"? I mean, compare a small interpreted project, like Ruby or Perl, to a larger Java application, or to something huge like a database or operating system. Those will have different dynamics, right? Tell us about how the problems of a build scale (or fail to scale) up with the software.
Peter: If you're building a small piece of software, your code will take less than a minute to compile or package, and you'll hardly waste any time with build system problems. When you do encounter problems, you only have a small number of people impacted, and it doesn't take long to resolve the issue. However, there are still a lot of people working in the large-scale software world where having millions of lines of code is very common. Whether it be an embedded system written in C or a GUI-based application in Java or C#, the complaints about poor quality build systems are fairly universal.
Matt: Is there some inflection point where we should start “paying attention” to the build? A thousand lines of code? Ten thousand? A hundred thousand? I realize lines of code is a terrible metric for lots of reasons, so do you have any rules of thumb?
Peter: The best approach is to listen to developers and hear their complaints. You may have a ten thousand line project where developers are constantly doing "clean" builds because source files weren't recompiled when they should have been. At the other end of the scale, you may have a ten million line project that's a joy to use because it's divided into manageable sub-components and uses a parallel build system to speed things up. I'd recommend the approach of fixing things when you first notice they're broken. In contrast though, there's no point in over-engineering a solution by spending months redesigning a build system that people are already happy with.
Matt: Let's say we've reached that point. Builds have started to become painful. We on the technical staff recognize there is some opportunity here to make our lives better. Where do we start?
Peter: You should start by understanding the root of the problem. Are the complaints about slowness, or about correctness of what's being compiled, or is it just too hard for developers to understand how to add new source files, libraries or executable programs? Each of these is a separate problem with its own set of solutions.
The next step is to dig deeper and understand why things are failing. For example, if you're using build machines that are more than three years old, your best starting point is to buy new machines. You can either spend $1,000 on a new build machine for each developer, or pay them $5,000 a year for sitting and watching their software compile. If this doesn't fix the problem, you then need to dig even further to find the solution. For example, even though you have new build machines, it's a bad sign if you still have source files that take two seconds each to compile!
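The hardware trade-off above is simple arithmetic, and a short Python sketch makes the payback period explicit. The dollar figures are the ones Peter quotes; the function name payback_months and the fraction_saved parameter are assumptions added for illustration, and the default assumes the new machine removes all of the waiting, which is optimistic.

```python
def payback_months(machine_cost, yearly_wait_cost, fraction_saved=1.0):
    """Months until a new build machine pays for itself.

    machine_cost:     one-time cost of the machine, in dollars
    yearly_wait_cost: salary paid per year for time spent waiting on builds
    fraction_saved:   share of that waiting the new machine removes (assumed)
    """
    yearly_saving = yearly_wait_cost * fraction_saved
    return 12 * machine_cost / yearly_saving


# Figures from the interview: $1,000 machine, $5,000 a year spent waiting.
print(round(payback_months(1000, 5000), 1))                      # prints 2.4
print(round(payback_months(1000, 5000, fraction_saved=0.5), 1))  # prints 4.8
```

Even if the new machine only halves the waiting, it would pay for itself in under five months on these numbers.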
Matt: Talk to us about architecture -- or maybe “patterns.” Have you found there are patterns of build systems that work better than others?
Peter: I can't say I've focused too much on patterns in the traditional sense, but I'm certainly a big advocate of separating user-visible build description from internal implementation. Too many people clutter their build description (such as Makefiles) with complexity that should be hidden away from view. It's important to have a clear separation between what the software developer needs to see and how the build system actually does the work. That is, the developer cares about the list of source files, the compiler flags, libraries and executable programs. They don't (or shouldn't) care about how the build tool figures out which commands to invoke, in which order, and what the inter-file dependencies are.
This is why I like build tools such as SCons and CMake since all you need to do is list the source files that go into your program, and the build tool handles the logistics. This is only true for Make-based systems when you're dealing with very simple programs.
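The separation Peter describes can be sketched in a few lines of Python. This is not the SCons or CMake API; it is a hypothetical illustration in which DESCRIPTION is the part a developer would edit and commands_for is the hidden machinery. The compiler name cc and the example file names are assumptions.

```python
from pathlib import PurePosixPath

# What the developer sees: the program name, its sources, and its flags.
# Nothing here says which commands run or in what order.
DESCRIPTION = {
    "program": "hello",
    "sources": ["main.c", "util.c"],
    "cflags": ["-O2", "-Wall"],
}


def commands_for(description, compiler="cc"):
    """Internal implementation: turn a description into concrete commands.

    Emits one compile command per source file, followed by one link command.
    """
    commands = []
    objects = []
    for src in description["sources"]:
        obj = str(PurePosixPath(src).with_suffix(".o"))
        objects.append(obj)
        commands.append(
            [compiler, *description["cflags"], "-c", src, "-o", obj]
        )
    commands.append([compiler, *objects, "-o", description["program"]])
    return commands


for cmd in commands_for(DESCRIPTION):
    print(" ".join(cmd))
```

Adding a source file means adding one entry to the sources list; the developer never touches the logic that derives object names or the link step.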
Matt: You mentioned SCons and CMake. Can you tell us about the kind of build architectures you are familiar with? I mean, between Ant, Maven, Hudson, Make, CMake, NMake, DotMake, and Make#, it can get pretty confusing. (I just made those last two up, but they might be real things, give it a couple years.) Where should people start?
Peter: Every build tool has its own strengths and weaknesses, so your choice of tool (or of architecture) really depends on how much simplicity you want versus how much flexibility you need. For example, Make and NMake require you to specify the exact dependencies between input files and output files, which gives you a lot of flexibility but also a lot of headaches when you're doing complex builds. At the next level, tools like Ant, SCons and CMake allow you to say "take these source files and combine them into this executable program." This makes it much simpler to write a build description, although only if you sacrifice a bit of flexibility. At the far end of the scale, Maven makes it really easy to create a build system, as long as you conform to a standard way of building software.
Hudson is in quite a different category, which I refer to as "Build Management" tools. These don't know anything about individual source files or libraries, but instead they're responsible for checking out code from the version-control tool, building it by invoking some other build tool (such as Make or Ant), running automated tests, and notifying end users whether the build result is good or not. You should use a build management tool in conjunction with a traditional build tool such as Make, SCons or Ant.
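The division of labor between a build management tool and a build tool can be sketched as a small Python driver. This is not Hudson's API or configuration format; run_pipeline and the step names are hypothetical, and the steps are stubbed with simple functions where a real setup would invoke the version-control tool, the build tool, and the test runner.

```python
def run_pipeline(steps, notify):
    """Run named steps in order, stopping at the first failure.

    steps:  list of (name, callable) pairs; each callable returns True on success
    notify: callable that receives a one-line summary of the result
    """
    for name, step in steps:
        if not step():
            notify(f"FAILED at {name}")
            return False
    notify("SUCCESS")
    return True


# Stub steps for illustration. In practice each would shell out, for example
# with subprocess.run, to the version-control, build, and test commands.
steps = [
    ("checkout", lambda: True),
    ("build", lambda: True),
    ("test", lambda: True),
]

run_pipeline(steps, print)  # prints SUCCESS
```

The driver knows nothing about source files or libraries; the "build" step simply delegates to whatever build tool the project uses.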
Matt: Earlier you mentioned listening to the developers about pain points. If I may, I have two questions on that. First off, who has the “onus” for that -- who owns solving the problem? Shouldn't the developers “just” see the problem and fix it?
Peter: That would be a nice situation, but it really depends on two factors. First, does the developer have the skills to fix the problem? We've all been in situations where we had to modify somebody else's code without really understanding what it does. Even if we think we know, it's still possible to make changes that appear to work, but actually have negative impacts. Build systems often suffer from this problem since developers aren't always experienced in writing build descriptions (such as Makefiles) and can sometimes make the problems worse. This is why I'm an advocate of build systems that developers can easily understand and modify, rather than using cryptic and complex tools.
The second factor is focused on the development organization itself, and what the management will let people do. Even if a developer is willing to fix the build system, their manager may instead pressure them to add new product features, or fix an important customer bug. One of my favorite expressions is "Where the focus goes, the energy flows". If management is focused too much on customer needs, then there'll be no time or energy put into fixing build system problems. This issue is all too familiar.
Matt: Second follow-up on pain points. I see where you're coming from in suggesting a sense-and-respond approach. If an enlightened team wanted to prevent painful builds, what would you recommend they do?
Peter: That's exactly why I wrote my book. I wanted to encapsulate the good and bad experiences I've had over the last few years so that people could make wise decisions, rather than learning by trial and error. There are obvious rules such as ensuring your dependencies are correct, and less obvious things such as version-controlling all references to your compilers and not allowing your IT group to arbitrarily upgrade your build machines. Just avoiding these pitfalls can save you a lot of time. I also talk about making a component-based build system, as well as various techniques for minimizing the number of files that are recompiled during an incremental build.
Matt: What's the single biggest mistake you've seen made about software build systems, and what do you recommend teams do instead in order to avoid that mistake?
Peter: That's a tough question, since there's such a wide range of build systems and potential problems. From what I've seen, the most general problem is that a team designs the build system when the software is small, but doesn't revisit the design as the software grows. Instead, they apply band-aid after band-aid to make their initial build system scale, and consequently make it very complex and error prone. I'd instead recommend a regular amount of maintenance on the build system to make it more scalable. This might even involve a complete rewrite to use a different build tool.
Matt: What's the future of build tools? What should teams be looking at today, and what should they be looking for tomorrow?
Peter: The trend in build tools is to move away from the traditional Make-based builds towards a more task-based model. Expecting a developer to specify the dependencies between source and object files is simply asking for problems. If you're starting a new build system, I highly recommend you try newer tools, such as Ant, SCons or CMake, that focus on "what should be built" rather than "how it should be built".
In the longer term, I think we'll start seeing graphical build tools that are similar in nature to GUI builders. If you want to create a new library, you simply click on the toolbar and a new library icon appears on your screen. To add a file into that library, simply drag and drop the file's icon onto the library's icon. Naturally you'd still need a scripting language to handle the complex parts of the build process, but the bulk of the build can be described graphically.
Matt: Thank you for your time, Peter. Where can our readers go to learn more about you and about improving the build?
Peter: I've encapsulated most of my build system experience into my book, and I'd recommend that as a starting point. Once you've chosen the set of tools you'll use in your build system, I highly recommend you read one or more additional books that focus on the syntax and semantics of that tool. Having read my book first, I hope you'll have an appreciation for the right and wrong ways to create a build system using your chosen build tool.