Item 14: Use Patch
The patch program is considered by some to be the prime enabler behind the success of open source software. To quote Eric Raymond, "The patch program did more than any other single tool to enable collaborative development over the Internet—a method that would revitalize UNIX after 1990" (The Art of UNIX Programming, Addison-Wesley, 2003). Of course, it is hard to imagine patch taking all the credit; after all, what would development be without vi(1)? But still, there is a ring of truth in what he says.
In the open source community, at any given moment, on any given project, there are dozens, if not hundreds of developers, all working on some derivation of what is in currently on the tip (or branch) of some source code repository. All of them are working relatively blind to the changes that their counterparts are making to the same body of source code.
An Example
Integrating (and evaluating) the changes made to a shared body of source code in such an environment can be difficult and error prone without a tool like patch. To see how, consider a team of three developers (A, B, and C) all working from the tip of the repository. Developer B is the team lead, and his job is to perform code reviews for Developer A and C, and integrate their changes into the source tree once an acceptable code review has been obtained. He also does development on the same body of source code, because he owns the overall architecture.
Let's say that Developer A finishes his work and is in need of a code review. To obtain the code review, Developer B needs to communicate his changes to Developer B. I've seen this done a few different ways over the years:
- Developer A copies and pastes the changes made to the source file(s) into a text file, and sends the result to Developer B. In addition, Developer A adds comments to the text file to describe what the changes are, and where in the original source file the changes were made (or Developer A e-mails this information separately to Developer B). This is perhaps the worst method of all for conducting a code review, for two reasons:
- Developer A may make a mistake and not copy and paste all the changes that were made, or miss entire source files that have modification. The omission of a single line of change can greatly affect the ability of a code reviewer to accurately perform his task. Worse yet, if the code reviewer is responsible for integrating the changes into the repository and changes were missed, the process will surely lead to bugs.
- Even if all changes are copied and pasted by Developer A, there is a chance that context will be lost or incorrectly communicated. One way to counter this problem would be for Developer A to include extra lines above and below the code that actually changed, but this is a better job for a tool like cvs diff, which can generate a patch file that contains the needed lines of context.
- Developer A sends to Developer B copies of all the source files that were changed. This is better than sending a series of hand-constructed diffs, because Developer B can now take the source files and create a patch that correctly captures the changes made by Developer A, along with the context of those changes. If Developer A sends source files that are not being modified by Developer B, Developer B can simply use the diff program (not cvs diff or svn diff) to generate a patch file relative to his current working tree. If Developer A, however, sends changes that do affect files modified by Developer B, Developer B can either diff against his working tree to see the changes in the context of work he is performing, or Developer B can pull a new tree somewhere and generate a patch file from it. The actual method used is usually best determined by the code reviewer. The downsides of this method are as follows:
- It is error prone. (Developer A might forget to include source files that contain change.)
- It places a burden on the code reviewer to generate a patch file. The last thing you want to do on a large project is make more work for the code reviewer. Usually, a code reviewer is generally always struggling to keep up with not only his own development task, but with all the code review requests that are pouring in. Anything you can do to make his job easier will generally be appreciative (and may result in the code reviewer giving your requests a higher priority).
- Developer A generates a patch file using cvn or svn diff, and sends it to the code reviewer. This is the best method because
- The changes are relative to Developer A's source tree.
- cvs diff won't miss any changes that were made, assuming that cvs diff is run at a high-enough level of the source tree. (There is one exception: new source files that have not been added to the repository, along with forgetting to pass the -N argument to cvs diff when creating a patch file [this is not a problem with svn diff, which automatically includes new source files in its diff output.])
After the code reviewer (Developer B) receives the patch from Developer A, he or she has a few options:
- Simply look at the patch file, performing the code review based on its contents alone. Most of the time, this is what I do, especially if the patch file is a unified diff (as it should always be), and if the changes do not intersect any that I am making.
- Apply the patch file to his local tree, build the result, and then perhaps test it. This can be helpful if Developer B would like to step through an execution of the code in a debugger, or to see that the patch builds correctly and without warnings. If Developer A has made changes to some of the source files that were modified by Developer B, Developer B can either
- Pull a new tree and apply the patch to it so that his or her changes are not affected.
- Use cvs diff to generate a patch file that contains his own changes, and then attempt to apply the changes from Developer A into his source tree. This allows Developer B not only to see the changes made by Developer A, but also to see them in the context of the changes that he is making. When the code review has been completed, Developer B can continue working on his changes, and check both his and Developer B's changes in at a later time, or Developer B can have Developer A check in the changes, and then do a cvs or svn update to get in sync.
The patch program is the tool used by a code reviewer to apply changes specified in a patch file to a local copy of the repository. In essence, if both you and I have a copy of the same source tree, you can use cvs diff to generate a patch file containing changes you have made to your copy of the sources, and then I can use the patch program, along with your patch file, to apply those changes to my copy of the sources. The patch program tries very hard to do its job accurately, even if the copy of the sources the patch is being applied to have been changed in some unrelated way. The type of diff contained in the patch file affects the accuracy attained by the patch program; patch is generally more successful if it is fed a set of context diffs rather than normal diffs. The cvs diff -u3 syntax (unified diff with three lines of context) is enough to generate a patch file that gives a good result. (SVN by default generates unified diffs with three lines of context.)
Patch Options
The patch program has a number of options. (You can refer to the patch man page for more details.) However, in practice, the only option that matters much is the -p argument, which is used to align the absolute paths used in the patch file with the local directory structure containing the sources that the patch is being applied to. When you run cvs diff to create a patch file, it is best to do it from within the source tree, at the highest level in the directory hierarchy necessary to include all the files that have changes. The resulting patch file will, for each file that has changes, identify the file with a relative path, and patch uses this relative path to figure out what file in the target directory to apply changes to. For example:
Index: layout/layout.cpp =================================================================== RCS file: /usr/cvsroot/crossplatform/layout/layout.cpp,v retrieving revision 1.33 diff -u -3 -r1.33 layout.cpp --- layout/layout.cpp 27 May 2006 09:31:47 -0000 1.33 +++ layout/layout.cpp 7 Jun 2006 10:43:22 -0000 @@ -327,7 +327,7 @@ return document; } -int main(int argc, char *argv[]) +int LayoutMain(int argc, char *argv[]) { int run, parse; char *src = NULL;
The first line in the preceding patch (the line prefixed with Index:) specifies the pathname of the file to be patched. Assuming that the patch is contained in a file named patch.txt, then, if the preceding patch file were copied to the same relative location in the target tree, then issuing the following command is sufficient for patch to locate the files that are specified in the patch file:
$ patch -p0 < patch.txt
The -p argument will remove the smallest prefix containing the specified number of leading slashes from each filename in the patch file, using the result to locate the file in the local source tree. Because the patch file was copied to the same relative location of the target tree that was used to generate the patch file in the source tree, we must use -p0 because we do not want patch to remove any portion of the path names when searching for files. If -p1 were used (with the same patch file, located in the same place in the target tree), the pathname layout/layout.cpp would be reduced to layout.cpp, and as a result, patch would not be able to locate the file, because the file would not be located in the current directory. Copying the patch file down into the layout directory would fix this, but this could only be done if, and only if, the patch file affected only sources that were located in the layout directory, because the -p0 is applied by patch to all sources represented in the patch file.
Dealing with Rejects
So much for identifying which files to patch. The second difficulty you may run into is rejects. If patch is unable to perform the patch operation, it will announce this fact, and do one of two things. Either it will generate a reject file, which is a filename in the same directory as the file being patched, but with a .rej suffix (for example, bar.cpp.rej), or it will place text inside of the patched file to identify the lines that it was unable to resolve. (The -dry-run option can be used to preview the work performed by patch. As the name implies, it will cause patch to do a "dry run" of the patch operation, to let you know if it will succeed, without actually changing any of the target files.)
If either of these situations happens, there are a few ways to deal with it. The first thing I would do is remove the original source file, re-pull it from the repository using cvs update, and try to reapply the patch, in case I was applying the patch to a file that was not up-to-date with the tip. If this didn't work, I would contact the person who generated the patch and ask that person to verify that his or her source tree was up-to-date at the time the patch file was generated. If it was not, I would ask that person to run cvs update on the file or files and generate a new patch file.
If neither of these strategies works, what happens next depends on the type of output generated by patch. If patch created a .rej file, I would open it and the source file being patched in an editor, and manually copy and paste the contents of the .rej file into the source, consulting with the author of the patch file in case there are situations that are not clear. If, on the other hand, patch inlined the errors instead of generating a .rej file, open the source that was patched and search for lines containing <<<. These lines (and ones containing >>>) delimit the original and new source changes that were in conflict. By careful inspection of the patch output, and perhaps some consultation with the author of the patch, you should be able to identify which portions of the resulting output should stay, and which portions of the result need to go, and perform the appropriate editing in the file to come up with the desired final result.
Thankfully, problems like this happen only rarely. The two most common causes of conflict occur when a developer is accepting a patch that affects code that he or she has also modified, or the patch files are created against a different baseline. There is little to do to avoid the first case, other than better communication among developers to ensure that they are not modifying the same code at the same time. The second case is usually are avoided when developers are conscientious about ensuring that their source trees (and patches) are consistent with the contents of the CVS repository. When this is done, problems are relatively rare, and if they do occur, usually are slight and easy to deal with.
Patch and Cross-Platform Development
Now that you have an idea of why patch is so important to open source software, and how to use it, I need to describe how patch can make developing cross-platform software easier. At Netscape, each developer had a Mac, PC, and Linux system in his or her cubicle, or was at least encouraged to have one of each platform. (Not all did, in reality.) Each developer, like most of us, tended to specialize in one platform. (There were many Windows developers, a group of Mac developers, and a small handful of Linux developers.) As a result, it would only be natural that each developer did the majority of his or her work on the platform of his or her choice.
At Netscape, to overcome the tendency for the Windows developers to ignore the Linux and Macintosh platforms (I'm not picking on just Windows developers; Macintosh and Linux developers at Netscape were just as likely to avoid the other platforms, too), it was required that each developer ensure that all changes made to the repository correctly built and executed on each of the primary supported platforms, not just the developer's primary platform. To do this, some developers installed Network File System (NFS) clients on their Macintosh and Windows machines, and then pulled sources from the repository only on the Linux machine, to which both Mac and Windows machines had mounts. Effectively, Linux was a file server for the source, and the other platforms simply built off that source remotely. (The build system for Netscape/Mozilla allowed for this by isolating the output of builds; see Item 12.) This allowed, for example, the Windows developers to do all of their work on Windows, walk over to the Mac and Linux boxes, and do the required builds on those platforms, using the same source tree.
But what if NFS (or, Samba these days) was not available? Or, more likely, developers did not have all three platforms to build on (or if they did, have the skills needed to make use of them)? In these cases, the patch program would come to the rescue. Developers could create a patch file, for example, on their Windows machine, and then either copy it to the other machines (where a clean source tree awaited for it to be applied to), or they could mail it to a "build buddy" who would build the source for those platforms that the developer was not equipped to build for. (Macintosh build buddies were highly sought after at Netscape because most developers at Netscape did not have the desire, or the necessary skills, to set up a Macintosh development system; it was much easier to ask one of the Macintosh helpers to be a build buddy.)
Netscape's policy was a good one, and patch was an important part of its implementation. The policy was a good one because, by forcing all check-ins to build and run on all three platforms at the same time, it made sure that the feature set of each of the three platforms moved forward at the about the same pace (see Item 1). Mozilla, Netscape, and today, Firefox, pretty much work the same on Mac, Windows, and Linux, at the time of release. The combination of cvs diff, which accurately captured changes made to a local tree, and patch, which accurately merged these changes into a second copy of the tree, played a big role in enabling this sort of policy to be carried out, and allows projects such as Mozilla Firefox to continue to achieve cross-platform parity.