Git Essentials
- Version Control Concepts
- Installing Git
- Git Concepts and Features
- Summary
This chapter introduces you to Git, including how to install the necessary software to access Git servers where your software project will be stored.
Version Control Concepts
To understand Git and the concept of version control, it helps to look at version control from a historical perspective. There have been three generations of version control software.
The First Generation
The first generation was very simple. Developers worked on the same physical system and “checked out” one file at a time.
This generation of version control software made use of a technique called file locking. When a developer checked out a file, it was locked and no other developer could edit the file. Figure 12.1 illustrates the concept of this type of version control.
Figure 12.1 First-generation version control software
Examples of first-generation version control software include Revision Control System (RCS) and Source Code Control System (SCCS).
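The file-locking idea can be sketched in a few lines of Python. This is only an illustrative toy, not how RCS or SCCS is implemented; the class name LockingRepository, the file name main.c, and the developers alice and bob are all hypothetical.

```python
class LockingRepository:
    """Toy first-generation repository: one editor per file at a time."""

    def __init__(self):
        # Maps a file name to the developer who currently holds its lock.
        self._locks = {}

    def check_out(self, filename, developer):
        """Lock the file for one developer; refuse if someone else holds it."""
        holder = self._locks.get(filename)
        if holder is not None and holder != developer:
            raise PermissionError(f"{filename} is locked by {holder}")
        self._locks[filename] = developer

    def check_in(self, filename, developer):
        """Release the lock so that another developer can check the file out."""
        if self._locks.get(filename) != developer:
            raise PermissionError(f"{developer} does not hold the lock on {filename}")
        del self._locks[filename]


repo = LockingRepository()
repo.check_out("main.c", "alice")

try:
    repo.check_out("main.c", "bob")   # refused while alice holds the lock
    bob_blocked = False
except PermissionError:
    bob_blocked = True

repo.check_in("main.c", "alice")
repo.check_out("main.c", "bob")       # allowed once alice has checked in
```

The second developer can do nothing with the file until the first developer checks it in, which is the bottleneck that the second generation set out to remove.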
The Second Generation
The problems with the first generation included the following:
- Only one developer can work on a file at a time. This results in a bottleneck in the development process.
- Developers have to log in directly to the system that contains the version control software.
These problems were solved in the second generation of version control software. In the second generation, files are stored on a centralized server in a repository. Developers can check out separate copies of a file. When the developer completes work on a file, the file is checked in to the repository. Figure 12.2 illustrates the concept of this type of version control.
Figure 12.2 Second-generation version control software
If two developers check out the same version of a file and both change it, their changes can conflict. Reconciling the two sets of changes is handled by a process called a merge.
Examples of second-generation version control software include Concurrent Versions System (CVS) and Subversion.
The Third Generation
The third generation consists of Distributed Version Control Systems (DVCSs). As with the second generation, a central repository server typically contains all the files for the project. However, developers don’t check out individual files from the repository. Instead, the entire project is checked out, allowing the developer to work on the complete set of files rather than just individual files. Figure 12.3 illustrates the concept of this type of version control.
Figure 12.3 Third-generation version control software
Another (very big) difference between the second and third generations of version control software is the order in which merging and committing take place. In the second generation, a developer performs a merge and then commits the new version to the repository.
With third-generation version control software, files are checked in and then they are merged. To understand the difference between these two techniques, first look at Figure 12.4.
Figure 12.4 Second-generation merge and commit
In phase 1 of Figure 12.4, two developers check out a file that is based on the third version. In phase 2, one developer checks that file in, resulting in a version 4 of the file.
In phase 3, the second developer must first merge the changes from the checked-out copy with the changes in version 4 (and, potentially, other versions). After the merge is complete, the new version can be committed to the repository as version 5.
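The three phases of Figure 12.4 can be sketched as a small Python model. It is a simplified illustration under stated assumptions rather than the behavior of any particular tool: CentralRepository and naive_merge are hypothetical names, the developers alice and bob are invented, and the merge shown is a placeholder that simply appends lines.

```python
class CentralRepository:
    """Toy second-generation repository that keeps one straight line of versions."""

    def __init__(self, initial_content):
        self.versions = [list(initial_content)]   # index 0 holds version 1

    @property
    def head(self):
        """Number of the newest version in the repository."""
        return len(self.versions)

    def check_out(self):
        """Return the newest version number and a private copy of its content."""
        return self.head, list(self.versions[-1])

    def commit(self, base_version, content):
        """Accept a commit only if it was built on the newest version."""
        if base_version != self.head:
            raise RuntimeError("out of date: merge with the newest version first")
        self.versions.append(list(content))
        return self.head


def naive_merge(mine, theirs):
    """Placeholder merge: keep their lines, then append my lines they lack."""
    return theirs + [line for line in mine if line not in theirs]


repo = CentralRepository(["line 1"])
repo.commit(1, ["line 1", "line 2"])              # version 2
repo.commit(2, ["line 1", "line 2", "line 3"])    # version 3

# Phase 1: two developers check out version 3.
base_a, copy_a = repo.check_out()
base_b, copy_b = repo.check_out()

# Phase 2: the first developer checks in, creating version 4.
copy_a.append("alice's change")
v4 = repo.commit(base_a, copy_a)

# Phase 3: the second developer is refused until the work is merged.
copy_b.append("bob's change")
rejected = False
try:
    repo.commit(base_b, copy_b)
except RuntimeError:
    rejected = True

latest_version, latest = repo.check_out()
merged = naive_merge(copy_b, latest)
v5 = repo.commit(latest_version, merged)
```

In this sketch the repository never stores the second developer's work as it stood before the merge. Only the merged result reaches the repository, as version 5.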
If you focus on what is in the repository (the center part of each phase), you will see that there is a very straight line of development (ver1, ver2, ver3, ver4, ver5, and so on). This simple approach to software development poses some potential problems:
- Requiring a developer to merge before committing often discourages developers from committing their changes on a regular basis. Merging can be tedious, so developers might decide to wait and do one large merge later rather than many small, regular merges. This has a negative impact on software development because huge chunks of code are suddenly added to a file. Additionally, you want to encourage developers to commit changes to the repository regularly, just as you want to encourage someone who is writing a document to save on a regular basis.
- Very important: Version 5 in this example is not necessarily the work that the developer originally completed. During the merge, the developer might discard some of that work in order to finish the merge. This isn’t ideal because it results in the loss of potentially good code.
A better, although arguably more complex, technique can be employed. It is based on a structure called a Directed Acyclic Graph (DAG), and you can see an example of how it works in Figure 12.5.
Figure 12.5 Third-generation commit and merge
Phases 1 and 2 are the same as shown in Figure 12.4. However, note that in phase 3 the second “check in” process results in a version 5 of the file that is independent of version 4 rather than based on it. In phase 4 of the process, versions 4 and 5 of the file have been merged to create a version 6.
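The graph described for Figure 12.5 can be modeled by letting each version record the versions it was built from. The following Python sketch is illustrative only and is not Git's internal data format; the Commit class, the history function, and the author names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One node in the graph: a version, its author, and its parent versions."""
    version: int
    author: str
    parents: tuple = ()


def history(commit):
    """Return every version number reachable from commit, including its own."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current.version not in seen:
            seen.add(current.version)
            stack.extend(current.parents)
    return seen


v1 = Commit(1, "alice")
v2 = Commit(2, "bob", (v1,))
v3 = Commit(3, "alice", (v2,))

# Phases 2 and 3: both check-ins are based on version 3, not on each other.
v4 = Commit(4, "alice", (v3,))
v5 = Commit(5, "bob", (v3,))

# Phase 4: the merge has two parents, so both lines of work are preserved.
v6 = Commit(6, "carol", (v4, v5))

history_of_v5 = history(v5)   # contains 1, 2, 3, and 5, but not 4
history_of_v6 = history(v6)   # contains all six versions
```

Because version 5 lists version 3 as its only parent, the second developer's work is stored exactly as it was committed, and the merge in version 6 is recorded as a separate step with two parents.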
Although this process is more complex (and, potentially, much more complex if you have a large number of developers), it does provide some advantages over a “single line” of development:
- Developers can commit their changes on a regular basis and not have to worry about merging until a later time.
- The merging process could be delegated to a specific developer who has a better idea of the entire project or code than the other developers have.
- At any time, the project manager can go back and see exactly what work each individual developer created.
Certainly an argument exists for both methods. However, keep in mind that this book focuses on Git, which uses the Directed Acyclic Graph method of third-generation version control systems.