The Human Genome Project
Let’s turn now to the issue of how geneticists study the origins of disease, beginning with something called the Human Genome Project. This is an effort to identify every one of the genes in the human genome and describe its function, particularly for the genes related to disease. Early on, there were some naïve expectations that simply sequencing a genome would make the genes obvious, and that within a few years we would have cures for all the major maladies that afflict citizens of the developed world.
It hasn’t turned out that way, for good reasons, but the technical accomplishments have exceeded expectations, and it is doubtful that anyone foresaw the direction that genome science would take. The first announcement of a draft human genome sequence was greeted by President Bill Clinton as a step toward a closer understanding of God’s design. Less spiritual observers saw it as a step toward diagnostics and interventions for hundreds of diseases. Cynics saw it as yet another example of scientists’ hubris in throwing hundreds of millions of dollars at a problem without solving anything. My sense is that, like man’s walking on the moon, it is an achievement that serves as an identifiable landmark in the emergence of a new domain of human endeavor, but will eventually be seen as just another small step along the human journey of self-perception.
There were actually two genomes sequenced—one by an international consortium that was financed by public money, and the other by a commercial enterprise known as Celera Genomics. It is legitimate to ask why hundreds of millions of taxpayers’ dollars were spent on a project that turned out to be doable by private initiative. There are many answers to this question. One is that Celera might never have started without the incentive provided by the public effort (and similarly, the public effort would not have finished so quickly without being pushed by Celera). Another is that there were legitimate reasons to believe that the strategy adopted by Celera would not work, whereas the public approach was guaranteed both to work and to provide useful information as it progressed.
The two projects took what we might refer to as MapQuest and Google Earth strategies toward sequencing the human genome. Suppose that you are asked to come up with a brand new atlas of the United States, complete with street names and house numbers. Most of us would probably start by employing someone in each state and charging them with the task of mapping out the major cities and highways. Lake Tahoe and Fresno would be placed on the atlas as the cartographers radiated out from Los Angeles and San Francisco, eventually linking up with Reno and Las Vegas. The approach would be painstaking and slow, but for most intents and purposes, guaranteed to be accurate, and the drafts could be used even before the final version was available. This is what the public effort did: Each chromosome was assigned to a major sequencing center, and the consortium put the pieces together over a period of five years.
By contrast, the maverick visionary behind Celera, J. Craig Venter, decided to do the equivalent of renting a satellite to take hundreds of millions of photographs. Every piece of land would be present on at least ten of the photographs, and a massive supercomputer was programmed to find the bits of similarity at the edges, assembling the complete atlas simultaneously based on overlaps between adjacent photographs. The process was fast and relatively cheap, but you might imagine that all those Main Streets and repetitive cornfields in the Midwest would confuse the alignment of photographs, making for some odd distance estimates, and that bits of New York might turn up in the middle of Philadelphia by accident. However, the Celera people were clever enough to devise ways around these problems, and their atlas of the human genome turned out to be just fine. It also turned out to be Craig Venter’s own genome!
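To make the satellite-photograph idea a little more concrete, here is a toy version of overlap-based assembly in Python. It is a minimal sketch under stated assumptions, not Celera’s actual software. The reads are assumed to be error-free, the merging rule is a simple greedy one (repeatedly join the two reads with the longest suffix-to-prefix overlap), and the function names `overlap` and `greedy_assemble`, the three-letter minimum overlap, and the example reads are all hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (the function returns 0).
    """
    start = 0
    while True:
        # Look for the first min_len letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from there to the end of a is a prefix of b, it is an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of reads with the longest overlap.

    Assumes error-free reads (hypothetical simplification). Returns one string per
    contig; more than one contig means some reads could not be joined.
    """
    # Drop exact duplicates (keeping input order), then reads contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_a, best_b = 0, "", ""
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads


# Hypothetical reads sampled out of order, including a duplicate and a contained read.
reads = ["CTTGAAGC", "ATGCGTAC", "GTACCTTG", "GTACCTTG", "TACCT"]
print(greedy_assemble(reads))  # ['ATGCGTACCTTGAAGC']
```

The greedy rule is exactly what the repetitive cornfields defeat. If the same stretch of sequence occurs in several places, a read can overlap equally well with reads from different parts of the genome, and the longest-overlap rule may join the wrong pieces; real assemblers have to work around this.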
Bear in mind that an atlas is just a set of guides: It doesn’t tell you where steel is manufactured, where cotton is grown, or who lives at 286 Magnolia Lane. For that we need classical genetics, bioinformatics, and molecular biology. The biggest emphasis right now, though, is on comprehending the variation at each of the positions in the atlas. This is the quest to find the tens of millions of places in the genome where we all differ, and to work out which few thousand of these differences are associated with disease and behavioral, physiological, and physical variation.
We don’t need to get into the gory details of how the genetic code can possibly hold the secret to life; suffice it to say that it consists of four letters, A, T, G, and C, strung together in long molecules of DNA. Each human gene is made of something like 10,000 of these letters, and there are around 1,000 genes per chromosome. The sequence of these letters specifies the nature and function of each gene; sequences can vary among individuals in three basic ways. Single nucleotide polymorphisms (SNPs) are positions in the genome where two or more different letters might be found if you compare two people. Indels are insertions and deletions, usually of just one or a few letters. Thus, comparing the sequence AATGCGCA with AGTGCGCCA, it appears that there is an A/G SNP at the second position, and an insertion of an extra C two bases from the end of the second sequence. The third class is copy number variation (CNV), which consists of much larger insertions and deletions, thousands of bases at a time.
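Finding the differences between AATGCGCA and AGTGCGCCA can be done mechanically: align the two sequences so that matching letters line up, then read off the columns. The sketch below assumes a textbook Needleman-Wunsch global alignment with illustrative scores (+1 for a match, -1 for a mismatch, -2 for a gap); the function names `align` and `differences` and the scores are hypothetical choices, not a standard tool.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Global (Needleman-Wunsch) alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        # Score for putting a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring a diagonal step on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def differences(first: str, second: str) -> list[tuple]:
    """List SNPs, insertions, and deletions of `second` relative to `first` (1-based positions)."""
    a_aln, b_aln = align(first, second)
    diffs = []
    pos_a = pos_b = 0
    for x, y in zip(a_aln, b_aln):
        if x != "-":
            pos_a += 1
        if y != "-":
            pos_b += 1
        if x == "-":
            diffs.append(("insertion", pos_b, y))  # letter present only in `second`
        elif y == "-":
            diffs.append(("deletion", pos_a, x))   # letter present only in `first`
        elif x != y:
            diffs.append(("SNP", pos_a, x, y))
    return diffs


print(differences("AATGCGCA", "AGTGCGCCA"))
# [('SNP', 2, 'A', 'G'), ('insertion', 7, 'C')]
```

Because the extra C sits next to another C, an alignment could equally place the insertion at position 7 or 8 of the second sequence; this sketch happens to report 7. It also reports a multi-letter indel one letter at a time rather than as a single event.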
Ultimately, change in the sequence of letters translates into susceptibility to disease or blue eyes or a cheery disposition. Remarkable as it may seem, a person who has an A instead of a G at position 102,221,163 on chromosome 11 may be born with a mild heart defect. This insight leads to the idea that if we could sequence a person’s genome, we might be able to work out which diseases that person is likely to get. Venter has written a book that does just this for himself, but it must be stated that our ability to interpret sequence differences is primitive, and like a Shakespearian play or a T. S. Eliot poem, the genetic words will always be subject to interpretation.
So what has the sequencing of the human genome achieved, and when should we expect to see some impact on medical care? The achievement is that we now have the solid foundation upon which twenty-first-century biomedicine will be erected. To see this, think about the analogy of the genome sequence as a road atlas once more. Prior to its completion, molecular biologists were simply working with the obvious features in the genomic landscape, or laboring painstakingly to find where they were headed. Now they know exactly where the residential areas are, where to find businesses or manufacturing sectors, and what type of agriculture is carried out where. They have the street names and addresses for most families and can readily find the government regulators.
To identify what is going wrong when a genetic disease occurs, however, they need to be able to peer inside the houses and offices. Sometimes the cause of a problem is obvious, as if the roof were missing. More generally, though, subtle problems get in the way. The most interesting cases don’t involve a single mutation, but rather the accumulated effects of many regular, everyday variants in the genome. These variants are behind diabetes, asthma, and depression, and finding them takes advanced statistical and computational procedures.
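One standard building block for this kind of search is to test each variant separately for association with disease, comparing how often an allele turns up in people with the disease (cases) and without it (controls). The Python sketch below runs a Pearson chi-square test on a 2×2 table of allele counts. The function name `allelic_chi_square` and the counts are hypothetical, and real genome-wide studies add many refinements, so treat it as an illustration rather than the method used in any particular study.

```python
import math


def allelic_chi_square(case_risk: int, case_other: int,
                       control_risk: int, control_other: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    on a 2x2 table of allele counts. Returns (statistic, p-value)."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 500 cases and 500 controls, i.e. 1,000 alleles per group.
chi2, p = allelic_chi_square(300, 700, 250, 750)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
print("genome-wide significant:", p < 5e-8)  # False
```

With these made-up numbers the statistic comes out at about 6.3, a p-value of roughly 0.01. That would look respectable for a single test, but when hundreds of thousands of variants are tested at once, the conventional genome-wide threshold is p < 5 × 10^-8, which is one reason that finding variants with small, accumulated effects requires very large samples.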