Tracking genes involved in coronary heart disease after GWAS
In 2007, Samani, et al., carried out genome-wide association studies (GWAS) in subjects with coronary heart disease. They identified several genetic loci that affect the risk of coronary artery disease (CAD), including loci at 9p21.3 and 1p13.3. In 2008, Kathiresan, et al., identified two loci associated with abnormal levels of low-density lipoprotein cholesterol (LDL cholesterol): one mapped to chromosome 1p13 and the other to 19p13. They noted that the 1p13 locus maps near the gene SORT1 (sortilin 1).
Kjolby, et al. (2010), demonstrated that sortilin, the protein encoded by SORT1, is an intracellular sorting receptor for apolipoprotein B100 (ApoB100). They noted that SORT1 regulates plasma LDL levels through hepatic export of ApoB100-containing lipoproteins. In studies on mice, they determined that sortilin 1 overexpression stimulates the hepatic release of lipoproteins and increases plasma LDL levels.
Musunuru, et al. (2010), carried out studies in cohorts of human subjects and in human-derived hepatocytes. They determined that a noncoding single nucleotide polymorphism (SNP) at 1p13, rs12740374, affects a transcription factor binding site, altering hepatic expression of SORT1. The risk allele G of rs12740374 disrupts the C/EBP transcription factor binding site and is significantly associated with LDL cholesterol levels (p = 1 × 10⁻¹⁷⁰). In studies on mouse livers, Musunuru, et al., determined that sortilin 1 affects plasma levels of low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) cholesterol. They demonstrated that knockdown of sortilin 1 expression in mice led to a 46% increase in total cholesterol compared with controls.
The studies of Musunuru, et al., demonstrated the clinical relevance of non-protein-coding DNA variants identified through GWAS. They concluded that the sortilin pathway is a promising new target for therapeutic intervention in hyperlipidemia and myocardial infarction. These investigators noted that in some individuals, even aggressive treatment with statins fails to lower LDL cholesterol levels. Statins inhibit cholesterol synthesis by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, reducing levels of both LDL cholesterol and total cholesterol.
In a GWAS of blood lipids in 100,000 individuals, Linsel-Nitschke, et al. (2010), reported evidence for the involvement of 18 genes previously shown to play roles in Mendelian lipid disorders. Associations at these loci were much more significant (that is, had much smaller p-values) than those at other variants. Highly associated loci included LPL (lipoprotein lipase), p = 2 × 10⁻¹¹⁵; APOA1 (apolipoprotein A1), 7 × 10⁻²⁴⁰; CETP (cholesteryl ester transfer protein), 7 × 10⁻³⁸⁰; LDLR (LDL receptor), 4 × 10⁻¹¹⁷; APOE (apolipoprotein E), 9 × 10⁻¹⁴⁷; APOB (apolipoprotein B), 4 × 10⁻¹¹⁴; SORT1 (sortilin 1), 1 × 10⁻¹⁷⁰; and GCKR (glucokinase regulator), 6 × 10⁻¹³³.
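Several of these p-values are smaller than the smallest positive double-precision number a computer can store (about 4.9 × 10⁻³²⁴), so an ordinary floating-point value of 7 × 10⁻³⁸⁰ silently becomes zero. GWAS results are therefore commonly compared on a −log10(p) scale. The Python sketch below ranks the loci listed above that way; the function names are illustrative, and the only data used are the p-values quoted in the text, stored as (mantissa, exponent) pairs.

```python
import math

# p-values quoted in the text as (mantissa, exponent), i.e. p = m * 10**e.
# Storing them this way avoids underflow: 7e-380 is below the float64 range.
REPORTED_P = {
    "LPL": (2, -115),
    "APOA1": (7, -240),
    "CETP": (7, -380),
    "LDLR": (4, -117),
    "APOE": (9, -147),
    "APOB": (4, -114),
    "SORT1": (1, -170),
    "GCKR": (6, -133),
}


def neg_log10_p(mantissa: float, exponent: int) -> float:
    """Return -log10(p) for p = mantissa * 10**exponent without forming p."""
    return -(math.log10(mantissa) + exponent)


def rank_by_significance(pvals: dict) -> list:
    """Sort loci from most to least significant (largest -log10 p first)."""
    scores = {gene: neg_log10_p(m, e) for gene, (m, e) in pvals.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(float("7e-380"))  # 0.0: the literal underflows to zero
for gene, score in rank_by_significance(REPORTED_P):
    print(f"{gene:6s} -log10(p) = {score:.1f}")
```

On this scale CETP (about 379) ranks first, followed by APOA1 and SORT1.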
These findings indicate that genes that play roles in the etiology of rare Mendelian forms of diseases such as diabetes and hyperlipidemia also play roles in the common polygenic forms of these diseases.
Disease-specific mutation in a Hunterian Museum skeleton and his living relatives
In 2011, Chahal, et al., reported that they had identified a specific mutation in the gene encoding aryl hydrocarbon receptor interacting protein (AIP) in four families from Northern Ireland affected by familial isolated pituitary adenoma. The mutation in these families was a C-to-T substitution at coding nucleotide 910, c.910C>T (p.R304X). A termination codon replaces the codon for amino acid 304, leading to a loss of 26 amino acids from the AIP protein.
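In coding-DNA notation (written c.910C>T in standard HGVS nomenclature), nucleotide 1 is the A of the ATG start codon, so codon n spans nucleotides 3n − 2 to 3n. A short calculation therefore places nucleotide 910 at the first position of codon 304, and the genetic code then tells us which arginine codon could have been hit: of the six arginine codons, only CGA becomes a stop codon (TGA) when its first base changes from C to T. The Python sketch below performs both steps; the function names are hypothetical, and the result is a deduction from the genetic code rather than a lookup of the actual AIP sequence.

```python
# Arginine codons and stop codons of the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_of(c_pos: int) -> tuple:
    """Map a coding-DNA (c.) position to (codon number, position in codon).

    c.1 is the A of the ATG start codon, so codon n spans c.(3n-2)..c.3n.
    """
    return (c_pos + 2) // 3, (c_pos - 1) % 3 + 1


def arg_codons_made_stop(pos_in_codon: int, ref: str, alt: str) -> list:
    """List arginine codons that become stop codons after ref>alt at pos_in_codon."""
    hits = []
    i = pos_in_codon - 1
    for codon in sorted(ARG_CODONS):
        if codon[i] != ref:
            continue  # this codon does not carry the reference base here
        mutated = codon[:i] + alt + codon[i + 1:]
        if mutated in STOP_CODONS:
            hits.append((codon, mutated))
    return hits


codon_number, position = codon_of(910)
print(codon_number, position)                    # 304 1
print(arg_codons_made_stop(position, "C", "T"))  # [('CGA', 'TGA')]
```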
Chahal, et al., obtained permission from the directors of the Hunterian Museum in London to extract DNA from two teeth of the skeleton of an Irish giant who died in 1783. Harvey Cushing examined this skeleton in 1909 and, on the basis of the degree of enlargement of the pituitary fossa, concluded that the man had had a pituitary adenoma. Chahal, et al., found the same AIP mutation in the Hunterian giant and in the four families from Northern Ireland they studied. Analysis of microsatellite repeat polymorphisms revealed that the giant's DNA and the adenoma patients in the four Irish families shared a haplotype that extended for 2.068 megabases (Mb) on chromosome 11q13.2 and included the AIP gene. Taking into account polymorphisms, mutation rates, and generation length, Chahal, et al., concluded that the skeleton and the four families shared a common ancestor 57 to 66 generations ago.
Discovery of familial-inherited adenomas in different populations and the role of AIP
In 2006, Vierimaa, et al., reported two clusters of families from Northern Finland who had familial pituitary adenoma that led to increased secretion of growth hormone and prolactin.
SNP analysis in these families revealed linkage between adenoma development and chromosome 11q12-q13. Sequencing of genes in this region revealed a defect in the gene encoding aryl hydrocarbon receptor interacting protein (AIP). Subsequent analyses in the Finnish population identified a Q14X mutation in 6 of 45 patients with acromegaly.
Karhu and Aaltonen (2007) noted that the function of AIP was not known.
The amino-terminal region of AIP contains FKBP domains, which are usually involved in protein folding and trafficking. The carboxy-terminal region of AIP contains three tetratricopeptide repeats, which usually form scaffolds for the assembly of multiprotein complexes. The carboxy-terminal region of AIP interacts with the aryl hydrocarbon receptor and with heat shock protein 90 (HSP90). Low expression of AIP in pituitary adenomas is a marker of invasive growth hormone-producing tumors.
In 2009, Jennings, et al., reported studies of a Polynesian kindred in which three members presented with pituitary macroadenoma in childhood or adolescence. These patients had the AIP mutation R271W. They presented with headaches, visual disturbances, and excessive height; features of acromegaly, such as frontal bossing and overgrowth of the hands and feet, were absent.
In 2010, Daly, et al., reported the results of an international study of 96 patients with germline AIP mutations and pituitary adenomas. The patients were usually young, and first symptoms typically appeared in childhood or adolescence. Males constituted 63.6% of the patients. Most tumors were macroadenomas. Excessive secretion of growth hormone occurred in 78% of tumors; prolactin secretion was excessive in 13 of the 96 cases, and 7 tumors were nonsecreting.
In 2010, Chahal, et al., reported that they had identified 49 different AIP mutations in patients with familial pituitary adenomas. These included deletions, insertions, segmental duplications, and nonsense, missense, and splice-site mutations. In addition, some patients had deletions of whole exons or of the entire AIP gene. In the cohort of families they studied, approximately 30% of individuals carrying a germline AIP mutation presented with pituitary tumors.
Chahal, et al. (2010), concluded that the physiological role of the aryl hydrocarbon receptor (AHR) likely includes regulation of cell proliferation and differentiation. In the cytoplasm, AHR exists in a multiprotein complex with AIP, HSP90, and the co-chaperone p23. This complex binds xenobiotics and is then transferred to the nucleus, where AHR binds hypoxia-inducible factor 1β (HIF-1β), also known as ARNT. They noted that several proteins involved in the regulation of hypoxia-induced proteins play a role in tumor susceptibility, including succinate dehydrogenase, fumarate hydratase, and von Hippel-Lindau proteins. (These proteins are discussed further in Chapter 5, "Pathways, Phenotypes, and Phenocopies.")
Evidence also indicates that AHR binds to a ubiquitin ligase and plays a role in the degradation of estrogen and androgen receptors.
Next-generation sequencing
Key elements of next-generation sequencing are the miniaturization of sequencing reactions, the sequencing of short DNA fragments bound to solid matrices, and real-time image capture of the sequencing reactions. Several companies have developed different next-generation sequencing platforms, and the precise methods for capturing and sequencing fragments vary with the platform used. In some cases, fragments are ligated to specific oligonucleotides at each end, and these hybridize to matching oligonucleotides on the solid matrix of the sequencing platform. In other cases, DNA fragments are biotin-labeled and captured with streptavidin beads, which are subsequently captured on the sequencing platform. The sequencing reaction is detected using nucleotides (A, C, G, and T) that are each labeled with a differently colored fluorescent dye, and the fluorescent images are captured. The flow cells used as sequencing platforms are partitioned into several channels so that multiple samples can be analyzed simultaneously.
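Because each nucleotide carries a different dye, every imaging cycle yields four intensities per DNA cluster, and the base incorporated in that cycle can be read from the brightest channel. The Python sketch below illustrates only that core idea; the channel-to-base assignment, the confidence score (fraction of total signal in the brightest channel), and the 0.5 cutoff are all hypothetical choices. Production base callers also correct for dye cross-talk and for clusters falling out of step across cycles, which this sketch ignores.

```python
from typing import List, Sequence, Tuple

# Hypothetical assignment of one fluorescent dye channel to each nucleotide.
CHANNEL_ORDER = ("A", "C", "G", "T")


def call_base(intensities: Sequence[float]) -> Tuple[str, float]:
    """Call one base from the four dye intensities of a single imaging cycle.

    Returns the base whose dye is brightest and the fraction of the total
    signal in that channel, used here as a crude confidence score.
    """
    if len(intensities) != 4:
        raise ValueError("expected one intensity per dye channel")
    total = sum(intensities)
    best = max(range(4), key=lambda i: intensities[i])
    confidence = intensities[best] / total if total > 0 else 0.0
    return CHANNEL_ORDER[best], confidence


def call_read(cycles: List[Sequence[float]], min_conf: float = 0.5) -> str:
    """Call a read cycle by cycle; low-confidence cycles are reported as 'N'."""
    bases = []
    for intensities in cycles:
        base, conf = call_base(intensities)
        bases.append(base if conf >= min_conf else "N")
    return "".join(bases)


# Four imaging cycles for one cluster (made-up intensities: A, C, G, T).
cycles = [
    (900, 40, 30, 25),      # A channel clearly brightest
    (20, 15, 850, 60),      # G
    (300, 280, 260, 310),   # no channel dominates
    (10, 700, 20, 30),      # C
]
print(call_read(cycles))  # AGNC
```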
Next-generation sequencing is referred to as massively parallel sequencing because millions of short sequences are read at the same time; ideally, each base is covered by about 30 independent reads. The sequence reads generated on the platform are transferred to a computer and aligned to a reference sequence.
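Aligning millions of short reads requires an index of the reference so that each read need not be compared with every reference position. Widely used aligners such as BWA and Bowtie use compressed indexes based on the Burrows-Wheeler transform; the Python sketch below substitutes a plain hash table of k-mers to show the simpler "seed and extend" idea. The seed length k, the mismatch limit, and the toy reference are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Record every start position of each k-mer in the reference."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read: str, reference: str, index: Dict[str, List[int]],
               k: int, max_mismatches: int = 1) -> List[int]:
    """Return reference positions where the read aligns with few mismatches.

    The first k bases of the read serve as an exact-match seed; each seed hit
    is extended over the whole read and mismatches are counted.
    """
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


reference = "ACGTTGCATGCCGATAGCTTAGGCTAACGTTGCA"
k = 4
index = build_kmer_index(reference, k)
print(align_read("GCCGATAG", reference, index, k))  # [9] (exact match)
print(align_read("GCCGTTAG", reference, index, k))  # [9] (one mismatch)
```

A mismatch tolerated during extension is where a candidate sequence variant would later be detected.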
Whole-genome sequencing in humans generates a very large amount of data for analysis. In searching for disease-causing mutations in humans, capturing specific genomic regions or the human exome before sequencing reduces the complexity of the data analysis.
Data analysis may be further simplified by using bioinformatic filtering to prioritize the genomic regions or genes to be studied. Roach, et al. (2010), studied two siblings affected with an autosomal recessive disorder, Miller syndrome, and their parents. They used information on polymorphic markers and haplotype analysis to select regions of the genome for computational analysis following whole-genome sequencing. They focused their analysis on the 22% of the genome in which both affected offspring inherited the same genomic segments from both parents.
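For full siblings, about one quarter of the genome is expected, on average, to be inherited identically from both parents, so restricting a recessive-disease search to such regions discards roughly three quarters of candidate variants at once. The Python sketch below shows only the filtering step, assuming those shared regions have already been inferred; the intervals, chromosome names, and variant labels are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), end exclusive
Variant = Tuple[str, int, str]   # (chromosome, position, label)


def in_any_interval(chrom: str, pos: int, intervals: List[Interval]) -> bool:
    """True if the position falls inside any interval on the same chromosome."""
    return any(c == chrom and start <= pos < end for c, start, end in intervals)


def filter_to_shared_regions(variants: List[Variant],
                             shared_regions: List[Interval]) -> List[Variant]:
    """Keep only variants inside regions where both affected siblings
    inherited the same segments from both parents."""
    return [v for v in variants if in_any_interval(v[0], v[1], shared_regions)]


# Hypothetical shared regions and variant calls, for illustration only.
shared_regions = [("chr3", 70_000_000, 75_000_000), ("chr8", 10_000_000, 12_000_000)]
variants = [
    ("chr3", 72_050_000, "variant_1"),
    ("chr3", 80_000_000, "variant_2"),
    ("chr8", 11_500_000, "variant_3"),
    ("chr5", 11_500_000, "variant_4"),
]
kept = filter_to_shared_regions(variants, shared_regions)
print([label for _, _, label in kept])  # ['variant_1', 'variant_3']
```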
In reviewing the application of next-generation sequencing to the discovery of rare gene defects that cause Mendelian diseases, Ng, et al. (2010b), emphasized that linkage information may narrow the genomic region that needs to be sequenced or computationally analyzed.
Additional studies are usually required to establish definitively the significance of sequence alterations that are likely candidates for disease causation. Potentially significant changes include chain-terminating (nonsense) substitutions, deletions, and nonsynonymous substitutions that cause amino acid changes likely to alter the structure or localization of a gene product. Follow-up studies include PCR (polymerase chain reaction) amplification and Sanger sequencing; further downstream follow-up includes biochemical and physiological studies.
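Whether a single-nucleotide substitution is synonymous, nonsynonymous (missense), or chain-terminating (nonsense) follows directly from the genetic code. The Python sketch below classifies a substitution given the reference codon and the position within it, using the standard genetic code; it handles single-base substitutions only, not deletions or insertions, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for all 64 codons in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Classify a single-base change within one codon (pos_in_codon is 1-3)."""
    ref_codon = ref_codon.upper()
    i = pos_in_codon - 1
    alt_codon = ref_codon[:i] + alt_base.upper() + ref_codon[i + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # chain termination
    if ref_aa == "*":
        return "stop-loss"  # stop codon becomes an amino acid codon
    return "missense"       # nonsynonymous amino acid substitution


print(classify_snv("CGA", 1, "T"))  # nonsense (Arg -> stop)
print(classify_snv("CGA", 3, "G"))  # synonymous (CGG also encodes Arg)
print(classify_snv("TAC", 1, "C"))  # missense (Tyr -> His)
```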
Whole-genome sequencing and the discovery of mutations leading to Charcot-Marie-Tooth neuropathy
Charcot-Marie-Tooth neuropathies (CMT) are a group of disorders characterized by peripheral motor and sensory neuropathies with different modes of inheritance, including autosomal dominant, autosomal recessive, and X-linked inheritance. They are characterized clinically by symmetric distal polyneuropathy. Progressive muscle weakness and atrophy occur particularly in the peroneal muscles, leading to foot-drop and abnormal gait.
CMT results from mutations in at least 39 different genes. Lupski, et al. (2010), reported that mutation testing is available in the United States for 15 of the 39 genes and costs $15,000.
Lupski, et al., reported the results of whole-genome sequencing and follow-up analysis in a family with CMT. Sequencing yielded 89 gigabases (Gb) of sequence data, and the depth of coverage was 30, meaning that each base was sequenced about 30 times on average. The sequence of the affected proband was compared with the human reference genome sequence, and differences between the two were documented. These differences included single-base substitutions, small insertions and deletions, and copy number changes.
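Average depth of coverage is simply the total number of sequenced bases divided by the genome length, and under the idealized Poisson model of random read placement (the Lander-Waterman approach) the expected fraction of bases covered by fewer than d reads follows from that average. The Python sketch below applies this to the 89 Gb figure, assuming it corresponds to about 89 billion sequenced bases and assuming a haploid human genome of roughly 3.1 billion bases. Real data deviate from the Poisson model (for example, in repetitive or GC-rich regions), so actual gaps are larger than these estimates.

```python
import math


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


def fraction_below(coverage: float, min_depth: int) -> float:
    """Expected fraction of bases covered by fewer than min_depth reads."""
    return sum(math.exp(-coverage) * coverage ** d / math.factorial(d)
               for d in range(min_depth))


GENOME_SIZE = 3.1e9  # assumed approximate haploid human genome size (bases)
c = mean_coverage(89e9, GENOME_SIZE)
print(f"mean coverage ~ {c:.1f}x")
print(f"expected uncovered fraction ~ {fraction_uncovered(c):.1e}")
print(f"expected fraction below 10x ~ {fraction_below(c, 10):.1e}")
```

The computed mean of about 29 agrees with the reported depth of coverage of 30.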
Copy number variants were examined by array comparative genomic hybridization (array CGH) and by sequence analysis. No copy number variants affecting genes known to play roles in CMT were identified.
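One way sequence data can reveal copy number changes is read depth: if reads fall randomly on the genome, a region present in three copies attracts about 1.5 times as many reads as a two-copy region, and a heterozygous deletion about half as many. The Python sketch below estimates copy number per genomic window on that basis; the window depths and the 0.5-copy tolerance are hypothetical, and using the median depth as the two-copy baseline assumes most windows are copy-neutral. This is a generic illustration of read-depth analysis, not the method used in the study.

```python
from statistics import median
from typing import List


def copy_number_from_depth(window_depths: List[float],
                           diploid_depth: float) -> List[float]:
    """Estimate copy number per window, assuming depth scales linearly with
    copy number and diploid_depth corresponds to two copies."""
    return [2 * d / diploid_depth for d in window_depths]


def flag_cnv_windows(window_depths: List[float], diploid_depth: float,
                     tolerance: float = 0.5) -> List[int]:
    """Return indices of windows whose estimated copy number differs from 2
    by more than the tolerance."""
    estimates = copy_number_from_depth(window_depths, diploid_depth)
    return [i for i, cn in enumerate(estimates) if abs(cn - 2) > tolerance]


# Hypothetical mean read depth in ten consecutive windows.
depths = [29, 31, 30, 45, 46, 44, 30, 15, 14, 31]
baseline = median(depths)  # 30.5, treated as the two-copy depth
print(flag_cnv_windows(depths, baseline))  # [3, 4, 5, 7, 8]
```

Windows 3 to 5 look like a gain (about three copies) and windows 7 and 8 like a loss (about one copy).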
Lupski, et al., focused on the 9,069 single nucleotide substitutions that led to nonsynonymous codon changes; of these, 121 were nonsense mutations. The data were then searched for single nucleotide substitutions in the proband that affected genes known to cause neuropathic conditions. Two substitutions were found in the SH3TC2 gene: a missense mutation leading to Y169H and a nonsense mutation, R954X.
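The search described above amounts to intersecting the proband's protein-altering variants with a list of genes already known to cause neuropathy, and then noting genes that carry two different variants, as expected for a recessive disorder. The Python sketch below illustrates that filtering; apart from the two SH3TC2 changes, the variant list is invented, and the small panel (SH3TC2 plus MPZ, PMP22, and GJB1, which are known CMT genes) is an illustrative subset rather than the gene list actually used.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, str, str]  # (gene, protein change, consequence)


def candidates_in_panel(variants: List[Variant],
                        panel: Set[str]) -> Dict[str, List[str]]:
    """Group protein-altering variants by gene, keeping only panel genes."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for gene, change, consequence in variants:
        if gene in panel and consequence in {"missense", "nonsense"}:
            by_gene[gene].append(change)
    return dict(by_gene)


def two_hit_genes(by_gene: Dict[str, List[str]]) -> List[str]:
    """Genes with at least two distinct variants: candidates for compound
    heterozygosity under a recessive model (phase is still unknown here)."""
    return sorted(g for g, changes in by_gene.items() if len(set(changes)) >= 2)


# Illustrative panel and variant list; only the SH3TC2 changes come from the text.
panel = {"SH3TC2", "MPZ", "PMP22", "GJB1"}
variants = [
    ("SH3TC2", "Y169H", "missense"),
    ("SH3TC2", "R954X", "nonsense"),
    ("GENE_A", "R10W", "missense"),   # hypothetical gene outside the panel
    ("MPZ", "L20L", "synonymous"),    # hypothetical silent change
]
hits = candidates_in_panel(variants, panel)
print(hits)                 # {'SH3TC2': ['Y169H', 'R954X']}
print(two_hit_genes(hits))  # ['SH3TC2']
```

Two variants in one gene could still lie on the same chromosome copy; parental genotypes are needed to show that they are on different copies.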
Lupski, et al., noted that mutations in the SH3TC2 gene had previously been described in Eastern European, Turkish, and Spanish Romani (Gypsy) patients and that the R954X mutation was present in some of these patients.
In the family reported by Lupski, et al., the R954X mutation was present in one parent of the proband and the Y169H mutation in the other parent. The proband had three siblings affected with CMT, and all three carried both the R954X and Y169H mutations. Neurophysiological studies revealed subclinical phenotypes in heterozygous carriers of each mutation.
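The family data support a recessive, compound heterozygous cause in two ways: each variant comes from a different parent, so the two lie on different copies of the gene (in trans), and every affected child carries both while neither clinically unaffected parent does. The Python sketch below checks both conditions; genotypes are simplified to sets of variant labels, the function names are hypothetical, and the labels parent_1 and parent_2 are used because the text does not say which parent carried which variant.

```python
from typing import Dict, Set


def parental_origin(variants: Set[str], parent_1: Set[str],
                    parent_2: Set[str]) -> Dict[str, str]:
    """Assign each of the child's variants to the parent who carries it."""
    origin = {}
    for v in sorted(variants):
        in_1, in_2 = v in parent_1, v in parent_2
        if in_1 and not in_2:
            origin[v] = "parent_1"
        elif in_2 and not in_1:
            origin[v] = "parent_2"
        elif in_1 and in_2:
            origin[v] = "both parents"
        else:
            origin[v] = "neither parent (possible de novo)"
    return origin


def in_trans(pair: Set[str], parent_1: Set[str], parent_2: Set[str]) -> bool:
    """True if the two variants were inherited from different parents."""
    origins = set(parental_origin(pair, parent_1, parent_2).values())
    return len(pair) == 2 and origins == {"parent_1", "parent_2"}


def cosegregates(pair: Set[str], affected: Dict[str, Set[str]],
                 unaffected: Dict[str, Set[str]]) -> bool:
    """Every affected member carries both variants; no unaffected member does."""
    return (all(pair <= g for g in affected.values())
            and not any(pair <= g for g in unaffected.values()))


pair = {"R954X", "Y169H"}
parent_1, parent_2 = {"R954X"}, {"Y169H"}
affected = {name: {"R954X", "Y169H"} for name in ("proband", "sib_1", "sib_2", "sib_3")}
unaffected = {"parent_1": parent_1, "parent_2": parent_2}
print(in_trans(pair, parent_1, parent_2))         # True
print(cosegregates(pair, affected, unaffected))   # True
```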
Earlier studies on SH3TC2 and Charcot-Marie-Tooth neuropathy
Demyelinating autosomal recessive CMT was mapped to chromosome 5q23-q33 through homozygosity mapping in consanguineous families. Subsequently, sequence analysis of genes in this region revealed mutations in the SH3TC2 gene (Azzedine, et al., 2006). In ten consanguineous families, eight different mutations were found. Six of the mutations occurred in exon 11. Two cases had R954X mutations. Azzedine, et al., noted that the patients had foot deformities and that spinal abnormalities (kyphoscoliosis) also occurred.
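Homozygosity mapping rests on the fact that, in consanguineous families, a recessive disease allele is usually inherited on two copies of the same ancestral segment, so affected individuals are homozygous across a run of neighboring markers around the disease gene. The Python sketch below finds runs of markers at which every affected individual is homozygous (not necessarily for the same allele, since different families may carry different mutations). Genotypes are written as two allele letters, and the data and minimum run length are hypothetical; published analyses also weigh marker density, allele frequencies, and linkage statistics, which this sketch omits.

```python
from typing import List, Tuple


def shared_homozygous_runs(genotypes: List[List[str]],
                           min_len: int) -> List[Tuple[int, int]]:
    """Find runs of markers where every affected individual is homozygous.

    genotypes[i][j] is individual i's genotype at marker j, e.g. "AA" or "AG".
    Returns (start, end) marker-index intervals, end exclusive, that are at
    least min_len markers long.
    """
    n_markers = len(genotypes[0])
    runs, start = [], None
    for j in range(n_markers + 1):  # extra step closes a run at the end
        all_hom = j < n_markers and all(g[j][0] == g[j][1] for g in genotypes)
        if all_hom and start is None:
            start = j
        elif not all_hom and start is not None:
            if j - start >= min_len:
                runs.append((start, j))
            start = None
    return runs


# Hypothetical genotypes at ten ordered markers for three affected individuals.
ind1 = ["AG", "CC", "TT", "GG", "AA", "CC", "AG", "TT", "CT", "GG"]
ind2 = ["AA", "CC", "TT", "GG", "AA", "CC", "GG", "TT", "TT", "AG"]
ind3 = ["GG", "CC", "TT", "GG", "AA", "TT", "AG", "TT", "TT", "GG"]
print(shared_homozygous_runs([ind1, ind2, ind3], min_len=3))  # [(1, 6)]
```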
In an analysis of 23 English patients with autosomal recessive CMT, Houlden, et al. (2009), identified 5 patients with SH3TC2 mutations. Affected members of four families were homozygous for the R954X mutation, and in one family the affected members were compound heterozygotes for the R954X and E657K mutations. Houlden, et al., noted clinical heterogeneity among the families in the severity of neuropathy. Neuropathological examination of sural nerve biopsies revealed demyelinated fibers and abnormal Schwann cells that formed onion bulb–like structures.
The SH3TC2 protein localizes to the cellular plasma membrane and to the membrane of vesicles in the endocytic membrane trafficking pathway (Lupo, et al., 2009).
Disruption in this pathway apparently leads to impaired interactions between Schwann cells and axons.
Roberts, et al. (2010), demonstrated an interaction between the SH3TC2 protein and the small membrane GTPase Rab11, which is known to regulate the recycling of internalized membranes in the endosomal pathway.
Exome sequencing
Analysis is simplified when exome sequencing rather than whole-genome sequencing is carried out, because the exome constitutes only about 1% of the genome, approximately 30 megabases (Mb). Ng, et al. (2010a), carried out exome sequencing on four individuals affected with Miller syndrome, including two affected siblings. Clinical features of Miller syndrome include micrognathia, cleft lip and palate, and eye and limb anomalies. To derive the sequence, Ng, et al., used array-based capture of exomes. Including affected siblings facilitated a search for changes in regions where the siblings had identical polymorphisms and nucleotide substitutions. They identified a mutation in the dihydroorotate dehydrogenase gene (DHODH). They subsequently studied individuals affected by Miller syndrome in three unrelated families, and sequence analysis established that these affected individuals were compound heterozygotes for DHODH mutations. The DHODH gene product plays a role in pyrimidine metabolism.
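Exome studies of a rare recessive disorder often rely on a simple filter: discard variants already catalogued in public databases or control samples, then keep only genes in which every affected individual carries two novel alleles (one homozygous variant or two different heterozygous variants). The Python sketch below illustrates that general logic under those assumptions; it is not the exact pipeline used by Ng, et al., and all gene names, variant IDs, and calls are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Call = Tuple[str, str, int]  # (gene, variant ID, number of alt alleles: 1 or 2)


def recessive_candidate_genes(individuals: Dict[str, List[Call]],
                              known_variants: Set[str]) -> Set[str]:
    """Genes in which every affected individual carries two novel alleles.

    Two heterozygous variants in one gene are counted as two alleles even
    though their phase (same or different chromosome copy) is unknown.
    """
    shared: Optional[Set[str]] = None
    for calls in individuals.values():
        alleles_per_gene: Dict[str, int] = {}
        for gene, variant_id, alt_count in calls:
            if variant_id in known_variants:
                continue  # drop variants already seen in databases or controls
            alleles_per_gene[gene] = alleles_per_gene.get(gene, 0) + alt_count
        genes = {g for g, n in alleles_per_gene.items() if n >= 2}
        shared = genes if shared is None else shared & genes
    return shared or set()


# Hypothetical calls for three affected individuals.
individuals = {
    "affected_1": [("GENE_X", "v1", 1), ("GENE_X", "v2", 1), ("GENE_Y", "v3", 2)],
    "affected_2": [("GENE_X", "v4", 1), ("GENE_X", "v5", 1), ("GENE_Y", "common_1", 2)],
    "affected_3": [("GENE_X", "v6", 1), ("GENE_X", "v7", 1), ("GENE_Z", "v8", 1)],
}
known = {"common_1"}
print(recessive_candidate_genes(individuals, known))  # {'GENE_X'}
```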
Next-generation sequencing continues to shed light on DNA sequence changes and their potential roles in diseases. Bioinformatic analysis of sequence data is challenging, and continued development of resources for analysis is important. Equally important will be downstream analysis of the biochemical and physiological effects of sequence changes.