Structural Variation

Novel genetic loci identified for telomere length leveraging 50,000 whole genome sequences in the Trans-Omics for Precision Medicine (TOPMed) project

Authors
Margaret A. Taub, Kruthika Iyer, Joshua Weinstock, Ali R. Keramati, John Lane, Tom Blackwell, Lisa R. Yanek, Nathan Pankratz, Gonçalo Abecasis, Rasika A. Mathias, on behalf of the NHLBI TOPMed Consortium
Name and Date of Professional Meeting
ASHG (Oct 16-20)
Abstract Text
Prior GWAS have identified 11 loci harboring common genetic determinants of telomere length (TL); several of these loci have been implicated in human disease. These studies have relied on genotype array/imputation data and on TL measured by qPCR or Southern blot. Large-scale whole genome sequencing (WGS) offers the opportunity to analyze TL in large numbers of extensively phenotyped subjects in order to (1) generate estimates of TL; (2) expand our knowledge of genetic determinants of TL; and (3) add to our understanding of the role these variants may play in disease etiology.

WGS data were generated within the NHLBI Trans-Omics for Precision Medicine (TOPMed) project, and TL was estimated on 93,219 samples using TelSeq software. We performed single-variant association tests using mega-analysis, combining data from 27 TOPMed studies on a subset of 49,899 subjects within the current TOPMed genotype data freeze (~86 million variants after filtering out sites with minor allele count below 5). We used a two-stage approach: we first screened variants with a standard linear model adjusting for sex and study, and then fit a linear mixed model, additionally adjusting for population structure and relatedness, on the subset of variants with p<0.01 in the first stage (conducted on the Analysis Commons, http://analysiscommons.com/).
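The screening stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline run on the Analysis Commons: the function names are hypothetical, sex and study are collapsed into a single group label and handled by within-group centering, the p-value uses a normal approximation to the t statistic, and the second-stage linear mixed model is not implemented here.

```python
import math
from collections import defaultdict


def minor_allele_count(genotypes):
    """Minor allele count for diploid genotypes coded as 0/1/2 alternate alleles."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def center_within_groups(values, groups):
    """Subtract each group's mean, which adjusts for group membership."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def stage1_pvalue(genotypes, phenotype, groups):
    """Two-sided p-value for the genotype slope after group adjustment.

    Assumption: a normal approximation replaces the t distribution,
    which is reasonable only for large samples.
    """
    x = center_within_groups(genotypes, groups)
    y = center_within_groups(phenotype, groups)
    sxx = sum(a * a for a in x)
    dof = len(x) - len(set(groups)) - 1
    if sxx == 0.0 or dof <= 0:
        return 1.0  # no within-group genotype variation, or too few samples
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(x, y))
    if rss == 0.0:
        return 0.0  # perfect fit
    z = beta / math.sqrt(rss / dof / sxx)
    return math.erfc(abs(z) / math.sqrt(2))


def screen_variants(variants, phenotype, groups, min_mac=5, p_cut=0.01):
    """Return IDs of variants passing the MAC filter and the stage-1 p-value cut.

    The returned IDs are the ones that would go on to the second-stage
    mixed model, which this sketch does not implement.
    """
    keep = []
    for variant_id, genotypes in variants.items():
        if minor_allele_count(genotypes) < min_mac:
            continue
        if stage1_pvalue(genotypes, phenotype, groups) < p_cut:
            keep.append(variant_id)
    return keep
```

The point of the design is that the cheap first-stage model runs on every variant, while the more expensive mixed model only has to be fit on the small fraction that passes the p<0.01 screen.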

We discovered four novel TL loci with common variants (MAF>1%). Three are of strong biological interest: (1) TERF1 (p=4.8×10⁻⁹), encoding a telomeric repeat-binding factor; (2) RFWD3 (p=2.7×10⁻⁹), encoding a protein involved in DNA damage repair; and (3) TINF2 (p=3.5×10⁻¹⁰), encoding a component of the shelterin complex, which protects against telomere shortening. The fourth is LINC01429 (p=3.0×10⁻⁸). In addition, 47 rare variants achieved genome-wide significance for TL. We replicated 6 common loci previously associated with TL: NAF1 (p=4.2×10⁻¹¹), OBFC1 (p=6.4×10⁻¹⁶), TERC (p=4.9×10⁻²⁰), TERT (p=1.3×10⁻²⁶), ZNF208/ZNF676 (p=1.5×10⁻⁸), and RTEL1 (p=7.8×10⁻¹¹).
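As a small worked check, the reported p-values can be ranked and compared against a significance threshold. The abstract does not state which threshold was used, so the conventional genome-wide value of 5×10⁻⁸ below is an assumption, and the names `GENOME_WIDE_ALPHA` and `rank_loci` are hypothetical.

```python
import math

# Assumption: the conventional genome-wide threshold; not stated in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# P-values as reported in the abstract for the novel and replicated common loci.
REPORTED_P = {
    "TERF1": 4.8e-9,
    "RFWD3": 2.7e-9,
    "TINF2": 3.5e-10,
    "LINC01429": 3.0e-8,
    "NAF1": 4.2e-11,
    "OBFC1": 6.4e-16,
    "TERC": 4.9e-20,
    "TERT": 1.3e-26,
    "ZNF208/ZNF676": 1.5e-8,
    "RTEL1": 7.8e-11,
}


def rank_loci(pvalues, alpha=GENOME_WIDE_ALPHA):
    """Return (locus, -log10(p), passes_threshold) sorted from strongest signal."""
    rows = [(locus, -math.log10(p), p < alpha) for locus, p in pvalues.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for locus, score, passes in rank_loci(REPORTED_P):
    print(f"{locus:15s} -log10(p)={score:5.1f} passes={passes}")
```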

Our TOPMed Working Group is examining methods to correct for batch effects in these TL data. In the near future, we will include an additional ~45,000 TOPMed subjects with WGS that will improve our ability to (1) fine-map the common variant loci; (2) identify novel loci through gene-based approaches; and (3) given known differences in TL by population, maximize our power to identify genetic determinants within groups.

Large-scale characterization of CYP2D6 variation in African Americans using TOPMed whole genome sequencing data

Authors
Seung-been Lee, Marsha M. Wheeler, James G. Wilson, Adolfo Correa, Pramod Anugu, Deborah A. Nickerson, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium
Name and Date of Professional Meeting
ASHG 2018 (TBD)
Abstract Text
Genotyping CYP2D6 is important for precision drug therapy because the enzyme it encodes metabolizes about 25% of drugs, and its activity varies substantially within and between populations. Over 100 haplotypes, or star alleles, have been defined for CYP2D6. These are characterized by single nucleotide variants, insertion-deletion variants, and structural variants that include gene deletions, duplications, and hybridizations with the nearby non-functional but highly homologous paralog CYP2D7. Although many star alleles are known to exhibit distinct population frequencies, further investigation of CYP2D6 variation is warranted in non-European populations because these are underrepresented in current estimates of CYP2D6 genetic diversity. In this study, our goal was to comprehensively survey CYP2D6 variation in a large African American cohort. Specifically, we performed CYP2D6 genotype analysis of 3,418 African American samples from the Jackson Heart Study that were whole genome sequenced by the Trans-Omics for Precision Medicine (TOPMed) program. These samples were genotyped using Stargazer, a bioinformatics tool we recently developed for accurate calling of star alleles in various polymorphic pharmacogenes, including CYP2D6. In our samples, we found a total of 41 unique haplotypes, including the decreased-function CYP2D6*17 (16.6%) and *29 (7.8%) alleles, which are commonly found in individuals of African ancestry, and the non-functional CYP2D6*4 allele (3.1%). We also observed extensive structural variation, ranging from zero to five gene copies, and numerous CYP2D6/CYP2D7 hybrids. Specifically, 22 of the 41 detected haplotypes had structural variation: 5.4% with a gene deletion (CYP2D6*5), 7.0% with a gene duplication (CYP2D6*1x2, *2x2, *4x2, *10x2, *17x2, *29x2, *34x2, *41x2, *43x2, *4N+*4, *36+*10, *68+*4, *77+*2, and *78+*2), 0.1% with a gene multiplication (CYP2D6*1x3, *2x3, *4x3, *29x3, and *34x3), and 1.3% with a gene hybridization (CYP2D6*4N, *36, *66, *68, *76, *77, and *78). Overall, 25.5% of the samples had at least one type of structural variation, and, based on diplotype calls, 1.6%, 8.9%, 84.4%, and 3.9% were predicted to be poor, intermediate, normal, and ultrarapid metabolizers, respectively. We are currently expanding our analysis to include other cohorts in the TOPMed program. These results demonstrate the importance of enhancing our understanding of CYP2D6 genetic diversity in large and diverse datasets in order to achieve precision drug therapy.
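To illustrate how a metabolizer phenotype can be predicted from a diplotype call, the sketch below uses a simple activity-score scheme. It is not Stargazer's implementation. The allele table is deliberately limited to alleles whose function is described in the abstract, plus the reference *1 and *2 alleles assumed to have normal function. The numeric values, the score cut-offs, and the function names are illustrative assumptions, and published guidelines differ in the exact thresholds.

```python
import re

# Illustrative activity values (assumptions for this sketch):
# normal function = 1.0, decreased function = 0.5, no function = 0.0.
ALLELE_ACTIVITY = {
    "*1": 1.0,   # reference allele, assumed normal function
    "*2": 1.0,   # assumed normal function
    "*17": 0.5,  # decreased function (per the abstract)
    "*29": 0.5,  # decreased function (per the abstract)
    "*4": 0.0,   # non-functional (per the abstract)
    "*5": 0.0,   # whole-gene deletion
}

_PART = re.compile(r"(\*\d+[A-Z]?)(?:x(\d+))?")


def haplotype_activity(haplotype):
    """Sum activity over gene copies, e.g. '*1x2' is two copies of *1."""
    total = 0.0
    for part in haplotype.split("+"):  # '+' joins tandem gene units
        match = _PART.fullmatch(part)
        if match is None or match.group(1) not in ALLELE_ACTIVITY:
            raise ValueError(f"no activity value for {part!r} in this sketch")
        copies = int(match.group(2) or 1)
        total += ALLELE_ACTIVITY[match.group(1)] * copies
    return total


def predict_phenotype(diplotype):
    """Map a diplotype such as '*1/*17' to a metabolizer class.

    Cut-offs are assumptions: 0 is poor, below 1 is intermediate,
    1 to 2 is normal, and above 2 is ultrarapid.
    """
    first, second = diplotype.split("/")
    score = haplotype_activity(first) + haplotype_activity(second)
    if score == 0:
        return "poor"
    if score < 1:
        return "intermediate"
    if score <= 2:
        return "normal"
    return "ultrarapid"


for call in ["*1/*17", "*4/*5", "*4/*17", "*1x2/*1"]:
    print(call, predict_phenotype(call))
```

Because gene copy number multiplies a haplotype's contribution to the score, duplications and deletions can change the predicted phenotype, which is why structural variation matters for these calls.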