Hematology and Hemostasis

Novel genetic loci identified for telomere length leveraging 50,000 whole genome sequences in the Trans-Omics for Precision Medicine (TOPMed) project

Authors
Margaret A. Taub, Kruthika Iyer, Joshua Weinstock, Ali R. Keramati, John Lane, Tom Blackwell, Lisa R. Yanek, Nathan Pankratz, Gonçalo Abecasis, Rasika A. Mathias, on behalf of the NHLBI TOPMed Consortium
Name and Date of Professional Meeting
ASHG (Oct 16-20)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Prior GWAS have identified 11 loci harboring common genetic determinants of telomere length (TL); several of these loci have been implicated in human disease. These studies have relied on genotype array/imputation data and TL generated through qPCR or Southern blot. Large-scale whole genome sequencing (WGS) offers the opportunity to analyze TL data on large numbers of subjects with extensive phenotyping to (1) generate estimates of TL; (2) expand our knowledge of genetic determinants of TL; and (3) add to our understanding of the role these variants may play in disease etiology.

WGS data were generated within the NHLBI Trans-Omics for Precision Medicine (TOPMed) project, and TL was estimated on 93,219 samples using TelSeq software. We performed single variant association tests using mega-analysis, combining data from 27 TOPMed studies on a subset of 49,899 subjects within the current genotype data freeze in TOPMed (~86 million variants after filtering sites with minor allele count below 5). We used a two-stage approach, initially screening variants with a standard linear model adjusting for sex and study, followed by a linear mixed-model also adjusting for population structure and relatedness on the subset of variants with p<0.01 from the first-stage analysis (conducted on the Analysis Commons, http://analysiscommons.com/).
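The two-stage screen described above can be illustrated with a minimal Python sketch. This is not the TOPMed pipeline. The function names (`minor_allele_count`, `screen_pvalue`, `two_stage`) and the `stage2_test` callback are hypothetical. The first stage is simplified to a single-predictor regression with a large-sample normal approximation, and it omits the sex and study covariates. The second-stage mixed model is left as a user-supplied function.

```python
import math
from statistics import NormalDist, mean


def minor_allele_count(genotypes):
    """Minor allele count for genotypes coded as 0/1/2 alternate-allele copies."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def screen_pvalue(genotypes, phenotype):
    """Stage-one stand-in: regress phenotype on genotype alone.

    Assumption: two-sided p-value from a normal approximation to the
    t-statistic; covariates (sex, study) are omitted for brevity.
    """
    n = len(genotypes)
    g_bar, y_bar = mean(genotypes), mean(phenotype)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        return 1.0  # monomorphic site carries no signal
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    rss = sum((y - y_bar - beta * (g - g_bar)) ** 2
              for g, y in zip(genotypes, phenotype))
    if rss == 0:
        return 0.0  # perfect fit
    se = math.sqrt(rss / (n - 2) / sxx)
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


def two_stage(variants, phenotype, stage2_test, min_mac=5, screen_alpha=0.01):
    """Run the cheap screen everywhere; call stage2_test only on survivors."""
    results = {}
    for variant_id, genotypes in variants.items():
        if minor_allele_count(genotypes) < min_mac:
            continue  # drop sites with minor allele count below the cutoff
        if screen_pvalue(genotypes, phenotype) < screen_alpha:
            results[variant_id] = stage2_test(genotypes, phenotype)
    return results
```

The point of this design is that the expensive model, which adjusts for population structure and relatedness, is fitted only for variants that pass the inexpensive first-stage threshold.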

We discovered four novel TL loci with common variants (MAF>1%), three of which are of strong biological interest: (1) TERF1 (p=4.8×10⁻⁹), encoding a telomeric repeat-binding factor; (2) RFWD3 (p=2.7×10⁻⁹), encoding a protein involved in DNA damage repair; and (3) TINF2 (p=3.5×10⁻¹⁰), encoding a component of the shelterin complex, which protects against telomere shortening. The fourth locus is LINC01429 (p=3.0×10⁻⁸). There were 47 rare variants that achieved genome-wide significance for TL. We replicated 6 common loci previously associated with TL: NAF1 (p=4.2×10⁻¹¹), OBFC1 (p=6.4×10⁻¹⁶), TERC (p=4.9×10⁻²⁰), TERT (p=1.3×10⁻²⁶), ZNF208/ZNF676 (p=1.5×10⁻⁸), and RTEL1 (p=7.8×10⁻¹¹).

Our TOPMed Working Group is examining methods to correct for batch effects in these TL data. In the near future, we will include an additional ~45,000 TOPMed subjects with WGS that will improve our ability to (1) fine-map the common variant loci; (2) identify novel loci through gene-based approaches; and (3) given known differences in TL by population, maximize our power to identify genetic determinants within groups.

Whole genome sequencing association analysis of red blood cell traits in a multi-ethnic population sample from the Trans-Omics for Precision Medicine (TOPMed) Project

Authors
Yao Hu, Xiuwen Zheng, Deepti Jain, Cecelia A. Laurie, Stephanie M. Gogarten, Paul L. Auer, Nathan Pankratz, Linda M. Polfus, Ming-Huei Chen, Jeffrey R. O'Connell, Joshua P. Lewis, Laura M. Raffield, Adolfo Correa, L. Adrienne Cupples, Nancy Jenny, Stephen S. Rich, Rasika A. Mathias, Lisa Yanek, John Blangero, Joanne E. Curran, Ken M. Rice, Andrew D. Johnson, Cathy C. Laurie, Alex P. Reiner, the TOPMed Hematology and Hemostasis Working Group
Name and Date of Professional Meeting
October 17, 2018
Associated paper proposal(s)
Working Group(s)
Abstract Text
Red blood cell (RBC) measurements are polygenic traits, and GWAS, exome chip and sequencing analyses have identified hundreds of associated genetic variants in European, Asian, African and Hispanic ancestry populations. However, additional loci remain undiscovered and causal variant(s) at each locus have not been well characterized. We performed WGS-based association analyses for hemoglobin (HGB), hematocrit (HCT), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), RBC count and red cell distribution width (RDW) in multi-ethnic populations from the NHLBI TOPMed Project, consisting of 25,080 European, African, Hispanic and Asian ancestry individuals from nine studies. More than 42,400,000 SNPs (MAC≥10) were tested for association using inverse-normal transformed residuals in a multi-ethnic mega-analysis, adjusting for age, sex, study, relatedness, population structure and residual heteroscedasticity. We identified six novel loci reaching genome-wide significance (P<5E-8) that remained significant after accounting for all previously reported RBC-associated variants (Pconditional<2E-6). The six novel loci include MSRA-rs1484641 (MAF=0.35) for HCT, CREBBP-rs129965 (MAF=0.16) for MCHC, FOXO6-rs557606786 (MAF=0.001) and ANKS1A-rs13207848 (MAF=0.14) for MCV, and 4q28-rs192665038 (MAF=0.003) and 18q23-rs558218738 (MAF=0.001) for RDW. All novel variants, except for ANKS1A-rs13207848, showed disparate allele frequencies across ancestral populations. We sought replication for these novel variants in UK Biobank, Kaiser and WHI-SHARe imputed GWAS datasets, but none replicated (P>0.05). A total of 26 previously reported RBC trait loci were confirmed to be associated with one or more RBC traits at genome-wide significance in our TOPMed analysis, and six of them (HBA, HBB, HFE, SP1, ABCA7 and RP11-321F6.1) harbored residual secondary signals after conditional analysis (Pconditional<5E-8).
Finally, in the SKAT and burden tests, which aggregated rare variants (alternative allele frequency<0.01) using 5kb or 50kb sliding windows, five previously reported GWAS loci were genome-wide significant (P<4E-8 for the 5kb window and P<4E-7 for the 50kb window), including the well-established HBA and HBB loci, APCDD1L, MXD3 and MOXD1. In summary, our results suggest that WGS in larger sample sizes is needed to capture variants or indels with moderate to large effects on RBC traits that were missed by GWAS.
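The inverse-normal transformation of residuals mentioned in this abstract can be sketched as follows. This is a minimal rank-based version, not the authors' implementation. The function name `inverse_normal_transform` is hypothetical, and the Blom offset (c = 3/8) and average-rank handling of ties are assumptions that the abstract does not specify.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8):
    """Map values to standard-normal quantiles via their ranks.

    Assumptions: Blom offset c = 3/8; tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

Because only ranks enter the transform, outlying residuals are pulled toward the bulk of the distribution before association testing.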

Novel structural variants originating in F8 non-coding regions explain previously unresolved cases of severe hemophilia A

Authors
Marsha M. Wheeler, Jill M. Johnsen, Glenn F. Pierce, Crystal Watson, NHLBI Trans-Omics for Precision Medicine, Barbara A. Konkle, Deborah A. Nickerson
Name and Date of Professional Meeting
ASHG
Associated paper proposal(s)
Working Group(s)
Abstract Text
Hemophilia A is an X-linked bleeding disorder resulting from deficiency in coagulation factor VIII. Numerous genetic variants (>2000) affecting the F8 gene have been implicated as causative of hemophilia A. These include structural variants (SVs) such as copy number variants (CNVs) and large intra-chromosomal inversions. For the vast majority of patients, causative variants can be identified using targeted sequencing of F8 coding regions and/or the use of methods which detect known SVs (e.g. inverse shifting PCR, long-range PCR, MLPA). However, these approaches fail to explain 1-3% of hemophilia A cases. In this study, we specifically performed SV analyses using whole genome sequencing (WGS) data from 11 cases of severe hemophilia A (factor VIII level < 1%) that remained genetically unexplained after exhausting available laboratory testing methods. These cases were selected from the My Life, Our Future (MLOF) hemophilia study recently sequenced by the NHLBI TOPMed program. SV analyses of the F8 genomic region revealed previously undetected deletions and inversions in 6 out of the 11 cases. In these 6 samples, SV calls were supported by multiple sequencing reads (> 25 reads) and multiple types of read evidence (read depth, paired-end and/or split read evidence). Two deletions within intron 6 were detected in a single hemophilia A case, a finding which suggests F8 intron 6 may contain one or more regulatory elements critical for F8 expression. Three distinct large inversions predicted to disrupt the F8 structural gene were detected in five other cases: one case with a 720Kb inversion with breakpoints in F8 intron 6 and SPRY3 intron 1, one case with a 20Mb inversion with breakpoints in F8 intron 1 and INTS6L intron 8, and three cases with a 7.4Kb inversion with breakpoints in F8 intron 25 and SMIM9 intron 1.
These events have not been reported in hemophilia and were also not present in the larger, sequenced My Life, Our Future dataset (N=2186), supporting these SVs as novel and likely causative of severe hemophilia A. We predict additional deleterious SVs remain to be discovered in unexplained cases of hemophilia. This work further demonstrates that dedicated analyses for SVs originating in non-coding regions should be considered in genetic studies of diseases caused by loss-of-function variants.
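The support criteria stated in this abstract (more than 25 reads and multiple types of read evidence) could be encoded as a simple filter. The sketch below is illustrative only. The `SVCall` record, its field names, and the `is_well_supported` function are hypothetical, and reading "multiple types" as "at least two of read depth, paired-end and split-read evidence" is an assumption.

```python
from dataclasses import dataclass

# Evidence categories named in the abstract; the string labels are assumptions.
EVIDENCE_TYPES = frozenset({"read_depth", "paired_end", "split_read"})


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal record for one structural variant call."""
    sv_type: str             # e.g. "deletion" or "inversion"
    supporting_reads: int    # number of reads supporting the call
    evidence: frozenset      # evidence categories observed for the call


def is_well_supported(call, min_reads=25, min_evidence_types=2):
    """True if the call has > min_reads reads and enough distinct evidence types."""
    recognised = call.evidence & EVIDENCE_TYPES
    return (call.supporting_reads > min_reads
            and len(recognised) >= min_evidence_types)
```

For example, an inversion with 40 supporting reads and both paired-end and split-read evidence passes, while a call with exactly 25 reads does not, because the threshold is strict.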