
Hematology and Hemostasis

Whole genome sequencing and associations with coagulation factors VII and VIII and von Willebrand factor: the TOPMed program

Authors
Paul S. de Vries, Michael R. Brown, Jennifer E. Huffman, Laura M. Raffield, Benjamin Rodriguez, Jennifer A. Brody, Jeffrey Haessler, Lisa R. Yanek, Joshua P. Lewis, Laura Almasy, Nathan Pankratz, Xiuqing Guo, Alexander P. Reiner, Andrew D. Johnson, Nicholas L. Smith, and Alanna C. Morrison on behalf of the TOPMed Hematology and Hemostasis Working Group
Name and Date of Professional Meeting
ASHG (October 15-19, 2019)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Coagulation factor VII (FVII), factor VIII (FVIII), and von Willebrand factor (vWF), the carrier protein of FVIII, are implicated in modulating the risk of arterial and venous thrombosis. This study brings together extensive whole genome sequence (WGS) resources and hemostasis phenotypes and capitalizes on advances in computational analysis to better understand the genetic architecture of hemostatic factors.
We leveraged Freeze 6 WGS from the NIH NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Plasma levels of FVII (n= 16,335), FVIII (n= 19,766), and vWF (n= 14,020) were harmonized across 9 studies that included participants of European, African, Asian, and Hispanic ancestry. Association analyses were conducted across all individuals using inverse normalized and rescaled residuals adjusting for age, sex, ancestry, principal components, and a kinship matrix. Analyses were conducted on the Analysis Commons cloud computing platform using the SMMAT function implemented in GENESIS. Single-variant analyses assessed all variants with a minor allele count ≥40. Aggregate analyses grouped low-frequency and rare variants (MAF<0.05) by gene, using 3 strategies for selection of variants within gene-based aggregation units: 1) loss of function (LOF) variants, 2) LOF and deleterious missense variants, and 3) coding, enhancer and promoter variants.
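The phrase "inverse normalized and rescaled residuals" can be illustrated with a minimal Python sketch. It assumes the covariate-adjusted residuals have already been obtained from a null model; that step and the kinship matrix are omitted. The sketch applies a rank-based inverse normal transformation, giving tied values their average rank, and then rescales the result to the standard deviation of the original residuals. The function names and the rank offset `c` are illustrative assumptions, not the GENESIS implementation.

```python
from statistics import NormalDist, pstdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normal(values, c=0.5):
    """Rank-based inverse normal transform: Phi^-1((r - c) / (n - 2c + 1)).

    c = 0.5 gives (r - 0.5) / n; c = 3/8 gives Blom's offset.
    """
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def rescaled_inverse_normal(residuals, c=0.5):
    """Inverse-normalize residuals, then rescale them to the residuals' SD."""
    sd = pstdev(residuals)
    return [sd * z for z in inverse_normal(residuals, c)]


# Toy residuals with a long right tail (not study data).
resid = [-1.2, -0.4, 0.1, 0.3, 5.0]
print([round(x, 3) for x in rescaled_inverse_normal(resid)])
```

After the transformation the extreme residual no longer dominates, and the values keep the spread of the original residuals. As the abstract describes, these transformed values then serve as the outcome for association testing.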
Single-variant analyses identified significant associations (P<5E-8) at 4 known loci for FVII, 8 for FVIII, and 8 for vWF. Three new associations included rs538727675 located between FNDC3B and GHSR associated with FVIII (MAF=0.0015; P=2.2E-8), rs114894279 downstream of TBL1XR1 associated with FVIII (MAF=0.012; P=3.4E-8), and rs147142418 in DPF3 associated with vWF (MAF=0.017; P=1.1E-8). Conditional analyses revealed multiple independent signals at the F7, VWF, STAB2, and ABO loci. Gene-based aggregate analyses identified associations at 1 known locus for FVII (F7), and 3 known loci each for vWF and FVIII (VWF, STAB2, and ABO). Additionally, LOF variants in CD36 were associated with FVIII, representing a novel locus for FVIII. The driving variant, rs3211938, causes CD36 deficiency and is associated with a range of hematological phenotypes.
WGS analysis of hemostatic factors yielded novel genetic associations with FVIII and vWF. Replication of these findings will be completed in up to 30,000 individuals from studies with imputed genotypes based on TOPMed as a reference panel.

Multi-ethnic whole genome sequence analysis of fibrinogen, fibrin D-dimer, tissue plasminogen activator & plasminogen activator inhibitor 1 within the TOPMed program

Authors
Jennifer E. Huffman1,2, Benjamin Rodriguez2,3, Laura M. Raffield4, Paul S. de Vries5, Jennifer A. Brody6, Han Chen5, Michael R. Brown5, Jeffrey Haessler7, Joshua P. Lewis8, Nathan Pankratz9, Lisa Yanek10, Xiuqing Guo11, Russell Bowler12, Laura Almasy13,14, Lawrence F. Bielak15, Alexander P. Reiner7,16, Andrew D. Johnson2,3, Alanna C. Morrison5, and Nicholas L. Smith16-18 on behalf of the TOPMed Hematology and Hemostasis Working Group
Name and Date of Professional Meeting
ASHG 2019 Annual Meeting (Oct 15-19, 2019)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Compared with array-based, imputation-based, and exome-focused analyses, whole genome sequencing (WGS) data provides better coverage of the genome and better representation of non-European variants.

To better understand the genetics underlying several hemostasis traits, we leverage Freeze 6 deep whole genome sequences from NHLBI’s Trans-Omics for Precision Medicine (TOPMed) program to investigate plasma levels of 4 hemostasis measures: fibrinogen (n=32,572), fibrin D-dimer (n=19,049), tissue plasminogen activator (tPA; n=4,393), and plasminogen activator inhibitor 1 (PAI-1; n=7,857). Phenotypes were centrally harmonized across up to 12 studies that included participants of European, African, Asian, and Hispanic ancestry. Association analyses were conducted using inverse normalized and rescaled residuals adjusting for age, sex, study, TOPMed phase, study-specific parameters, self-reported ancestry, 11 ancestry informative principal components, and a kinship matrix. All analyses were conducted on the Analysis Commons cloud computing platform using the SMMAT function implemented in GENESIS. Single-variant analyses included all variants with a minor allele count ≥40. Gene-based aggregate analyses used 3 strategies for variant selection: 1) loss of function (LOF), 2) LOF and deleterious missense (LDM), and 3) coding, enhancer and promoter variants. The LDM and coding/enhancer/promoter aggregation tests were restricted to variants with a minor allele frequency (MAF) <0.05, whereas no MAF restriction was applied to the LOF aggregation tests.
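The three variant-selection strategies and the MAF rule stated above can be sketched in Python. Under that rule, LDM and coding/enhancer/promoter units use MAF <0.05 and LOF units have no MAF limit. Several parts of the sketch are hypothetical simplifications:

- the `Variant` record and its consequence labels;
- the `deleterious` flag;
- the assumption that enhancer and promoter variants have already been linked to a gene.

Real pipelines take these annotations from functional annotation resources.

```python
from collections import defaultdict
from dataclasses import dataclass

MAF_CUTOFF = 0.05
# Hypothetical consequence labels treated as "coding" in this sketch.
CODING = {"lof", "missense", "synonymous"}
REGULATORY = {"enhancer", "promoter"}


@dataclass(frozen=True)
class Variant:
    vid: str
    gene: str                  # gene the variant is assigned to (assumed precomputed)
    maf: float
    consequence: str           # e.g. "lof", "missense", "synonymous", "enhancer", "promoter"
    deleterious: bool = False  # e.g. derived from a missense deleteriousness score


def in_mask(v: Variant, mask: str) -> bool:
    """Decide whether a variant enters a given aggregation mask."""
    if mask == "LOF":
        # No MAF restriction for LOF aggregation tests.
        return v.consequence == "lof"
    if v.maf >= MAF_CUTOFF:
        return False
    if mask == "LDM":
        return v.consequence == "lof" or (v.consequence == "missense" and v.deleterious)
    if mask == "CODING_ENH_PROM":
        return v.consequence in CODING or v.consequence in REGULATORY
    raise ValueError(f"unknown mask: {mask}")


def aggregation_units(variants, mask):
    """Group the variants that pass a mask into per-gene aggregation units."""
    units = defaultdict(list)
    for v in variants:
        if in_mask(v, mask):
            units[v.gene].append(v.vid)
    return dict(units)


# Hypothetical toy variants and gene names.
variants = [
    Variant("v1", "GENE1", 0.003, "missense", deleterious=True),
    Variant("v2", "GENE1", 0.20, "lof"),
    Variant("v3", "GENE2", 0.01, "promoter"),
    Variant("v4", "GENE2", 0.02, "missense", deleterious=False),
]
for mask in ("LOF", "LDM", "CODING_ENH_PROM"):
    print(mask, aggregation_units(variants, mask))
```

The common LOF variant `v2` enters only the LOF unit, because the other two masks apply the MAF cutoff. The FVII/FVIII/vWF analyses applied MAF<0.05 to all three strategies, so for those analyses the LOF branch would also check the cutoff.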

Significantly associated regions were found in single-variant tests for fibrinogen (n=7) and D-dimer (n=3). All were in loci previously associated with these phenotypes, and the majority were common variants in high linkage disequilibrium with previously reported variants. The most significant association for fibrinogen was a previously reported rare missense variant (rs148685782, p=6.8×10⁻⁴⁸, MAF=0.003, FGG) located within the region containing the fibrinogen structural genes FGA, FGB, and FGG. LOF and LDM aggregation tests demonstrated associations with these genes only. In aggregation tests for D-dimer, no gene with >5 alternate alleles reached significance. No associations were detected for tPA or PAI-1.

Fine-mapping analyses are planned within several regions for fibrinogen to leverage the resolution provided by WGS. Meta-analysis using external cohorts imputed to the TOPMed reference panel is planned to improve power for discovery.

Thirteen novel genetic loci identified for telomere length leveraging 75K whole genome sequences in the Trans-Omics for Precision Medicine (TOPMed) Program

Authors
Margaret A. Taub1, Joshua Weinstock2, Kruthika Iyer3, Lisa R. Yanek4, Matthew P. Conomos5, Marios Arvanitis6,7, Ali R. Keramati4, John Lane8, Tom Blackwell2, Cecelia Laurie5, Timothy Thornton5, Alexis Battle7, James A. Perry9, Nathan Pankratz8, Alexander Reiner10, Rasika A. Mathias4, on behalf of the NHLBI TOPMed Consortium
Name and Date of Professional Meeting
American Society of Human Genetics, Oct 15-19, 2019
Associated paper proposal(s)
Working Group(s)
Abstract Text
Telomere length (TL) is considered a molecular/cellular hallmark of aging. Fifteen recent genome-wide association studies (GWAS) have found 16 TL loci. These prior GWAS have two limitations: (i) almost all have been in European ancestry individuals; and (ii) all have relied on array genotype data. Therefore, very little is known about the specific causal variants, and even less about the genetic architecture of these loci in individuals with other ancestral backgrounds.

We leverage TOPMed whole-genome sequencing (WGS) data to estimate TL bioinformatically using TelSeq software in the largest multi-ethnic dataset for TL GWAS to date. Genome-wide tests for association in a meta-analysis of n=46,458 discovery and n=28,718 replication samples were performed using GENESIS on 82M variants with minor allele count ≥5, adjusting for age, sex, study, sequencing center, population structure and relatedness. We identified 22 loci (p<5×10⁻⁸), including 9 previously reported and 13 novel loci. Several of the novel loci map to genes that play a role in telomere biology: RFWD3, TERF1, TINF2, POT1, ATM, SAMHD1, and TERF2. Of the top 25 pathways identified in gene set enrichment analysis for these loci (FDR<5.6×10⁻⁵), 24 are related to telomere length/maintenance, DNA regulation, telomere capping/loop disassembly, and telomere organization.
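The minor allele count (MAC) filters in these abstracts can be made concrete with a small Python sketch. The TL analysis uses MAC ≥5, and the hemostasis single-variant analyses use MAC ≥40. The sketch assumes diploid, biallelic autosomal variants with hard-called genotypes coded as 0, 1 or 2 alternate alleles, and missing calls coded as `None`. The function names are hypothetical.

```python
from typing import Optional, Sequence


def minor_allele_count(genotypes: Sequence[Optional[int]]) -> int:
    """MAC for a biallelic diploid variant from 0/1/2 alternate-allele counts."""
    called = [g for g in genotypes if g is not None]  # drop missing calls
    alt = sum(called)
    total = 2 * len(called)  # number of called alleles
    return min(alt, total - alt)


def passes_mac_filter(genotypes: Sequence[Optional[int]], min_mac: int) -> bool:
    """Keep a variant only if its minor allele is seen at least min_mac times."""
    return minor_allele_count(genotypes) >= min_mac


g = [0, 1, 2, 2, None, 0]
print(minor_allele_count(g))           # 5 alternate alleles out of 10 called alleles
print(passes_mac_filter(g, min_mac=5))
```

Taking the minimum of the alternate and reference counts means the filter works the same way whichever allele happens to be coded as "alternate".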

We estimate TL heritability to be 47%, consistent with previous reports. Stratified analysis was performed by race/ethnicity: African (n=21,179), Asian (n=4,754), Hispanic/Latino (n=9,808), European (n=38,193), and Samoan (n=1,242), and several loci show population differences. In particular, TINF2 has a strong association in the Samoan (alternate allele frequency (AAF)=0.23; p=1.3×10⁻⁷), Asian (AAF=0.09; p=1.3×10⁻⁵) and African (AAF=0.01; p=2.6×10⁻⁴) groups, and no association in the European group (AAF<0.005). PheWAS of sentinel variants at TERT and TERC had associations with myeloproliferative neoplasms, cancers of skin/brain, and leiomyoma/benign neoplasms of the uterus (all p<10⁻⁸) in the UK Biobank. Sentinel variants at NAF1, TERF1, ZNF729, POT1, and CHKB-AS1 had suggestive associations with uterine fibroids (p=0.008 to 0.07).

We showcase the promise of leveraging WGS in TOPMed for TL genetics in the context of race and ancestry. Future efforts include fine-mapping and co-localization analysis using GTEx and eQTLGen whole blood eQTLs to identify functional variants, with an emphasis on loci showing population differences in signal.

Sparse Empirical Kinship Matrices Enable Computationally Efficient and Accurate Association Tests in Large Samples

Authors
Matthew P. Conomos, Tianyu Zhang, Stephanie M. Gogarten, Deepti Jain, Caitlin P. McHugh, Yao Hu, Alexander P. Reiner, Kenneth M. Rice
Name and Date of Professional Meeting
ASHG October 15-19, 2019
Associated paper proposal(s)
Working Group(s)
Abstract Text
Mixed models for genetic association testing have traditionally accounted for structure among samples by using an empirical genetic relationship matrix (GRM) that measures genetic covariance, genome-wide, from both ancestry and relatedness. However, fitting mixed models in samples with tens or hundreds of thousands of individuals can be a prohibitive computational burden. Here, we address this problem by using a sparse empirical kinship matrix (KM) and ancestry principal components in place of a GRM.

Standard forms of empirical GRMs and KMs estimated from genotype data are dense, i.e. they have no entries equal to zero. To exploit the computational speedups that sparse matrices enable, we make an empirical KM sparse by clustering samples based on their pairwise kinship estimates and setting all inter-cluster estimates to zero; this can also be thought of as approximating low levels of relatedness as ‘unrelated’. In today’s large-scale population studies, where individuals in pedigrees are a small proportion of the overall sample, this approximation can be expected to be highly accurate and the computational speedup substantial.
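The clustering-and-zeroing step can be sketched in Python under one plausible reading of the description:

- every pair with kinship above a threshold is treated as an edge;
- the connected components of that graph are the clusters;
- all kinship estimates within a cluster are kept;
- every inter-cluster entry is set to zero, which gives a block-diagonal matrix.

The following choices are assumptions of this sketch, not the authors' code: the dictionary-of-pairs storage, the function names, the fixed diagonal value, and the default threshold of 2^(-11/2) ≈ 0.0221. That threshold is the conventional lower boundary for 4th-degree relatives and matches the 0.022 used in the TOPMed analyses.

```python
from itertools import combinations


def kinship_clusters(samples, kinship, thresh):
    """Connected components of the graph with an edge wherever kinship > thresh."""
    parent = {s: s for s in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), k in kinship.items():
        if k > thresh:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())


def sparse_kinship(samples, kinship, self_kinship=0.5, thresh=2 ** -5.5):
    """Block-diagonal sparse KM stored as {(i, j): value}; absent pairs are zero.

    `kinship` maps sample pairs (a, b), in either order, to pairwise estimates.
    Diagonal entries use a fixed self-kinship value for simplicity.
    """
    def lookup(a, b):
        return kinship.get((a, b), kinship.get((b, a), 0.0))

    sparse = {(s, s): self_kinship for s in samples}
    for cluster in kinship_clusters(samples, kinship, thresh):
        for a, b in combinations(cluster, 2):
            value = lookup(a, b)
            if value != 0.0:
                # Keep within-cluster estimates even when they fall below thresh.
                sparse[(a, b)] = sparse[(b, a)] = value
    return sparse


def fraction_zero(sparse, n):
    """Share of the n x n matrix entries that are not stored (i.e. zero)."""
    return 1 - len(sparse) / (n * n)


# Hypothetical toy kinship estimates.
samples = ["A", "B", "C", "D"]
kin = {("A", "B"): 0.25, ("B", "C"): 0.03, ("A", "C"): 0.01,
       ("C", "D"): 0.01, ("A", "D"): 0.005, ("B", "D"): 0.0}
km = sparse_kinship(samples, kin)
print(kinship_clusters(samples, kin, 2 ** -5.5))  # A-B and B-C exceed the threshold
print(f"{fraction_zero(km, len(samples)):.3f} of entries are zero")
```

In this toy example, the pair A-C (0.01) stays in the matrix because A and C fall in the same cluster. The pair C-D (0.01) is zeroed because C and D fall in different clusters. Storage grows with the sum of squared cluster sizes rather than with n², which is why the size of the largest cluster matters for the speedup.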

To illustrate the computational advantage and statistical impact of using sparse empirical KMs, we performed genetic association analyses using seven red blood cell traits and WGS data from TOPMed Freeze 6. Between 17,469 and 48,858 samples were available for these traits. Using a 4th-degree relatedness threshold (i.e. kinship > 0.022) and our proposed algorithm, 98.3% to 99.5% of entries in the sparse KM were set to zero, and the largest cluster ranged from 1,667 to 2,459 samples. Compared to using a GRM, using a sparse KM substantially improved computational performance; e.g. fitting the null models for these traits took just 0.5-6.2% of the CPU time and required 1.4-6.7% of the memory. Furthermore, differences in association p-values between the two approaches were small. For these traits, over 99.99% of tests differed in −log10(p) by less than 0.5, i.e. by an amount very unlikely to change the practical interpretation of results. With the level of sparsity attainable in population studies such as TOPMed, we also find that our approach performs favorably compared to SAIGE, another mixed model method designed for analysis of large samples. The use of sparse KMs is a promising and flexible approach to improve the computational efficiency of association testing in large population studies without sacrificing accuracy.
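The accuracy comparison reported for the GRM and sparse-KM analyses is a simple calculation: the share of tests whose −log10(p) differs by less than 0.5. It is sketched in Python below. The function name and the assumption that the two p-value lists are aligned test by test are illustrative.

```python
import math


def fraction_concordant(p_grm, p_sparse, tol=0.5):
    """Share of tests whose -log10(p) differs by less than `tol` between analyses."""
    if len(p_grm) != len(p_sparse):
        raise ValueError("p-value lists must be aligned test by test")
    close = sum(
        # |(-log10 a) - (-log10 b)| equals |log10 a - log10 b|
        abs(math.log10(a) - math.log10(b)) < tol
        for a, b in zip(p_grm, p_sparse)
    )
    return close / len(p_grm)


# Toy p-values (not TOPMed results).
print(fraction_concordant([1e-3, 1e-8, 0.5], [2e-3, 1e-9, 0.5]))
```

Working on the −log10 scale means a twofold change in p (about 0.3 units) counts as concordant. A tenfold change (1 unit) does not.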

Calling and Imputation of the Common α-globin Copy Number Variant with Whole Genome Sequencing Data in TOPMed and Association with Hematologic and Other Clinical Phenotypes

Authors
Minzhi Jiang1, Ye Su2, Thomas W. Blackwell3, Gonçalo Abecasis3, Alexander P. Reiner4, Yun Li2,5,6, Laura M. Raffield2
Name and Date of Professional Meeting
American Society of Human Genetics, October 15-19, 2019
Associated paper proposal(s)
Working Group(s)
Abstract Text
Recent work has shown that inheriting a single copy of the sickle cell-causing β-globin variant rs334 (i.e. sickle cell trait) can be associated not only with alterations in blood cell indices and hemoglobin A1c levels, but also with increased risk of certain medical conditions, such as chronic kidney disease. In individuals with sickle cell disease, co-inheritance of another globin gene variant, the 3.7 kb α-globin gene deletion (a common cause of α-thalassemia in African populations), can modify the risk of stroke and other sickle cell disease complications.

In this work, we first aim to better catalog this CNV in population-based cohorts via whole genome sequencing data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, assessing the performance of different structural variant calling methods. We next evaluate the ability to impute this CNV using SNP genotypes from commercial arrays. Lastly, we conduct association analysis for main effects of the α-globin CNV and its interactions with other globin gene variants on a battery of hematological traits and other clinical phenotypes, including stroke, chronic kidney disease, and hemoglobin A1c levels.

Preliminary results from 2,916 African Americans in the Jackson Heart Study (JHS) demonstrated highly reliable calling using either GenomeSTRiP or LUMPY (correlation >0.99). In addition, imputation of the structural variant calls from Affymetrix 6.0 SNP genotypes in JHS achieved reasonable quality (r² = 0.629 using minimac3). Finally, association analysis confirmed significant main effects of the CNV as well as interaction effects with β-globin variants on multiple traits. For example, the CNV is associated with higher hemoglobin A1c (p = 0.0001) and attenuates the increased risk of anemia in carriers of sickle cell trait (p for interaction = 0.031).
We anticipate that accurate calling and imputation of this CNV in additional African American and Hispanic/Latino cohorts will allow powerful interrogation of its main and interaction effects on a wide range of blood cell and other clinical phenotypes.