Multi-Omics | NHLBI Trans-Omics for Precision Medicine

Cross-cohort eQTL analyses of 6,602 multi-ancestry TOPMed whole blood RNA-seq samples uncover regulatory relationships

Submitted by	Orchard, Peter
Authors	P. Orchard, F. Aguet, T. Blackwell, K. Ardlie, A. Smith, R. Joehanes, X. Li, M. Wang, C-T. Liu, A. Saferali, J. Wu, M. R. G. Taylor, N. Heard-Costa, H. Tang, P. J. Castaldi, G. Abecasis, J. I. Rotter, L. Kachuri, D. Levy, L. M. Raffield, L. J. Scott, S. C. J. Parker, NHLBI TOPMed Consortium
Name and Date of Professional Meeting	ASHG 2023 Annual Meeting (Nov. 1-5, 2023)
Associated paper proposal(s)	Cross-cohort RNA-seq harmonization and analysis of expression and splicing quantitative trait loci in TOPMed
Working Group(s)	Multi-Omics
Abstract Text	Most genetic variants associated with complex traits occur in non-coding genomic regions and are hypothesized to affect gene expression. To identify variants that regulate gene expression, we performed cis- and trans-expression quantitative trait locus (cis/trans-eQTL) analyses using whole blood RNA-seq and whole genome sequencing data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program from 6,602 samples of predominantly European (68%), African (21%) and Indigenous American (10%) ancestry. We hypothesized that many trans signals would overlap cis signals, enabling identification of candidate genes mediating trans effects. At a MAF ≥ 0.01, we identified 19,422 genes with ≥ 1 cis-eQTL (5% FDR; cis-eGenes) and 71,092 total independent cis-eQTLs (SuSiE 95% credible sets). At MAF ≥ 0.05, we identified 1,743 trans-eGenes. Trans-eVariants were enriched for overlap with cis-eQTL credible sets (Fisher’s exact test against MAF-matched variants; odds ratio = 8.4; p = 3×10^-45; 30% of unique trans-eVariants overlap cis-eQTL), and cis-eGenes for cis-eQTL overlapping trans-eQTL were 3.5-fold enriched for transcription factor genes (p < 5×10^-4). 167 variants were trans-eQTL for >1 gene (1,075 total genes). For example, one cis-eVariant for ERN1, which encodes endonuclease IRE1a, was a trans-eVariant or in high LD (>0.9) with a trans-eVariant for thirteen trans-eGenes, including known ERN1 downstream target XBP1 and XBP1 target genes including DNAJB9. ERN1 is a regulator of the endoplasmic reticulum (ER) stress response, and five of the twelve trans-eGenes in the KEGG database were in the “Protein processing in ER” pathway (31.6-fold enrichment; nominal p = 1.8×10^-7). Among ER response pathways, IRE1a-XBP1 is the most highly conserved, with an emerging role in regulation of inflammation and immune response.
To identify trans-eGenes that may share multiple signals with a potentially regulatory cis-eGene, we fine-mapped trans-eQTL signals within the 2Mb window centered on each trans-eGene’s lead trans-eVariant. Within the 2Mb windows, 300 of the 1,743 trans-eGenes had >1 trans-eQTL (2,080 total trans-eQTL credible sets). We found 31 cis-eGenes with > 1 cis-eQTL signal colocalizing with > 1 trans-eQTL signal from at least one trans-eGene (145 unique cis-eGene - trans-eGene pairs). For example, trans-eGene BTN3A3 showed 4 trans-eQTL credible sets that colocalized with 4 cis-eQTL credible sets for its known regulator NLRC5. This example provides proof-of-principle for our hypothesis. In summary, this dataset demonstrates the utility of large eQTL studies to provide insight into regulatory pathways involving trans-eQTLs mediated by cis-eGenes.
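The enrichment of trans-eVariants in cis-eQTL credible sets reported above rests on a Fisher's exact test on a 2x2 table. The following is a minimal sketch of that kind of test, using only the Python standard library. The function name `fisher_enrichment` and the counts in the usage lines are hypothetical illustrations, not the study's data or code, and the sketch computes a one-sided p-value, which may differ from the sidedness used in the abstract.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: trans-eVariants inside a cis-eQTL credible set
    b: trans-eVariants outside any cis-eQTL credible set
    c: matched control variants inside a credible set
    d: matched control variants outside any credible set
    Returns (sample odds ratio, P(X >= a) under the hypergeometric null).
    """
    n = a + b + c + d      # all variants in the table
    row1 = a + b           # number of trans-eVariants
    col1 = a + c           # number of variants inside credible sets
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for overlaps at least as large as observed.
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    p_value = numerator / comb(n, row1)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value


# Hypothetical counts for illustration only (not taken from the abstract).
odds_ratio, p_value = fisher_enrichment(30, 70, 5, 95)
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.2e}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms, which matters when p-values are very small.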

Rare variants affecting telomere length and disease identified through multi-omic modeling

Submitted by	Keener, Rebecca
Authors	Rebecca Keener, Taibo Li, François Aguet, Kristin Ardlie, Jerome Rotter, Steven Rich, and Alexis Battle
Name and Date of Professional Meeting	ASHG Conference (November 5-9, 2023)
Associated paper proposal(s)	Watershed multi-omics modeling to identify rare variants affecting telomere length
Working Group(s)	Multi-Omics
Abstract Text	Telomeres protect the ends of linear chromosomes and as humans age, telomere length (TL) decreases. When telomeres become critically short, a senescence or apoptosis signal prevents further telomere loss. Individuals with extremely short TL present with Short Telomere Syndromes (STS), including bone marrow failure and immunodeficiency, while individuals with extremely long TL are predisposed to cancer. To gain insight into TL genetic regulation, prior work from our group and others used genome-wide association studies to examine the role of common genetic variation in TL. This strategy identified novel genes involved in TL regulation, some of which we experimentally validated. However, this approach ignores the effects of rare variation, which can have larger effect sizes and uniquely impact genes under strong constraint. Studies of rare variant effects on TL have improved our understanding of TL biology, but have largely required laborious STS patient pedigree studies. We leveraged TL estimates and rare variant data from the Trans-Omics for Precision Medicine (TOPMed) Program to broadly examine the impact of rare variation on TL. Previously we developed Watershed, a Bayesian hierarchical model, which uses whole genome sequencing with paired multi-omic data (expression, splicing, methylation, and/or protein levels) to prioritize rare variants causing significant disruption of molecular phenotypes. This multi-omic signature generates interpretable hypotheses for coding and non-coding rare variants, providing a posterior probability that the variant causes outlier status for each molecular signal, for example that splicing is disrupted but expression is not. We used data from 5,310 MESA individuals to train Watershed and observed that in 40/86 individuals with extremely short TL (<1% in TOPMed), Watershed prioritized rare variants in at least one gene from a panel of 16 STS genes. 
The variant with the largest posterior probability (0.984) was predicted to affect expression of TPP1, which encodes a protein critical for TL regulation. We will expand our analysis to another 103,812 TOPMed individuals and incorporate multi-omic data where available. Examination of highly weighted variants in individuals with extreme TL relative to average TL will potentially identify novel genes involved in TL regulation. In addition, we will examine the interplay between TL regulation and multi-omic signals over age (0-98 years old). Finally, we will apply our model to data from STS patients to improve their genetic diagnosis. Together this work has utility in improving STS patient diagnosis and furthering our understanding of the molecular mechanisms governing TL.

Multi-study pQTL analysis of Somascan proteomics in multi-ancestry TOPMed Cohorts

Submitted by	Debban, Catherine
Authors	Catherine L. Debban, Usman Tahir, Katherine Pratte, Jennifer A. Brody, Mikyeong Lee, Claire Guo, Andrew Hill, Jayna Nicholas, Daniel H Katz, Bing Yu, James G. Wilson, Honghuang Lin, Katerina Kechris, Sina A. Gharib, Stephen S. Rich, Kent Taylor, Michael H. Cho, Jerome I Rotter, Bruce Psaty, Stephanie J London, Robert Gerszten, Laura Raffield, Russell P. Bowler, Ani Manichaikul
Name and Date of Professional Meeting	ASHG Meeting (November 1-5 2023)
Associated paper proposal(s)	Multi-study single variant pQTL analysis of SOMAscan and Olink proteomics in TOPMed Cohorts
Working Group(s)	Multi-Omics
Abstract Text	Integration of genome-wide association study (GWAS) with gene expression quantitative trait loci (eQTL) has proven a valuable approach as a first step to identifying molecular mechanisms underlying GWAS signals. However, many GWAS loci do not show evidence of colocalization with eQTLs. Motivated by the hypothesis that high-throughput proteomics can complement eQTLs for enhanced interpretation of GWAS signals, we assembled a pQTL resource by combining SomaScan proteomics versions with 1.3k, 5k and 7k aptamers measured from four community-based cohorts (Cardiovascular Health Study [CHS], Framingham Heart Study [FHS], Jackson Heart Study [JHS], Multi-Ethnic Study of Atherosclerosis [MESA]; total n=8,200), one smoking-enriched cohort (COPDGene; n=5,000), and one asthma-enriched cohort (the Agricultural Lung Health Study [ALHS]; n=1,830). The combined set of proteomics measures reflects multi-ancestry individuals representing European Americans (EUR; n=7,470) and African Americans (AFA; n=7,200) with 1,300-7,000 protein aptamer measures per sample (depending on the SomaScan version). We leveraged whole genome sequence data for the TOPMed cohorts (CHS, FHS, JHS, MESA and COPDGene) and genome-wide imputation from TOPMed for ALHS to perform pQTL mapping. We found that accounting for unknown sources of variance by including PEER factors or PCs of hidden variance as covariates improves detection of pQTLs, with the PCs achieving similar results at far lower computational burden. Thus far, preliminary analysis of selected proteins with data from all studies, adjusting for age, sex, PCs of ancestry, and PCs of hidden variance, recapitulates known variant-expression associations such as the known SERPINA1 S and Z alleles for alpha-1 antitrypsin (AAT) levels. In a subset of 25 proteins on chromosome 21, we detected cis-pQTLs for 52% of proteins, and trans-pQTLs for 44% of proteins.
In analysis stratified by race/ancestry, we observed a greater number of protein-associated signals in AFA compared to EUR, likely reflecting differences in patterns of linkage disequilibrium and deeper variation in the African ancestry populations. We are currently expanding our analysis genome-wide. Our pQTL mapping effort leveraging high-throughput proteomics demonstrates the value of integrating multi-ancestry samples to expand the set of protein-associated variants and identify putative molecular mechanisms underlying GWAS signals.

Multi-ancestry transcriptome predictions with functionally informed variants improve transcriptome-wide association studies in TOPMed MESA

Submitted by	Hu, Xiaowei
Authors	Xiaowei Hu, Daniel S. Araujo, Chachrit Khunsriraksakul, Lida Wang, Quan Sun, Jia Wen, Lynette Ekunwe, Lingbo Zhou, Anya Mikhaylova, Kevin L Keys, Leslie A Lange, Ethan Lange, Stephen B Montgomery, Alex P Reiner, Stephen S Rich, Jerome I Rotter, Francois Aguet, Tuuli Lappalainen, Timothy A Thornton, Christopher R Gignoux, Esteban G Burchard, Kristin G Ardlie, Kent D Taylor, Peter Durda, Elaine Cornell, Xiuqing Guo, Yongmei Liu, Russell P Tracy, Matthew P Conomos, Thomas W Blackwell, George Papanicolaou, W. Craig Johnson, Minoli A Perera, Michael H. Cho, Dajiang Liu, Laura M Raffield, Yun Li, TOPMed Multi-Omics Working Group, Heather Wheeler, Hae Kyung Im, Ani Manichaikul
Name and Date of Professional Meeting	American Society of Human Genetics (November 1-5, 2023)
Associated paper proposal(s)	Integrative multi-SNP prediction of RNA-seq data in the TOPMed MESA Multi-omics pilot
Working Group(s)	Multi-Omics
Abstract Text	BACKGROUND: Reliable prediction of genetically regulated gene expression is key to accurate transcriptome-wide association studies (TWAS). Reference transcriptome prediction models for TWAS have been constructed primarily based on individuals of European ancestry. With the emergence of multi-ancestry GWAS, there is a need for reliable multi-ancestry transcriptome prediction models for downstream TWAS efforts. Furthermore, genomic variants overlapping key annotations (e.g., fine-mapping, 3D genomics informed regions, and epigenetic processes) are more likely to be functionally relevant and to influence gene expression. METHODS: We developed multi-ancestry transcriptome prediction models with functionally informed variants (FIVs) by leveraging PBMC RNA-seq from 1,287 TOPMed MESA multi-ancestry samples and corresponding whole-genome sequencing data. Then we examined the performance of the models on both prediction accuracy and TWAS. We built four prediction models including one benchmark model (Elastic Net, EN), and three models with FIVs, i.e., EN with fine-mapped variants; Prediction Using Models Informed by Chromatin conformations and Epigenomics, PUMICE; and PUMICE with fine-mapped variants. The prediction accuracy of the four models was then assessed in the Geuvadis cohort using 449 multi-ancestry samples with LCL RNA-seq. To examine the models’ performance in TWAS, we leveraged summary statistics from two recent multi-ancestry GWAS, the Global Lipids Genetics Consortium (GLGC) GWAS (N~1.65 million), and a lung function GWAS (N=580,869). We then examined TWAS precision by overlapping Bonferroni-significant TWAS genes with previously identified GWAS trait-related putative causal genes (i.e., Mendelian and mouse knockout genes, genes with coding variants, genes with rare exonic association, nearest genes).
RESULTS: While the gene expression prediction accuracy was similar across the four models in both discovery and validation analyses, the TWAS from models with FIVs outperformed the TWAS from the benchmark EN model. The TWAS from models with FIVs identified more putative causal genes than the TWAS from the EN model for three out of five lipid traits and for three out of four lung function traits. For example, the TWAS from PUMICE and EN identified 87 and 80 putative causal genes respectively for total cholesterol. Similarly, the TWAS from PUMICE identified 13 putative causal genes for peak expiratory flow rate, while the TWAS from EN only identified 6 genes. SUMMARY: Our study demonstrates the value of including FIVs in multi-ancestry transcriptome prediction models for improving TWAS precision.
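The TWAS precision comparison described in this abstract amounts to counting how many Bonferroni-significant TWAS genes fall in a reference set of putative causal genes. A minimal sketch of that bookkeeping follows. The function names, gene labels, and p-values are hypothetical placeholders, and the sketch assumes the Bonferroni correction is applied over the genes tested by each model, which the abstract does not specify.

```python
def bonferroni_hits(pvalues, alpha=0.05):
    """Return the genes whose TWAS p-value passes a Bonferroni threshold.

    pvalues: dict mapping gene name -> TWAS p-value for one model and trait.
    The threshold is alpha divided by the number of genes tested (an assumption).
    """
    threshold = alpha / len(pvalues)
    return {gene for gene, p in pvalues.items() if p < threshold}


def count_putative_causal(hits, causal_genes):
    """Count significant TWAS genes that are also putative causal genes."""
    return len(hits & causal_genes)


# Hypothetical toy inputs for two prediction models on the same trait.
en_pvalues = {"GENE_A": 1e-8, "GENE_B": 0.2, "GENE_C": 1e-3, "GENE_D": 0.04}
pumice_pvalues = {"GENE_A": 1e-9, "GENE_B": 1e-4, "GENE_C": 0.5, "GENE_D": 1e-3}
causal_genes = {"GENE_A", "GENE_B"}  # stand-in for the curated causal gene list

for name, pvalues in [("EN", en_pvalues), ("PUMICE", pumice_pvalues)]:
    hits = bonferroni_hits(pvalues)
    print(name, sorted(hits), count_putative_causal(hits, causal_genes))
```

Comparing the final counts across models for each trait gives the kind of summary reported in the results, such as 87 versus 80 genes for total cholesterol.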

Cross-cohort eQTL fine-mapping utilizing TOPMed whole genome sequencing identifies tens of thousands of independent eQTL signals and thousands of eQTLs colocalizing with complex trait-associated variants

Submitted by	Orchard, Peter
Authors	P. Orchard, F. Aguet, T. Blackwell, K. Ardlie, P. J. Castaldi, A. V. Smith, R. Joehanes, A. Saferali, H. E. Wheeler, C-T. Liu, M. Cho, C. Hersh, L. Mestroni, L. Kachuri, A. P. Reiner, X. Li, M. Taylor, D. A. Meyers, S. S. Rich, G. Abecasis, N. Heard-Costa, L. J. Scott, J. I. Rotter, H. Tang, D. Levy, L. M. Raffield, S. C. J. Parker, NHLBI TOPMed Consortium
Name and Date of Professional Meeting	ASHG 2022 Annual Meeting (Oct. 25 - 29, 2022)
Associated paper proposal(s)	Cross-cohort RNA-seq harmonization and analysis of expression and splicing quantitative trait loci in TOPMed
Working Group(s)	Multi-Omics
Abstract Text	Most genetic variants associated with complex traits and diseases occur in non-coding genomic regions and are hypothesized to regulate gene expression. To understand the genetics underlying gene expression variability, we performed cis expression quantitative trait locus (cis-eQTL) analyses using RNA-seq and whole genome sequencing (WGS) data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program from 6,602 whole blood samples of European (EUR; 68%), African (21%) and Indigenous American (10%) ancestry. Notably, this exceeds the sample size of published RNA-seq and WGS-based cis-eQTL analyses, which enabled us to test variants with minor allele frequency (MAF) below 0.01 and detect secondary signals for 15,317 genes. At a MAF≥0.001, we identified 19,381 genes with at least one eQTL (5% FDR, testing variants within 1Mb of the transcription start site; 22,180 genes tested). We fine-mapped independent eQTL signals using the SuSiE method and identified 77,398 eQTL signals (95% credible sets; median 17,183 variants tested per gene and 3 credible sets discovered per gene), including 31,810 credible sets containing a single variant. By contrast, restricting to variants with higher MAF (MAF≥0.01), we identified 70,943 eQTL signals (median 7,953 variants tested per gene and 3 credible sets discovered per gene), and 29,690 95% credible sets containing a single variant. To assess the utility of this dataset to identify target genes and nominate causal variants for genome wide association study (GWAS) signals, we colocalized independent cis-eQTL signals with 33,141 fine-mapped EUR GWAS signals from 172 UK Biobank traits. 5,782 GWAS signals colocalized with an eQTL (SuSiE-coloc PP4 posterior probability of colocalization > 0.8). Of these, 1,648 GWAS signals colocalized with an eQTL from more than one gene. Of 4,134 GWAS signals colocalizing with only one gene, in 52% of cases the gene was not the nearest gene. 
2,910 of the 5,782 colocalizing GWAS loci colocalized with only secondary eQTL signals. We identified 215 instances in which multiple neighboring GWAS signals for a given trait colocalized with multiple eQTLs from the same gene. For example, in one 843kb window we identified six independent GWAS signals for neutrophil percentage, three of which are in or near ACKR1 (previously shown to regulate neutrophil counts) and colocalize with three independent ACKR1 eQTL signals (each with a single variant 95% eQTL credible set). In summary, this dataset demonstrates the utility of large-scale WGS-based eQTL studies to map genetic regulatory effects on gene expression at unprecedented resolution and nominate causal genes for thousands of GWAS signals.
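The abstract reports 95% credible sets, many of which contain a single variant. As a simplified illustration of what such a set is, the sketch below takes per-variant posterior probabilities for one signal and returns the smallest set of top-ranked variants whose cumulative probability reaches the requested coverage. This is not the SuSiE implementation used in the study. The function name `credible_set`, the variant labels, and the probabilities are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants reaching the requested coverage.

    posteriors: dict mapping variant ID -> posterior probability that the
    variant is the causal one for a single signal. Values are normalized
    to sum to 1 before accumulation.
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical signals: one resolved to a single variant, one spread over three.
sharp_signal = {"var1": 0.97, "var2": 0.02, "var3": 0.01}
diffuse_signal = {"var1": 0.60, "var2": 0.30, "var3": 0.08, "var4": 0.02}
print(credible_set(sharp_signal))
print(credible_set(diffuse_signal))
```

A signal whose top variant alone carries at least 95% of the posterior mass yields a single-variant credible set, which is the situation counted in the abstract.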