
Multi-Omics

Multi-study pQTL analysis of SomaScan proteomics in multi-ancestry TOPMed Cohorts

Authors
Catherine L. Debban, Usman Tahir, Katherine Pratte, Jennifer A. Brody, Mikyeong Lee, Claire Guo, Andrew Hill, Jayna Nicholas, Daniel H Katz, Bing Yu, James G. Wilson, Honghuang Lin, Katerina Kechris, Sina A. Gharib, Stephen S. Rich, Kent Taylor, Michael H. Cho, Jerome I Rotter, Bruce Psaty, Stephanie J London, Robert Gerszten, Laura Raffield, Russell P. Bowler, Ani Manichaikul
Name and Date of Professional Meeting
ASHG Meeting (November 1-5 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Integration of genome-wide association study (GWAS) results with gene expression quantitative trait loci (eQTL) has proven a valuable first step toward identifying molecular mechanisms underlying GWAS signals. However, many GWAS loci do not show evidence of colocalization with eQTLs. Motivated by the hypothesis that high-throughput proteomics can complement eQTLs for enhanced interpretation of GWAS signals, we assembled a protein quantitative trait locus (pQTL) resource by combining SomaScan proteomics versions with 1.3k, 5k and 7k aptamers measured from four community-based cohorts (Cardiovascular Health Study [CHS], Framingham Heart Study [FHS], Jackson Heart Study [JHS], Multi-Ethnic Study of Atherosclerosis [MESA]; total n=8,200), one smoking-enriched cohort (COPDGene; n=5,000), and one asthma-enriched cohort (the Agricultural Lung Health Study [ALHS]; n=1,830). The combined set of proteomics measures reflects multi-ancestry individuals representing European Americans (EUR; n=7,470) and African Americans (AFA; n=7,200) with 1,300-7,000 protein aptamer measures per sample (depending on the SomaScan version). We leveraged whole genome sequence data for the TOPMed cohorts (CHS, FHS, JHS, MESA and COPDGene) and genome-wide imputation from TOPMed for ALHS to perform pQTL mapping. We found that accounting for unknown sources of variance by including PEER factors or PCs of hidden variance as covariates improves detection of pQTLs, with the PCs achieving similar results at far lower computational burden. Thus far, preliminary analysis of selected proteins with data from all studies, adjusting for age, sex, PCs of ancestry, and PCs of hidden variance, recapitulates known variant-expression associations such as the known SERPINA1 S and Z alleles for alpha-1 antitrypsin (AAT) levels. In a subset of 25 proteins on chromosome 21, we detected cis-pQTLs for 52% of proteins, and trans-pQTLs for 44% of proteins.
In analysis stratified by race/ancestry, we observed a greater number of protein-associated signals in AFA compared to EUR, likely reflecting differences in patterns of linkage disequilibrium and deeper variation in the African ancestry populations. We are currently expanding our analysis genome-wide. Our pQTL mapping effort leveraging high-throughput proteomics demonstrates the value of integrating multi-ancestry samples to expand the set of protein-associated variants and identify putative molecular mechanisms underlying GWAS signals.
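
As an illustration of the covariate strategy described in this abstract, the following is a minimal sketch of how PCs of hidden variance might be derived and added to a single-variant pQTL regression. It is not the authors' pipeline: the function names, the simulated data, and the choice to regress out known covariates before taking PCs are all assumptions made for the example.

```python
import numpy as np

def hidden_variance_pcs(protein, known_covars, n_pcs):
    """Top PCs of the protein matrix after regressing out known covariates.

    protein: (n_samples, n_proteins); known_covars: (n_samples, n_covars).
    Hypothetical helper for illustration only.
    """
    n_samples = protein.shape[0]
    design = np.column_stack([np.ones(n_samples), known_covars])
    coef, *_ = np.linalg.lstsq(design, protein, rcond=None)
    residual = protein - design @ coef
    # PC scores of the residual matrix via singular value decomposition
    u, s, _ = np.linalg.svd(residual, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def pqtl_fit(genotype, protein_level, covars):
    """OLS of one protein on genotype dosage plus covariates.

    Returns the genotype effect estimate and the residual sum of squares.
    """
    n_samples = genotype.shape[0]
    design = np.column_stack([np.ones(n_samples), genotype, covars])
    coef, *_ = np.linalg.lstsq(design, protein_level, rcond=None)
    residual = protein_level - design @ coef
    return coef[1], float(residual @ residual)

# Simulated toy data (assumption): one hidden factor shared by 20 proteins
rng = np.random.default_rng(0)
n = 500
age_sex = np.column_stack([rng.normal(60, 10, n), rng.integers(0, 2, n)])
hidden = rng.normal(size=n)
loadings = rng.normal(scale=2.0, size=20)
genotype = rng.binomial(2, 0.3, n).astype(float)
protein = np.outer(hidden, loadings) + rng.normal(size=(n, 20))
protein[:, 0] += 0.5 * genotype  # simulated pQTL effect on protein 0

pcs = hidden_variance_pcs(protein, age_sex, n_pcs=2)
beta_base, rss_base = pqtl_fit(genotype, protein[:, 0], age_sex)
beta_pcs, rss_pcs = pqtl_fit(genotype, protein[:, 0], np.column_stack([age_sex, pcs]))
print(f"RSS without hidden PCs: {rss_base:.1f}; with hidden PCs: {rss_pcs:.1f}")
```

Because the second model nests the first, its residual sum of squares cannot be larger; a smaller residual variance is what would make a true genotype effect easier to detect. In real data, PCs computed from the proteins themselves can also absorb genuine genetic signal, so the number of PCs is a tuning choice that this sketch does not address.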

Multi-ancestry transcriptome predictions with functionally informed variants improve transcriptome-wide association studies in TOPMed MESA

Authors
Xiaowei Hu, Daniel S. Araujo, Chachrit Khunsriraksakul, Lida Wang, Quan Sun, Jia Wen, Lynette Ekunwe, Lingbo Zhou, Anya Mikhaylova, Kevin L Keys, Leslie A Lange, Ethan Lange, Stephen B Montgomery, Alex P Reiner, Stephen S Rich, Jerome I Rotter, Francois Aguet, Tuuli Lappalainen, Timothy A Thornton, Christopher R Gignoux, Esteban G Burchard, Kristin G Ardlie, Kent D Taylor, Peter Durda, Elaine Cornell, Xiuqing Guo, Yongmei Liu, Russell P Tracy, Matthew P Conomos, Thomas W Blackwell, George Papanicolaou, W. Craig Johnson, Minoli A Perera, Michael H. Cho, Dajiang Liu, Laura M Raffield, Yun Li, TOPMed Multi-Omics Working Group, Heather Wheeler, Hae Kyung Im, Ani Manichaikul
Name and Date of Professional Meeting
American Society of Human Genetics (November 1-5, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
BACKGROUND: Reliable prediction of genetically regulated gene expression is key to accurate transcriptome-wide association studies (TWAS). Reference transcriptome prediction models for TWAS have been constructed primarily based on individuals of European ancestry. With the emergence of multi-ancestry GWAS, there is a need for reliable multi-ancestry transcriptome prediction models for downstream TWAS efforts. Furthermore, the genomic variants overlapping with key annotations (e.g., fine-mapping, 3D genomics informed regions, and epigenetic processes) are more likely to be functionally relevant to influence gene expression.
METHODS: We developed multi-ancestry transcriptome prediction models with functionally informed variants (FIVs) by leveraging PBMC RNA-seq from 1,287 TOPMed MESA multi-ancestry samples and corresponding whole-genome sequencing data. We then examined the performance of the models on both prediction accuracy and TWAS. We built four prediction models: one benchmark model (Elastic Net, EN) and three models with FIVs, i.e., EN with fine-mapped variants; Prediction Using Models Informed by Chromatin conformations and Epigenomics (PUMICE); and PUMICE with fine-mapped variants. The prediction accuracy of the four models was then assessed in the Geuvadis cohort using 449 multi-ancestry samples with LCL RNA-seq. To examine the models' performance in TWAS, we leveraged summary statistics from two recent multi-ancestry GWAS, the Global Lipids Genetics Consortium (GLGC) GWAS (N~1.65 million) and a lung function GWAS (N=580,869). We then examined TWAS precision by overlapping Bonferroni-significant TWAS genes with previously identified GWAS trait-related putative causal genes (i.e., Mendelian and mouse knockout genes, genes with coding variants, genes with rare exonic association, nearest genes).
RESULTS: While gene expression prediction accuracy was similar across the four models in both discovery and validation analyses, TWAS based on models with FIVs outperformed TWAS based on the benchmark EN model. TWAS from models with FIVs identified more putative causal genes than TWAS from the EN model for three out of five lipid traits and for three out of four lung function traits. For example, TWAS from PUMICE and EN identified 87 and 80 putative causal genes, respectively, for total cholesterol. Similarly, TWAS from PUMICE identified 13 putative causal genes for peak expiratory flow rate, while TWAS from EN identified only 6.
SUMMARY: Our study demonstrates the value of including FIVs in multi-ancestry transcriptome prediction models for improving TWAS precision.
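
The precision comparison described in METHODS can be sketched as a set overlap between Bonferroni-significant TWAS genes and a list of putative causal genes. The snippet below is a minimal illustration under stated assumptions: the gene labels and p-values are made up, the helper names are hypothetical, and the Bonferroni denominator is taken to be the number of genes tested by each model.

```python
def bonferroni_significant(pvalues, alpha=0.05):
    """Genes whose TWAS p-value passes a Bonferroni threshold.

    pvalues: dict mapping gene -> p-value for one model and one trait.
    """
    threshold = alpha / len(pvalues)
    return {gene for gene, p in pvalues.items() if p < threshold}

def count_putative_causal(pvalues, causal_genes, alpha=0.05):
    """Number of Bonferroni-significant genes that are also putative causal genes."""
    return len(bonferroni_significant(pvalues, alpha) & set(causal_genes))

# Toy inputs (assumption): four tested genes, so the threshold is 0.05 / 4 = 0.0125
toy_pvalues = {"GENE_A": 1e-6, "GENE_B": 0.01, "GENE_C": 0.02, "GENE_D": 0.5}
toy_causal = {"GENE_A", "GENE_C"}

print(count_putative_causal(toy_pvalues, toy_causal))  # prints 1
```

Running the same count for each prediction model on the same trait gives the kind of side-by-side comparison reported in RESULTS, for example 87 versus 80 genes for total cholesterol.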

Cross-cohort eQTL fine-mapping utilizing TOPMed whole genome sequencing identifies tens of thousands of independent eQTL signals and thousands of eQTLs colocalizing with complex trait-associated variants

Authors
P. Orchard, F. Aguet, T. Blackwell, K. Ardlie, P. J. Castaldi, A. V. Smith, R. Joehanes, A. Saferali, H. E. Wheeler, C-T. Liu, M. Cho, C. Hersh, L. Mestroni, L. Kachuri, A. P. Reiner, X. Li, M. Taylor, D. A. Meyers, S. S. Rich, G. Abecasis, N. Heard-Costa, L. J. Scott, J. I. Rotter, H. Tang, D. Levy, L. M. Raffield, S. C. J. Parker, NHLBI TOPMed Consortium
Name and Date of Professional Meeting
ASHG 2022 Annual Meeting (Oct. 25 - 29, 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Most genetic variants associated with complex traits and diseases occur in non-coding genomic regions and are hypothesized to regulate gene
expression. To understand the genetics underlying gene expression variability, we performed cis expression quantitative trait locus (cis-eQTL)
analyses using RNA-seq and whole genome sequencing (WGS) data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program from
6,602 whole blood samples of European (EUR; 68%), African (21%) and Indigenous American (10%) ancestry. Notably, this exceeds the sample
size of published RNA-seq and WGS-based cis-eQTL analyses, which enabled us to test variants with minor allele frequency (MAF) below 0.01
and detect secondary signals for 15,317 genes.

At a MAF≥0.001, we identified 19,381 genes with at least one eQTL (5% FDR, testing variants within 1Mb of the transcription start site; 22,180
genes tested). We fine-mapped independent eQTL signals using the SuSiE method and identified 77,398 eQTL signals (95% credible sets; median
17,183 variants tested per gene and 3 credible sets discovered per gene), including 31,810 credible sets containing a single variant. By contrast,
restricting to variants with higher MAF (MAF≥0.01), we identified 70,943 eQTL signals (median 7,953 variants tested per gene and 3 credible
sets discovered per gene), and 29,690 95% credible sets containing a single variant.

To assess the utility of this dataset to identify target genes and nominate causal variants for genome-wide association study (GWAS) signals, we
colocalized independent cis-eQTL signals with 33,141 fine-mapped EUR GWAS signals from 172 UK Biobank traits. 5,782 GWAS signals
colocalized with an eQTL (SuSiE-coloc PP4 posterior probability of colocalization > 0.8). Of these, 1,648 GWAS signals colocalized with an eQTL
from more than one gene. Of 4,134 GWAS signals colocalizing with only one gene, in 52% of cases the gene was not the nearest gene. 2,910 of
the 5,782 colocalizing GWAS loci colocalized with only secondary eQTL signals. We identified 215 instances in which multiple neighboring GWAS
signals for a given trait colocalized with multiple eQTLs from the same gene. For example, in one 843kb window we identified six independent
GWAS signals for neutrophil percentage, three of which are in or near ACKR1 (previously shown to regulate neutrophil counts) and colocalize
with three independent ACKR1 eQTL signals (each with a single variant 95% eQTL credible set).
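
The tallies in this paragraph (signals colocalizing at PP4 > 0.8, split into multi-gene and single-gene cases, with 1,648 + 4,134 = 5,782) can be reproduced from a table of pairwise colocalization results. The following is a minimal sketch with a hypothetical record layout and toy values; it is not the authors' code, and only the 0.8 threshold is taken from the text.

```python
from collections import defaultdict

PP4_THRESHOLD = 0.8  # colocalization cutoff stated in the abstract

def summarize_colocalization(records, nearest_gene, threshold=PP4_THRESHOLD):
    """Tally GWAS signals by how many distinct genes they colocalize with.

    records: iterable of (gwas_signal, gene, pp4), one per GWAS-eQTL signal pair.
    nearest_gene: dict mapping gwas_signal -> nearest gene.
    """
    genes_by_signal = defaultdict(set)
    for signal, gene, pp4 in records:
        if pp4 > threshold:
            genes_by_signal[signal].add(gene)
    single = {s: next(iter(g)) for s, g in genes_by_signal.items() if len(g) == 1}
    return {
        "colocalized": len(genes_by_signal),
        "multi_gene": sum(1 for g in genes_by_signal.values() if len(g) > 1),
        "single_gene": len(single),
        "single_gene_not_nearest": sum(
            1 for s, g in single.items() if nearest_gene.get(s) != g
        ),
    }

# Toy inputs (assumption): s3 colocalizes with two eQTL signals of the same gene
toy_records = [
    ("s1", "G1", 0.95), ("s1", "G2", 0.90),
    ("s2", "G3", 0.85),
    ("s3", "G4", 0.99), ("s3", "G4", 0.90),
    ("s4", "G5", 0.50),
]
toy_nearest = {"s1": "G1", "s2": "G3", "s3": "G9", "s4": "G5"}

summary = summarize_colocalization(toy_records, toy_nearest)
print(summary)
```

Grouping by distinct gene rather than by eQTL signal keeps a GWAS signal that matches several independent signals of one gene in the single-gene category.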

In summary, this dataset demonstrates the utility of large-scale WGS-based eQTL studies to map genetic regulatory effects on gene expression
at unprecedented resolution and nominate causal genes for thousands of GWAS signals.