
Multi-Omics

Omic Risk Scores are Associated with Cross-Sectional and Longitudinal Chronic Obstructive Pulmonary Disease-Related Traits Across Three Cohorts

Authors
I. R. Konigsberg, L. B. Vargas, K. A. Pratte, K. Buschur, D. E. Guzman, T. D. Pottinger, A. Manichaikul, E. C. Oelsner, E. R. Bleecker, D. A. Meyers, V. E. Ortega, S. A. Christenson, D. L. Demeo, B. D. Hobbs, C. P. Hersh, P. J. Castaldi, J. L. Curtis, R. G. Barr, J. I. Rotter, S. S. Rich, P. G. Woodruff, E. K. Silverman, M. H. Cho, K. J. Kechris, R. P. Bowler, E. M. Lange, L. A. Lange, M. R. Moll
Name and Date of Professional Meeting
American Society of Human Genetics (November 5-9, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background: Individuals with chronic obstructive pulmonary disease (COPD) demonstrate marked heterogeneity with respect to lung function decline, emphysema, mortality, exacerbations, and other disease-related outcomes. Omic risk scores (ORS) estimate the cumulative contribution of omics, such as the transcriptome, proteome, and metabolome, to a particular trait. In this study, we aimed to assess the predictive value of ORS for COPD-related traits in both smoking-enriched and general population cohorts.
Methods: We developed and tested ORS in n=3,339 participants of the Genetic Epidemiology of COPD (COPDGene) study with blood RNA-sequencing, proteomic, and metabolomic data collected at the second study visit. On 80% of the data, we trained single- and multi-omic risk scores on a variety of traits using elastic net penalized regression with 10-fold cross-validation. We included 24 cross-sectional and 5 longitudinal traits (where the trait was measured approximately 5 years apart), enriched for measures of disease severity, exacerbations, and traits derived from spirometry and computed tomography scans. We used multivariable models to test association of ORS with outcomes in a held-out COPDGene testing set and externally validated findings in participants of the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS) (n=2,177) and the Multi-Ethnic Study of Atherosclerosis (MESA) (n=1,000), adjusting for potential confounders and multiple testing.
Results: Among the 24 cross-sectional traits in the COPDGene testing set, there were significant associations with 70 of 72 single-omic ORS (false discovery rate adjusted p-value < 0.05). Significant associations were found in 5 of 15 longitudinal ORS with changes in trait values between COPDGene visits, including with forced expiratory volume at one second (FEV1) decline over 5 years and annually. We observed significant association with the relevant traits for all 38 cross-sectional ORS tested in SPIROMICS and for 15 of 24 in MESA. Generally, proteomic and metabolomic risk scores displayed stronger trait associations than transcriptomic risk scores, and multi-omic risk scores had higher predictive capacity than single-omic risk scores.
Conclusions: ORS constructed from blood-based omics can be leveraged to predict cross-sectional and future COPD-related traits in both smoking-enriched and general population cohorts. ORS for clinical use would require phenotype-focused risk score construction and replication.
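
The training step described in the Methods above (elastic net penalized regression, tuned by 10-fold cross-validation on an 80% training split) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the penalty grid, the fixed mixing parameter `l1_ratio=0.5`, and the simulated data are all hypothetical, and features and trait are assumed to be standardized so that no intercept is fit.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    Assumes centered features and trait (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + alpha * (1 - l1_ratio)
            if denom == 0:
                continue
            new = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new - beta[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new
    return beta

def predict(X, beta):
    """Risk score = weighted sum of omic features."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def cv_select_alpha(X, y, alphas, k=10, seed=0):
    """Pick the penalty strength with the lowest mean k-fold squared error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_alpha, best_err = None, float("inf")
    for alpha in alphas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in range(len(y)) if i not in held]
            y_tr = [y[i] for i in range(len(y)) if i not in held]
            beta = fit_elastic_net(X_tr, y_tr, alpha)
            preds = predict([X[i] for i in fold], beta)
            err += sum((y[i] - s) ** 2 for i, s in zip(fold, preds)) / len(fold)
        if err / k < best_err:
            best_alpha, best_err = alpha, err / k
    return best_alpha

# Simulated toy data (hypothetical): 100 samples, 5 features, 2 truly associated.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5) for row in X]

cut = int(0.8 * len(y))                      # 80% training, 20% held-out testing
X_train, y_train, X_test = X[:cut], y[:cut], X[cut:]
alpha_grid = [0.01, 0.1, 1.0]                # hypothetical penalty grid
best_alpha = cv_select_alpha(X_train, y_train, alpha_grid)
weights = fit_elastic_net(X_train, y_train, best_alpha)
risk_scores = predict(X_test, weights)       # scores for the held-out set
```

The held-out `risk_scores` would then be tested for association with the trait in a separate multivariable model, as the abstract describes; that step is not shown here.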

Integration of Multi-Omics Data & eQTL Summary Statistics with omicsSTAAR Boosts Power in Rare Variant Association Tests of Non-Coding Regions

Authors
Eric Van Buren, Xihao Li, Zilin Li, Peter Orchard, Hufeng Zhou, Alex Reiner, Laura Raffield, Xihong Lin, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Multi-Omics Working Group
Name and Date of Professional Meeting
ASHG 2024 (November 7, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Through our previously developed cellSTAAR method, we demonstrated that integration of single-cell-sequencing-based epigenetic data can boost the power of gene-centric Rare Variant (RV) association tests (RVATs) to detect associations of candidate Cis-Regulatory Elements (cCREs) in complex human diseases. Integrating additional kinds of multi-omics data to capture additional sources of functional variability that exists in the non-coding genome may further increase power.
Methods
We propose omicsSTAAR as a new method to robustly integrate several kinds of multi-omics data into gene-centric RVATs of non-coding regions. First, omicsSTAAR can integrate variant-level multi-omics datasets, such as from methylation studies or eQTL summary statistics, to create custom variant sets of the most likely causal variants weighted with corresponding functional annotations. Association p-values from each variant set are aggregated using the Cauchy Combination Test to create an omnibus p-value summarizing evidence across different categories of multi-omics data. Second, omicsSTAAR can integrate gene-level multi-omics datasets, such as RNA-seq and proteomics experiments, to weight omnibus gene-centric association p-values using “side-information” approaches such as Independent Hypothesis Weighting (IHW). Using such approaches, omicsSTAAR can account for the biological relevance of each gene as measured by expression or protein abundance in relevant tissues.
Results
We applied omicsSTAAR on Freeze 8 (N = 60,000) of the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium data of four hematological traits: hemoglobin (HGB), hematocrit (HCT), platelet count (PLT), and white blood cell count (WBC). To demonstrate omicsSTAAR, we collected single-cell ATAC-seq data and two TOPMed blood-based datasets: RNA-seq from the WHI and FHS TOPMed cohorts (N = 2,072) and eQTL summary statistics based on 5,007 TOPMed participants. Our analysis reveals associations in several known genes for hematological traits, including HBQ1 and CD84, while showing variability in which kinds of omics data detect each association. We also demonstrate a substantial increase in the number of discoveries at a reduced significance threshold when combining the variant-level multi-omics data (scATAC-seq and eQTL summary statistics) association results into an omnibus association p-value and when using gene-level multi-omics data (RNA-seq) to weight the gene-centric omnibus p-values.
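
The aggregation step named in the Methods above, the Cauchy Combination Test, has a closed form that is easy to sketch. The example below is only an illustration of that general test, not the omicsSTAAR implementation: the function name is hypothetical, equal weights are an assumption (the abstract does not state the weighting), and the input p-values are made up.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values into one omnibus p-value.

    Each p-value is mapped to a standard Cauchy variate via
    tan((0.5 - p) * pi); the weighted sum is again standard Cauchy
    under the null, so its tail probability is the combined p-value.
    Inputs are assumed to lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)      # assumption: equal weights
    total = sum(weights)
    stat = sum((w / total) * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical p-values from three variant sets for one gene.
omnibus_p = cauchy_combination([0.001, 0.20, 0.65])
```

A useful property of this test is that the combined p-value is dominated by the smallest inputs, so one strongly associated variant set is not diluted by several uninformative ones.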

Plasma Proteomic Determinants of Small Vessel Disease of the Brain: the Multi-Ethnic Study of Atherosclerosis

Authors
Rizwan Kalani, Alison E. Fohner, Thomas R. Austin, Sheina Emrani, Paul N. Jensen, Timothy Hughes, Alexis C. Wood, Alain Bertoni, Sanjiv Shah, Mohamad Habes, Tanweer Rashid, Sokratis Charisis, Keenan Walker, W.T. Longstreth, Jr, David L. Tirschwell, Bruce M. Psaty, James S. Floyd, Usman A. Tahir, Robert E. Gerszten, Jerome I. Rotter, Stephen S. Rich, Susan R. Heckbert
Name and Date of Professional Meeting
Alzheimer's Association International Conference (July 28-Aug 1, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background: The identification of novel blood-based biomarkers of small vessel disease of the brain (SVD) may improve pathophysiologic understanding and inform the development of new therapeutic strategies for prevention. We evaluated plasma proteomic associations of white matter fractional anisotropy (WMFA), white matter hyperintensity (WMH) volume, enlarged perivascular space (ePVS) volume, and the presence of microbleeds (MB) on brain magnetic resonance imaging (MRI) in the population-based Multi-Ethnic Study of Atherosclerosis (MESA).

Methods: Eligible MESA participants underwent measurement of 2941 plasma proteins with the antibody-based Olink proteomics platform from blood samples collected in 2016-2018 and completed brain MRI scans in 2018-2019. Participants with quality control exclusion of protein measurements, missing covariate data, and poor quality or missing MRI outcome variables were excluded. The cross-sectional association of the abundance of each plasma protein (normalized protein expression – a relative protein quantification unit measured on a log2 scale) with WMFA, WMH volume, total ePVS volume, and the presence of MBs was modeled separately using multivariable linear or modified Poisson regression, adjusting for demographic variables, estimated glomerular filtration rate (eGFR), and SVD risk factors. For proteins independently associated with the SVD markers on MRI, penalized regression with least absolute shrinkage and selection operator (LASSO) was used to create a parsimonious proteomic model. The Benjamini-Hochberg procedure was used to control the false discovery rate at <0.05 to account for multiple hypothesis testing.

Results: Eligible participants (total N=709) had a mean age of 73 years, 53% were women, 25% were Black, 17% were Chinese, 19% were Hispanic or Latino, and 39% were White (Table 1). After adjustment for demographics, eGFR, and SVD risk factors, 769 plasma proteins were associated with WMFA (Figure). LASSO regression identified a 37-protein model predictive of WMFA (Table 2). We did not find plasma proteins to be independently associated with WMH volume, ePVS volume, or the presence of MBs.

Conclusion: Multiple circulating proteins – implicated in central nervous system myelination, lipid metabolism, angiogenesis, coagulation, cellular adhesion and migration, appetite regulation, energy homeostasis, systemic inflammation, and immune regulation – were independently associated with WMFA in a multi-ethnic cohort of older adults.
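
The Benjamini-Hochberg procedure mentioned in the Methods above can be sketched in a few lines. This is a generic illustration of the standard step-up adjustment, not the study's analysis code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces that the
    adjusted values never decrease as the raw p-values increase.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-protein p-values; proteins with adjusted p < 0.05 are kept.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```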

Gene Expression and Splicing QTL Analysis of Blood Cells in African American Participants from the Jackson Heart Study

Authors
Jia Wen, Quan Sun, Le Huang, Lingbo Zhou, Margaret F. Doyle, Lynette Ekunwe, Nels C. Olson, Alexander P. Reiner, Yun Li,* Laura M. Raffield*
Name and Date of Professional Meeting
2023 ASHG Annual Meeting, November 1-5, 2023
Associated paper proposal(s)
Working Group(s)
Abstract Text
Most gene expression and alternative splicing quantitative trait loci (eQTL/sQTL) studies have been biased toward European ancestry individuals. Here, we performed eQTL and sQTL analysis using TOPMed whole genome sequencing-derived genotype data and RNA sequencing data from stored peripheral blood mononuclear cells in 1,012 African American participants from the Jackson Heart Study (JHS). At a false discovery rate (FDR) of 5%, we identified 4,798,604 significant eQTL-gene pairs, covering 16,538 unique genes; and 5,921,368 sQTL-gene-cluster pairs, covering 9,605 unique genes. About 31% of detected eQTL and sQTL variants with a minor allele frequency (MAF) > 1% in JHS were rare (MAF < 0.1%), and therefore unlikely to be detected, in European ancestry individuals. We also generated 17,630 eQTL credible sets and 24,525 sQTL credible sets for genes (gene-clusters) with lead QTL p < 5e-8. Finally, we created an open database, which is freely available online (http://jhsqtl.genetics.unc.edu/), allowing fast query and bulk download of our QTL results.
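
The abstract does not say how its credible sets were built. As a rough illustration of the general idea only, a credible set for a single signal can be formed by ranking variants by posterior probability of being causal and keeping the smallest group whose probabilities sum to the target coverage. The function name, the 95% coverage default, and the variant labels below are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities reach coverage.

    posteriors maps a variant ID to its posterior probability of being
    the causal variant for one signal (values assumed to sum to about 1).
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen

# Hypothetical posterior probabilities for four variants near one gene.
example = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
selected = credible_set(example)
```

Smaller credible sets indicate better-resolved signals, which is one reason fine-mapping in cohorts with shorter linkage disequilibrium blocks can be informative.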

Cross-cohort eQTL analyses of 6,602 multi-ancestry TOPMed whole blood RNA-seq samples uncover regulatory relationships

Authors
P. Orchard, F. Aguet, T. Blackwell, K. Ardlie, A. Smith, R. Joehanes, X. Li, M. Wang, C-T. Liu, A. Saferali, J. Wu, M. R. G. Taylor, N. Heard-Costa, H. Tang, P. J. Castaldi, G. Abecasis, J. I. Rotter, L. Kachuri, D. Levy, L. M. Raffield, L. J. Scott, S. C. J. Parker, NHLBI TOPMed Consortium
Name and Date of Professional Meeting
ASHG 2023 Annual Meeting (Nov. 1-5, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Most genetic variants associated with complex traits occur in non-coding genomic regions and are hypothesized to affect gene expression. To identify variants that regulate gene expression, we performed cis- and trans-expression quantitative trait locus (cis/trans-eQTL) analyses using whole blood RNA-seq and whole genome sequencing data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program from 6,602 samples of predominantly European (68%), African (21%) and Indigenous American (10%) ancestry. We hypothesized that many trans signals would overlap cis signals, enabling identification of candidate genes mediating trans effects.
At a MAF ≥ 0.01, we identified 19,422 genes with ≥ 1 cis-eQTL (5% FDR; cis-eGenes) and 71,092 total independent cis-eQTLs (SuSiE 95% credible sets). At MAF ≥ 0.05, we identified 1,743 trans-eGenes. Trans-eVariants were enriched for overlap with cis-eQTL credible sets (Fisher’s exact test against MAF-matched variants; odds ratio = 8.4; p = 3e-45; 30% of unique trans-eVariants overlap cis-eQTL) and cis-eGenes for cis-eQTL overlapping trans-eQTL were 3.5-fold enriched for transcription factor genes (p < 5e-4). 167 variants were trans-eQTL for >1 gene (1,075 total genes). For example, one cis-eVariant for ERN1, which encodes endonuclease IRE1a, was a trans-eVariant or in high LD (>0.9) with a trans-eVariant for thirteen trans-eGenes, including known ERN1 downstream target XBP1 and XBP1 target genes including DNAJB9. ERN1 is a regulator of the endoplasmic reticulum (ER) stress response, and five of the twelve trans-eGenes in the KEGG database were in the “Protein processing in ER” pathway (31.6-fold enrichment; nominal p = 1.8e-7). Among ER response pathways IRE1a-XBP1 is the most highly conserved, with an emerging role in regulation of inflammation and immune response.
To identify trans-eGenes that may share multiple signals with a potentially regulatory cis-eGene, we fine-mapped trans-eQTL signals within the 2Mb window centered on each trans-eGene’s lead trans-eVariant. Within the 2Mb windows, 300 of the 1,743 trans-eGenes had >1 trans-eQTL (2,080 total trans-eQTL credible sets). We found 31 cis-eGenes with > 1 cis-eQTL signal colocalizing with > 1 trans-eQTL signal from at least one trans-eGene (145 unique cis-eGene - trans-eGene pairs). For example, trans-eGene BTN3A3 showed 4 trans-eQTL credible sets that colocalized with 4 cis-eQTL credible sets for its known regulator NLRC5. This example provides proof-of-principle for our hypothesis.
In summary, this dataset demonstrates the utility of large eQTL studies to provide insight into regulatory pathways involving trans-eQTLs mediated by cis-eGenes.
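
The enrichment of trans-eVariants in cis-eQTL credible sets reported above was assessed with Fisher's exact test. A minimal sketch of such a test on a 2x2 table follows. The counts are hypothetical and do not reproduce the reported odds ratio or p-value, the function name is illustrative, and the one-sided (enrichment) alternative is an assumption, since the abstract does not state the sidedness.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                      overlaps cis-eQTL   does not overlap
    trans-eVariants          a                   b
    matched controls         c                   d

    Returns (sample odds ratio, P[X >= a]) where X is hypergeometric
    with the table margins held fixed.
    """
    n_total = a + b + c + d
    row1 = a + b          # number of trans-eVariants
    col1 = a + c          # number of variants overlapping cis-eQTL
    denom = comb(n_total, row1)
    tail = sum(comb(col1, k) * comb(n_total - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, tail / denom

# Hypothetical counts for illustration only.
odds, p_value = fisher_exact_greater(30, 70, 50, 950)
```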

Rare variants affecting telomere length and disease identified through multi-omic modeling

Authors
Rebecca Keener, Taibo Li, François Aguet, Kristin Ardlie, Jerome Rotter, Steven Rich, and Alexis Battle
Name and Date of Professional Meeting
ASHG Conference (November 5-9, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Telomeres protect the ends of linear chromosomes, and telomere length (TL) decreases as humans age. When telomeres become critically short, a senescence or apoptosis signal prevents further telomere loss. Individuals with extremely short TL present with Short Telomere Syndromes (STS), including bone marrow failure and immunodeficiency, while individuals with extremely long TL are predisposed to cancer. To gain insight into TL genetic regulation, prior work from our group and others used genome-wide association studies to examine the role of common genetic variation in TL. This strategy identified novel genes involved in TL regulation, some of which we experimentally validated. However, this approach ignores the effects of rare variation, which can have larger effect sizes and uniquely impact genes under strong constraint.

Studies of rare variant effects on TL have improved our understanding of TL biology, but have largely required laborious STS patient pedigree studies. We leveraged TL estimates and rare variant data from the Trans-Omics for Precision Medicine (TOPMed) Program to broadly examine the impact of rare variation on TL. Previously we developed Watershed, a Bayesian hierarchical model, which uses whole genome sequencing with paired multi-omic data (expression, splicing, methylation, and/or protein levels) to prioritize rare variants causing significant disruption of molecular phenotypes. This multi-omic signature generates interpretable hypotheses for coding and non-coding rare variants, providing a posterior probability that the variant causes outlier status for each molecular signal, for example that splicing is disrupted but expression is not. We used data from 5,310 MESA individuals to train Watershed and observed that in 40/86 individuals with extremely short TL (<1% in TOPMed), Watershed prioritized rare variants in at least one gene from a panel of 16 STS genes. The variant with the largest posterior probability (0.984) was predicted to affect expression of TPP1, which encodes a protein critical for TL regulation.

We will expand our analysis to another 103,812 TOPMed individuals and incorporate multi-omic data where available. Examination of highly weighted variants in individuals with extreme TL relative to average TL will potentially identify novel genes involved in TL regulation. In addition, we will examine the interplay between TL regulation and multi-omic signals over age (0-98 years old). Finally, we will apply our model to data from STS patients to improve their genetic diagnosis. Together this work has utility in improving STS patient diagnosis and furthering our understanding of the molecular mechanisms governing TL.