
Cross-cohort eQTL fine-mapping utilizing TOPMed whole genome sequencing identifies tens of thousands of independent eQTL signals and thousands of eQTLs colocalizing with complex trait-associated variants

Authors
P. Orchard, F. Aguet, T. Blackwell, K. Ardlie, P. J. Castaldi, A. V. Smith, R. Joehanes, A. Saferali, H. E. Wheeler, C-T. Liu, M. Cho, C. Hersh, L. Mestroni, L. Kachuri, A. P. Reiner, X. Li, M. Taylor, D. A. Meyers, S. S. Rich, G. Abecasis, N. Heard-Costa, L. J. Scott, J. I. Rotter, H. Tang, D. Levy, L. M. Raffield, S. C. J. Parker, NHLBI TOPMed Consortium
Name and Date of Professional Meeting
ASHG 2022 Annual Meeting (Oct. 25 - 29, 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Most genetic variants associated with complex traits and diseases occur in non-coding genomic regions and are hypothesized to regulate gene
expression. To understand the genetics underlying gene expression variability, we performed cis expression quantitative trait locus (cis-eQTL)
analyses using RNA-seq and whole genome sequencing (WGS) data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program from
6,602 whole blood samples of European (EUR; 68%), African (21%) and Indigenous American (10%) ancestry. This sample size exceeds that of
published RNA-seq and WGS-based cis-eQTL analyses, enabling us to test variants with minor allele frequency (MAF) below 0.01
and detect secondary signals for 15,317 genes.

At a MAF≥0.001, we identified 19,381 genes with at least one eQTL (5% FDR, testing variants within 1Mb of the transcription start site; 22,180
genes tested). We fine-mapped independent eQTL signals using the SuSiE method and identified 77,398 eQTL signals (95% credible sets; median
17,183 variants tested per gene and 3 credible sets discovered per gene), including 31,810 credible sets containing a single variant. By contrast,
restricting to variants with higher MAF (MAF≥0.01) yielded 70,943 eQTL signals (median 7,953 variants tested per gene and 3 credible
sets discovered per gene), including 29,690 95% credible sets containing a single variant.
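
As an illustration of what a 95% credible set means here, the sketch below builds the smallest set of variants whose posterior probabilities sum to at least 0.95 for one signal. It is a simplified example and not the SuSiE implementation used in this work; the function name `credible_set`, the variant identifiers, and the probabilities are all hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest set of variants reaching the requested coverage.

    posteriors: dict mapping a variant ID to its posterior probability of
    being the causal variant for ONE signal (hypothetical input format).
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    chosen = []
    cumulative = 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical signals: one dominated by a single variant, one more diffuse.
sharp_signal = {"var1": 0.97, "var2": 0.02, "var3": 0.01}
diffuse_signal = {"varA": 0.60, "varB": 0.30, "varC": 0.08, "varD": 0.02}

single_variant_set = credible_set(sharp_signal)
multi_variant_set = credible_set(diffuse_signal)
```

A signal like `sharp_signal` corresponds to the single-variant credible sets counted above, where one variant alone carries at least 95% of the posterior probability.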

To assess the utility of this dataset for identifying target genes and nominating causal variants for genome-wide association study (GWAS) signals, we
colocalized independent cis-eQTL signals with 33,141 fine-mapped EUR GWAS signals from 172 UK Biobank traits. In total, 5,782 GWAS signals
colocalized with an eQTL (SuSiE-coloc PP4, the posterior probability of colocalization, > 0.8). Of these, 1,648 GWAS signals colocalized with eQTLs
from more than one gene. Of the 4,134 GWAS signals colocalizing with only one gene, in 52% of cases the gene was not the nearest gene. Of
the 5,782 colocalizing GWAS signals, 2,910 colocalized only with secondary eQTL signals. We identified 215 instances in which multiple neighboring GWAS
signals for a given trait colocalized with multiple eQTLs from the same gene. For example, in one 843kb window we identified six independent
GWAS signals for neutrophil percentage, three of which are in or near ACKR1 (previously shown to regulate neutrophil counts) and colocalize
with three independent ACKR1 eQTL signals (each with a single variant 95% eQTL credible set).
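
The colocalization counts above follow from a simple tally over GWAS-signal/gene pairs. The sketch below shows one way such a summary could be computed, assuming a hypothetical record format (`signal`, `gene`, `pp4`, `nearest_gene`) and toy values; it is not the analysis code used in this work.

```python
from collections import defaultdict


def summarize_coloc(records, pp4_threshold=0.8):
    """Tally colocalized GWAS signals from per-pair colocalization results.

    records: iterable of dicts with hypothetical keys
      signal        GWAS signal identifier
      gene          gene whose eQTL signal was tested against it
      pp4           posterior probability of colocalization
      nearest_gene  gene nearest to the GWAS signal
    """
    genes_by_signal = defaultdict(set)
    nearest = {}
    for record in records:
        nearest[record["signal"]] = record["nearest_gene"]
        # Keep only pairs passing the colocalization threshold.
        if record["pp4"] > pp4_threshold:
            genes_by_signal[record["signal"]].add(record["gene"])

    multi_gene = [s for s, genes in genes_by_signal.items() if len(genes) > 1]
    single_gene = [s for s, genes in genes_by_signal.items() if len(genes) == 1]
    # For single-gene signals, check whether that gene is the nearest one.
    not_nearest = [
        s for s in single_gene if next(iter(genes_by_signal[s])) != nearest[s]
    ]
    return {
        "colocalized": len(genes_by_signal),
        "multi_gene": len(multi_gene),
        "single_gene": len(single_gene),
        "single_gene_not_nearest": len(not_nearest),
    }


# Toy records with made-up identifiers and probabilities.
toy_records = [
    {"signal": "s1", "gene": "geneA", "pp4": 0.95, "nearest_gene": "geneA"},
    {"signal": "s1", "gene": "geneB", "pp4": 0.90, "nearest_gene": "geneA"},
    {"signal": "s2", "gene": "geneC", "pp4": 0.85, "nearest_gene": "geneD"},
    {"signal": "s3", "gene": "geneE", "pp4": 0.99, "nearest_gene": "geneE"},
    {"signal": "s4", "gene": "geneF", "pp4": 0.50, "nearest_gene": "geneF"},
]
summary = summarize_coloc(toy_records)
```

In the reported results, the multi-gene and single-gene counts partition the colocalized signals in the same way: 1,648 + 4,134 = 5,782.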

In summary, this dataset demonstrates the utility of large-scale WGS-based eQTL studies to map genetic regulatory effects on gene expression
at unprecedented resolution and nominate causal genes for thousands of GWAS signals.