Abstract Text |
Introduction
Whole genome sequencing (WGS) studies have cumulatively identified hundreds of millions of rare variants, the majority of which lie in non-coding regions and are of unknown function. Given this large number of genetic variants, existing methods for gene-centric rare variant association tests (RVATs) in WGS studies have identified relatively few associations between candidate cis-regulatory elements (cCREs) and complex human diseases. Because the regulatory landscape of many cCREs varies across cell types, it is of substantial interest to incorporate single-cell sequencing data into RVATs, both to capture the functional variability that exists across cell types in the non-coding genome and to boost statistical power.
Methods
We propose cellSTAAR to address two opportunities to improve existing gene-centric RVAT methods as applied to genetic variants in cCREs. First, cellSTAAR integrates single-cell ATAC-seq data to capture variability in chromatin accessibility across cell types, both by constructing cell-type-specific variant sets and by upweighting relevant variants using cell-type-specific functional annotations. Second, cellSTAAR links cCREs to their target genes using an omnibus framework that aggregates results from a variety of linking approaches, each of which relies on different kinds of genomic data and computational methods, to reflect the uncertainty in element-gene linking. We applied cellSTAAR to Freeze 8 (N = 60,000) of the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium data for three quantitative lipid traits: LDL cholesterol (LDL), HDL cholesterol (HDL), and triglycerides (TG).
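The omnibus step described above can be illustrated with a minimal sketch. The abstract does not state how results from the different linking approaches are aggregated, so the Cauchy combination rule used here is an assumption, chosen because it is a common way to combine correlated p-values in rare variant analysis. The function name, the linking-approach labels, and the input p-values are all hypothetical and serve only as illustration; they are not results from the study.

```python
import math

def cauchy_combine(p_values, weights=None):
    """Combine p-values with a Cauchy combination rule (assumed, not
    confirmed by the abstract, to resemble the omnibus aggregation)."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    stat = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        w = w / total  # normalize weights so they sum to one
        if p < 1e-15:
            # tan((0.5 - p) * pi) is approximately 1 / (p * pi) for tiny p
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        # same approximation in reverse, avoiding loss of precision in atan
        return 1.0 / (stat * math.pi)
    # upper tail probability of a standard Cauchy variable at `stat`
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical p-values for one enhancer-gene pair in one cell type,
# one per linking approach (labels and values are made up).
link_p_values = {
    "distance_based": 0.3,
    "contact_based_3d": 1e-8,
    "activity_correlation": 0.6,
}
omnibus_p = cauchy_combine(list(link_p_values.values()))
```

Under this rule a single strong signal from one linking approach dominates the combined p-value, while weak signals from the other approaches dilute it only modestly, which is one way to accommodate uncertainty about which link is correct.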
Results
In at least one cell type, genome-wide significant promoter and enhancer associations were found at several known lipid loci, including APOE, APOA1, and CETP. Critically, unlike existing methods, cellSTAAR reveals variability in the significance of these loci across cell types and uncertainty in the target gene for significant enhancers. For example, of the 19 cell types analyzed, the significant enhancer near APOE was found in only 6. These 6 include 5 cell types known a priori to be highly relevant to lipids: hepatocytes, fetal hepatoblasts, adipocytes, liver endothelial cells, and enterocytes from the small intestine. Although the associated enhancer is contained within the APOE gene, 3D-based evidence from SCREEN suggests possible regulation of the nearby genes APOC2 and APOC4. This uncertainty in the target gene is not reflected in existing RVAT methods. Under a relaxed genome-wide significance threshold, cellSTAAR yields the largest number of discoveries in the cell types most relevant to lipids, such as those mentioned above.
Conclusions
We propose a new statistical method, cellSTAAR, to integrate single-cell sequencing data into gene-centric RVATs of candidate enhancer and promoter regions. When applied to three quantitative lipid traits from the TOPMed consortium, cellSTAAR produces replicated discoveries in known genes, reveals variability in significance across cell types, and allows us to investigate the impact of uncertain links between regulatory elements and their target genes.
|