
Lipids

A framework for detecting non-coding rare variant associations in large whole genome sequencing studies at scale, with application to 30,138 TOPMed participants for lipid traits

Authors
Zilin Li, Xihao Li, Hufeng Zhou, Sheila M. Gaynor, Margaret Sunitha, Akhil Pampana, Jerome I. Rotter, Cristen J. Willer, Gina M. Peloso, Pradeep Natarajan and Xihong Lin, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting
ASHG Conference (October 27-31)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Compared with GWAS and whole exome sequencing studies, large-scale whole genome sequencing studies enable the analysis of non-coding rare variants (RVs) associated with complex human traits. Common analytic strategies for RV association in non-coding regions consider only a limited choice of gene-centric masks and fixed-length sliding windows, and make limited use of variant functional information.
Methods
We propose a framework for detecting non-coding rare variant associations that comprises gene-centric analysis and genetic region analysis. For gene-centric analysis, we consider various strategies for grouping non-coding variants based on functional annotations, including UTR, upstream, downstream, promoter, enhancer and long non-coding RNA genes. We apply an epigenetic filtering method to increase the signal-to-noise ratio in RV analysis of enhancers by considering a rank-ordered list of genes, with the top-ranked genes more likely to be related to lipid phenotypes in liver and adipose tissue. For genetic region analysis, we group non-coding RVs residing in a contiguous window, defined either by a pre-specified (fixed) window size or by a flexible, data-adaptive window size using SCANG (SCAN the Genome). The framework also applies the STAAR (variant-Set Test for Association using Annotation infoRmation) method, which increases the power of RV association tests by effectively incorporating multiple functional annotations.
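The fixed-size window grouping described above can be illustrated with a minimal sketch. The function name, the 2 kb window length, the 1 kb step, the 1% minor allele frequency cutoff and the minimum variant count are all illustrative assumptions; the abstract does not state the values used, and this is not the SCANG or STAAR implementation.

```python
from typing import Dict, List, Sequence, Tuple


def fixed_sliding_windows(
    positions: Sequence[int],
    mafs: Sequence[float],
    window_size: int = 2000,   # assumed window length in base pairs
    step: int = 1000,          # assumed step between window starts
    maf_cutoff: float = 0.01,  # assumed rare-variant threshold
    min_variants: int = 2,     # assumed minimum set size worth testing
) -> Dict[Tuple[int, int], List[int]]:
    """Group rare variants into overlapping fixed-length windows.

    Returns a mapping from half-open intervals (start, end) to the sorted
    positions of the rare variants that fall inside each interval.
    """
    # Keep only polymorphic variants below the frequency cutoff.
    rare = sorted(p for p, f in zip(positions, mafs) if 0 < f < maf_cutoff)
    if not rare:
        return {}

    windows: Dict[Tuple[int, int], List[int]] = {}
    # Align the first window to the step grid at or below the first variant.
    start = (rare[0] // step) * step
    while start <= rare[-1]:
        end = start + window_size
        members = [p for p in rare if start <= p < end]
        if len(members) >= min_variants:
            windows[(start, end)] = members
        start += step
    return windows


# Toy data: five variants on one chromosome, one of them common.
toy_positions = [100, 500, 1500, 2600, 9000]
toy_mafs = [0.001, 0.2, 0.005, 0.002, 0.003]
toy_windows = fixed_sliding_windows(toy_positions, toy_mafs)
```

Each resulting variant set would then be passed to a set-based association test. A data-adaptive procedure such as SCANG differs in that the window size is not fixed in advance.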

We applied the proposed framework to analyze non-coding RV association with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 21,015 discovery samples and 9,123 replication samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program.
Results
For gene-centric analysis, we identified 41 significant associations with lipid traits in the discovery phase. After conditioning on known common and low-frequency lipid-associated variants, 4 of the 41 associations remained significant and were validated in the replication phase, including the associations of enhancer RVs of APOA1 with HDL-C, promoter or enhancer RVs of APOE with TG, and promoter RVs of APOE with TC. For genetic region analysis, we identified 8 significant associations that remained significant in conditional analysis using both discovery and replication samples, including RVs in an intergenic region near ZPR1 and an intronic region near SIK3 with HDL-C, intergenic regions near PCSK9 and APOC1P1 with LDL-C, intronic regions in PAFAH1B2, SIDT2 and CEP164 with TG, and an intergenic region near PCSK9 with TC.
Summary
Several novel non-coding RV-sets associated with lipid traits were discovered and replicated using the TOPMed WGS Freeze 5 data.

Key Words:
Genome sequencing (87); Genome-wide association (88); Rare variants (167); Statistical Genetics (180)