
Lipids

Deciphering rare non-coding LDL-C associations in over 246K individuals with whole genome sequencing

Authors
M. Selvaraj, X. Li, Z. Li, X. Lin, G. Peloso, P. Natarajan, TOPMed Lipids Working Group
Name and Date of Professional Meeting
ASHG 2024
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background:
Blood lipids, and specifically low-density lipoprotein cholesterol (LDL-C), are heritable risk factors for cardiovascular disease, a leading cause of death. Recent genome-wide association studies (GWAS) have identified numerous loci associated with blood lipid levels, but the role of rare non-coding variants is less well understood. Whole-genome sequencing (WGS) enables the exploration of these variants. Our study meta-analyzed WGS data from two large datasets (TOPMed, n=72,175, and UK Biobank, n=173,982), yielding the largest WGS analysis of LDL-C.
Methods:
We ascertained deep-coverage WGS and LDL-C from UK Biobank and NHLBI TOPMed freeze 10 (23 cohorts). We harmonized and normalized lipid measures from individual cohorts and adjusted for age, sex, cohort-race, and principal components (PCs), accounting for lipid-lowering medication status. To enable efficient WGS meta-analysis across UK Biobank and TOPMed freeze 10, we implemented the MetaSTAAR workflow. In addition to single-variant analyses, we performed gene-centric coding and non-coding set-based meta-analyses and region-based sliding-window meta-analyses of rare variants (MAF < 1%) for LDL-C. Finally, we replicated our findings in All of Us WGS data.
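Summary-statistic meta-analysis works because each study only needs to share per-variant score statistics and their covariance. The sketch below is a minimal illustration under several assumptions. Each study is assumed to provide a score vector U and its null covariance matrix K for the variants in one set. Studies are treated as independent, so their U and K can simply be summed. A single weighted burden test is then computed from the combined quantities. The function name, weights, and toy numbers are hypothetical. The actual MetaSTAAR workflow involves considerably more, including functionally weighted omnibus tests.

```python
import math
import numpy as np

def meta_burden_test(scores, covs, weights):
    """Fixed-effect burden test from per-study summary statistics.

    scores  : list of length-p score vectors U_s, one per study
    covs    : list of p x p null covariance matrices Cov(U_s), one per study
    weights : length-p variant weights (hypothetical here)
    """
    U = np.sum(scores, axis=0)            # combined score vector (independent studies)
    K = np.sum(covs, axis=0)              # combined covariance matrix
    w = np.asarray(weights, dtype=float)
    stat = float(w @ U)                   # weighted burden score
    var = float(w @ K @ w)                # its variance under the null
    z = stat / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value

# Two toy studies with three rare variants in one gene (made-up numbers).
U_a = np.array([4.0, 1.5, 2.0]); K_a = np.diag([2.0, 1.0, 1.5])
U_b = np.array([3.0, 0.5, 2.5]); K_b = np.diag([2.5, 0.8, 1.2])
z, p = meta_burden_test([U_a, U_b], [K_a, K_b], weights=[1.0, 1.0, 1.0])
print(f"z = {z:.3f}, p = {p:.2e}")
```

Summing the score statistics before testing is what lets a meta-analysis reach the same power as a pooled analysis without exchanging individual-level data.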
Results:
We generated variant summary statistics and covariance matrices for UK Biobank and TOPMed independently. We processed 571M and 660M variants from TOPMed and UKB, respectively, of which 92M variants had a minor allele count >20. We then meta-analyzed both studies following the MetaSTAAR workflow. We used 5 gene-centric coding variant masks and 7 non-coding variant masks and defined genome-wide significant aggregates using a Bonferroni-corrected threshold of 0.05/(20K × number of masks). Before conditional analysis, we obtained 70 coding and 111 non-coding aggregates significantly associated with LDL-C. After conditioning on known common variants, 39 coding and 44 non-coding aggregates remained significant, and 25 coding and 28 non-coding aggregates replicated. Many important known Mendelian lipid genes, including LDLR, APOB, and PCSK9, were significant, and novel rare variant aggregates in ABCA6 and RELB were also significantly associated with LDL-C.
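The Bonferroni threshold above can be made concrete. The sketch assumes that "20K" means roughly 20,000 genes and that the formula is applied separately to each variant category. Under those assumptions, 5 coding masks give 0.05/(20,000 × 5) = 5.0 × 10⁻⁷, and 7 non-coding masks give 0.05/(20,000 × 7) ≈ 3.6 × 10⁻⁷. If all 12 masks were instead counted together, the threshold would be about 2.1 × 10⁻⁷. The helper name is hypothetical.

```python
def bonferroni_threshold(alpha, n_genes, n_masks):
    """Per-test significance threshold alpha / (genes x masks)."""
    return alpha / (n_genes * n_masks)

# Assumption: about 20,000 genes, thresholds computed per variant category.
coding = bonferroni_threshold(0.05, 20_000, 5)
noncoding = bonferroni_threshold(0.05, 20_000, 7)
print(f"coding: {coding:.2e}, non-coding: {noncoding:.2e}")
# coding: 5.00e-07, non-coding: 3.57e-07
```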
Conclusion:
In summary, we extend prior observations of rare non-coding variant associations near Mendelian lipid genes to novel genes without previously known common non-coding or rare coding variant evidence.

Validation of Multi-ancestry Polygenic Scores for Lipid Levels in 3,119 Participants from Samoa and American Samoa

Authors
Toni-Ann J. Yapp
Mohanraj Krishnan
Shuwei Liu
Samantha L. Manna
Hong Cheng
Take Naseri
Muagututi‘a Sefuiva Reupena
Satupa‘itea Viali
John Tuitele
Ranjan Deka
Nicola L. Hawley
Stephen T. McGarvey
Daniel E. Weeks
Ryan L. Minster
Jenna C. Carlson
Name and Date of Professional Meeting
ASHG 2024 (November 5-9, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction: Dyslipidemia is a significant risk factor for cardiovascular disease and a major contributor to the global burden of disease. Cardiovascular disease is the leading cause of death in Samoa, accounting for 34% of deaths. Exploring the genetic determinants of blood lipid traits in Polynesian individuals could help address research inequities and has the potential to provide additional insight about the biological foundations of such traits in other populations. A 2021 study by Graham et al. conducted a multi-ancestry genome-wide genetic discovery meta-analysis of lipid levels and generated polygenic scores (PGS) from ~1.65 million individuals from East Asian, admixed African American, Hispanic, and South Asian populations. In the current study, we applied these PGS for four traits—LDL-C, HDL-C, triglycerides (TG), and total cholesterol (TC)—in a sample of 3,119 participants across three time-separated cohorts recruited from Samoa and American Samoa in 1990-1991, 2002-2003, and 2010.
Methods: We calculated PGS in the Samoan cohorts using the variants and weights derived from the 2021 study. First, the PGS variants were lifted over to hg38 and then harmonized with genome-wide imputed variants in the Samoan cohorts. We calculated individual-level PGS by summing the products of genotype dosage and variant weight over all PGS variants. We assessed the performance of the PGS in each cohort using partial r² with bootstrapped confidence intervals from linear regression models for each trait, adjusting for age and sex.
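The scoring and evaluation steps can be sketched in a few lines. This is a minimal illustration on simulated data, and the helper names are hypothetical. It assumes dosages are already harmonized to the PGS effect alleles. Partial r² is computed here as the proportional reduction in residual sum of squares when the PGS is added to an age + sex model. The confidence interval uses a percentile bootstrap over individuals; the original analysis may have made different choices for either step.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """PGS_i = sum_j dosage_ij * weight_j (dosages: n x m, weights: length m)."""
    return dosages @ weights

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_r2(y, pgs, covariates):
    """Share of covariate-adjusted residual variance explained by adding the PGS."""
    base = np.column_stack([np.ones_like(y), covariates])
    full = np.column_stack([base, pgs])
    rss_base, rss_full = rss(base, y), rss(full, y)
    return (rss_base - rss_full) / rss_base

def bootstrap_ci(y, pgs, covariates, n_boot=200, seed=1):
    """Percentile 95% CI for partial r^2, resampling individuals with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(partial_r2(y[idx], pgs[idx], covariates[idx]))
    return np.percentile(stats, [2.5, 97.5])

# Simulated example: 1,000 people, 50 variants (all values are made up).
rng = np.random.default_rng(0)
n, m = 1000, 50
dosages = rng.uniform(0.0, 2.0, size=(n, m))      # imputed dosages in [0, 2]
weights = rng.normal(0.0, 0.05, size=m)
pgs = polygenic_score(dosages, weights)
age = rng.uniform(20, 70, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, sex])
trait = 0.02 * age + 0.1 * sex + 0.3 * pgs / pgs.std() + rng.normal(size=n)

r2 = partial_r2(trait, pgs, covariates)
ci = bootstrap_ci(trait, pgs, covariates)
print(f"partial r2 = {r2:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```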
Results: The PGS for LDL-C had r² ≈ 8% across the three Samoan cohorts, marginally lower than in East Asian and South Asian populations (r² ≈ 8-10%) and much lower than in African American and Hispanic populations (r² ≈ 10-16%) in the 2021 study. The PGS for HDL-C had r² ≈ 10% in the discovery and 2002 cohorts but r² ≈ 5% in the 1990 cohort. The PGS for TC had r² ≈ 10% across the three cohorts, and the PGS for TG had r² ≈ 5-7%.
Discussion: Our findings suggest that PGS derived from multi-ancestry populations have reduced predictive power when applied to the Samoan population. Additionally, the differences in r² between traits may indicate that some traits are more strongly influenced by the environment than others. This highlights the need to build PGS in a variety of environmental contexts and ancestries to improve the accuracy and transferability of genetic risk prediction across diverse populations. Further research is needed to refine and validate PGS in the Samoan population and to assess their potential clinical utility for risk stratification and targeted interventions for blood lipid traits.

StocSum: a reference-panel-free summary statistics framework for diverse populations

Authors
Han Chen, Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle
Name and Date of Professional Meeting
IGES 33rd Annual Meeting (November 3-4, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Genomic summary statistics have been widely used to address various scientific questions in genetic and genomic research. Applications that involve multiple genetic variants also require correlation or linkage disequilibrium (LD) information between variants, often obtained from an external reference panel. Examples include conditional analysis, variant-set and gene-based tests, and heritability and genetic correlation estimation. These methods usually perform well for common variants in populations of European ancestry. In practice, however, it is difficult to find external reference panels that accurately represent the LD structure of isolated, underrepresented, or admixed populations, or of rare variants from whole genome sequencing (WGS) studies. This largely limits their application to European populations. To maximize the applicability of summary statistics-based methods and make them equally beneficial to all human populations, we have developed StocSum. StocSum is a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random matrix algorithms. Using two cohorts from the Trans-Omics for Precision Medicine Program, we demonstrate the accuracy of StocSum-based LD measures compared with those computed directly from individual-level genotype data in European, African, and Hispanic/Latino Americans. We also show that for admixed populations such as African and Hispanic/Latino Americans, LD measures computed from external reference panels perform much worse, even when all ancestry populations are included in those reference panels. As a reference-panel-free framework, StocSum will facilitate sharing and utilization of genomic summary statistics from WGS studies, especially for isolated, underrepresented, and admixed populations.
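The general idea of estimating LD from random projections rather than from a reference panel can be illustrated on toy data. The centered n × p genotype matrix is multiplied by B random standard-normal vectors. This gives a p × B summary whose size does not depend on the number of individuals n. Because E[UUᵀ] = B·I, the product SSᵀ/B estimates the genotype cross-product matrix, and normalizing it yields an LD (correlation) estimate. This is a minimal sketch of that randomized-algorithm idea, assuming simulated genotypes. It is not StocSum's actual implementation, and all function names are hypothetical.

```python
import numpy as np

def simulate_genotypes(n, p, rho, rng):
    """Toy 0/1/2 genotypes with LD decaying along the sequence (AR(1) latent haplotypes)."""
    def haplotypes():
        z = np.empty((n, p))
        z[:, 0] = rng.standard_normal(n)
        for j in range(1, p):
            z[:, j] = rho * z[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        return (z > 0).astype(float)       # allele present if latent value > 0
    return haplotypes() + haplotypes()     # two haplotypes -> dosage 0, 1 or 2

def stochastic_summary(G, n_vectors, rng):
    """p x B summary G_c^T U, where U is an n x B matrix of N(0, 1) entries."""
    Gc = G - G.mean(axis=0)
    U = rng.standard_normal((G.shape[0], n_vectors))
    return Gc.T @ U

def ld_from_summary(S):
    """Approximate LD (correlation) matrix, using E[U U^T] = B * I."""
    cov = S @ S.T / S.shape[1]
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

rng = np.random.default_rng(42)
G = simulate_genotypes(n=1000, p=8, rho=0.8, rng=rng)
S = stochastic_summary(G, n_vectors=2000, rng=rng)
ld_sketch = ld_from_summary(S)
ld_exact = np.corrcoef(G, rowvar=False)
print("max |error|:", np.abs(ld_sketch - ld_exact).max())
```

Each pair of summary rows behaves like B draws from a bivariate normal with the true LD as its correlation. The approximation error therefore shrinks roughly as 1/√B.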

Genetic and phenotypic association analyses of cardiometabolic traits in diverse African samples with whole-genome sequencing data

Authors
Daniel Hui*, Matt Hansen*, Daniel Harris, Michael McQuillan, Dan Ju, Alexander Platt, William Beggs, Sunungouko Wata Mpoloka, Gaonyadiwe George Mokone, Gurja Belay, Thomas Nyambo, Stephen Chanock, Meredith Yeager, TOPMed Consortium, Giorgio Sirugo, Marylyn D. Ritchie, Scott Williams, Sarah A. Tishkoff
Name and Date of Professional Meeting
American Society of Human Genetics, November 2023
Associated paper proposal(s)
Working Group(s)
Abstract Text
African populations demonstrate exceptional genetic and phenotypic diversity, due in part to their varied environments, lifestyles, and demographic histories. We conducted genetic and phenotypic association analyses in 6,965 geographically and ethnically diverse Sub-Saharan African individuals, using 15 cardiometabolic phenotypes (686 to 6,854 individuals per trait). Of these individuals, 6,280 had whole-genome sequences from the NIH TOPMed consortium and 685 had genotypes from Illumina arrays. Each phenotype had at least one ethnic group with a significantly different mean value compared with the rest of the cohort. Examples include short stature in the Baka rainforest hunter-gatherers of Cameroon and high adiposity in the Herero pastoralists of Botswana. An analysis of ethnicity-sex interactions revealed several ethnic groups with significant sexual dimorphism for at least one cardiometabolic phenotype, such as Herero women having markedly higher body mass index than men. We also compared the African cohort with African-ancestry UK Biobank (UKBB) individuals. The UKBB individuals had higher mean values than any of the 53 African ethnic groups for multiple cardiometabolic measurements, including low-density lipoprotein cholesterol (LDL), body fat percentage (BFP), and systolic blood pressure. Phenotype-phenotype correlations also differed between the UKBB and African cohorts, as well as among African ethnic groups. For example, BFP and LDL had low correlation in the UKBB (R = 0.04) but showed a range of correlations among African groups, from R = 0.00 in the Maasai pastoralists of eastern Africa to R = 0.43 in the Agaw agriculturalists of Ethiopia. Genome-wide association analyses identified 76 significantly associated loci (p < 5.0×10⁻⁸), 14 of which passed a more stringent empirical threshold (p < 3.0×10⁻⁹). These included the APOE and APOC1 loci for various blood lipids, PCSK9 for LDL, and CETP for high-density lipoprotein cholesterol (HDL), as well as novel loci.
Set-based rare variant analyses of loss-of-function variants found 12 gene-phenotype associations. These replicated known associations of PCSK9 and APOE with LDL and total cholesterol and uncovered several novel gene-trait associations for adiposity traits and HDL. Ongoing analyses include phenotype associations with subsistence and genetically inferred ancestry, replication of genetic associations, and gene-set enrichment. In total, these results offer insights into the genetic and phenotypic landscape of cardiometabolic traits in African populations. This work was supported by ADA grant 1–19-VSN-02 and NIH grants 1R35GM134957, R01DK104339, R01AR076241, and 1X01HL139409-01.
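The abstract does not specify which set-based test was used. The sketch below shows the simplest option, a loss-of-function (LoF) burden test, as a minimal illustration. It counts each individual's LoF alleles in a gene and regresses the phenotype on that count plus covariates. Treating this as the method actually used is an assumption, as are the simulated data, the function name, and the normal approximation for the p-value.

```python
import math
import numpy as np

def lof_burden_test(lof_genotypes, phenotype, covariates):
    """Regress phenotype on per-person LoF allele count plus covariates.

    Returns (beta, p_value) for the burden term, using a two-sided normal approximation.
    """
    burden = lof_genotypes.sum(axis=1)                      # LoF alleles per person
    X = np.column_stack([np.ones(len(phenotype)), covariates, burden])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = float(resid @ resid) / dof                     # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    z = beta[-1] / math.sqrt(cov_beta[-1, -1])
    return float(beta[-1]), math.erfc(abs(z) / math.sqrt(2.0))

# Toy data: 3,000 people, 6 rare LoF variants (heterozygous carriers only, for simplicity).
rng = np.random.default_rng(7)
n, n_lof = 3000, 6
lof = (rng.random((n, n_lof)) < 0.005).astype(float)
covariates = rng.normal(size=(n, 2))
trait = -0.8 * lof.sum(axis=1) + covariates @ np.array([0.3, -0.2]) + rng.normal(size=n)
beta, p = lof_burden_test(lof, trait, covariates)
print(f"burden beta = {beta:.2f}, p = {p:.1e}")
```

Collapsing variants into one count gains power when most LoF variants in a gene act in the same direction. Variance-component or omnibus tests are the usual alternative when they may not.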

MultiSTAAR: A statistical framework for powerful rare variant multi-trait analysis in biobank-scale sequencing studies

Authors
Xihao Li, Han Chen, Margaret Sunitha Selvaraj, Eric Van Buren, Kenneth M. Rice, Jerome I. Rotter, Gina M. Peloso, Pradeep Natarajan, Zilin Li, Zhonghua Liu and Xihong Lin, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting
American Society of Human Genetics Annual Meeting (November 1-5, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Biobank-scale sequencing studies have made it feasible to better understand the contributions of rare variants to complex human traits and diseases. Leveraging association strength across multiple traits in rare variant association analysis of sequencing studies can improve statistical power over single-trait analysis and detect pleiotropic genes or noncoding regions. Existing methods have limited ability to perform rare variant multi-trait analysis on biobank-scale sequencing data.

Methods
We propose MultiSTAAR, a powerful statistical framework and computationally scalable analytical pipeline for functionally-informed rare variant multi-trait analysis in biobank-scale sequencing studies. As a statistical framework, MultiSTAAR accounts for relatedness, population structure and correlation between phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. As a comprehensive and robust analytical pipeline, MultiSTAAR facilitates functionally-informed multi-trait analysis of both coding and noncoding rare variants by incorporating multiple variant functional annotations for grouping and weighting. MultiSTAAR also provides conditional multi-trait analysis to dissect rare variant association signals independent of known variants.
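A multi-trait burden score test shows why jointly analyzing correlated traits requires accounting for their correlation. This is a minimal sketch under simplifying assumptions:

- individuals are unrelated;
- phenotypes are already residualized on covariates;
- each individual has a single burden score;
- the residual trait covariance Σ is estimated from the data.

MultiSTAAR itself additionally handles relatedness and combines several functionally weighted tests, which this sketch does not attempt. With K traits, the score vector U = Yᵀb has null covariance (bᵀb)Σ. The statistic T = UᵀΣ⁻¹U / (bᵀb) is therefore approximately χ² with K degrees of freedom. All names and simulated values are hypothetical.

```python
import math
import numpy as np

def chi2_sf(x, k):
    """Upper tail P(X > x) of a chi-square distribution with integer k degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k // 2))
    # Odd k: erfc term plus a half-integer series
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (k + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def multi_trait_burden_test(Y, burden):
    """Joint score test of one burden variable against K correlated traits (n x K matrix Y)."""
    Yc = Y - Y.mean(axis=0)
    b = burden - burden.mean()
    sigma = Yc.T @ Yc / Yc.shape[0]          # K x K covariance between traits
    U = Yc.T @ b                             # length-K score vector
    T = float(U @ np.linalg.solve(sigma, U)) / float(b @ b)
    return T, chi2_sf(T, Y.shape[1])

# Toy data: three correlated lipid-like traits, with an effect on the first trait only.
rng = np.random.default_rng(3)
n = 10_000
corr = np.array([[1.0, -0.3, 0.4], [-0.3, 1.0, -0.4], [0.4, -0.4, 1.0]])
Y = rng.multivariate_normal(np.zeros(3), corr, size=n)
burden = (rng.random(n) < 0.02).astype(float)
Y[:, 0] += 0.6 * burden
T, p = multi_trait_burden_test(Y, burden)
print(f"T = {T:.1f}, p = {p:.1e}")
```

Using Σ⁻¹ rather than treating the traits as independent keeps the test calibrated when traits such as LDL-C and TG are correlated. It can also add power because the unaffected traits help explain away shared noise.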

Results
We applied MultiSTAAR to whole-genome sequencing rare variant analysis of 61,838 ancestrally diverse participants from 20 studies in the NHLBI TOPMed consortium, jointly analyzing three quantitative lipid traits: LDL-C, HDL-C, and TG. In gene-centric multi-trait analysis of rare variants, MultiSTAAR identified 43 conditionally significant associations with lipid traits. These included 4 noncoding associations that were missed by all three single-trait functionally-informed analyses using STAARpipeline: enhancer DHS rare variants in NIPSNAP3A and LIPC, and ncRNA rare variants in RP11-310H4.2 and MIR4497. In genetic region multi-trait analysis of rare variants, MultiSTAAR identified 7 conditionally significant 2-kb sliding windows associated with lipid traits. Three of these were missed by single-trait analysis using STAARpipeline: two sliding windows in DOCK7 (chromosome 1: 62,651,447 - 62,653,446 bp; chromosome 1: 62,652,447 - 62,654,446 bp) and an intergenic sliding window (chromosome 1: 145,530,447 - 145,532,446 bp).
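The reported DOCK7 coordinates imply 2-kb windows that start every 1 kb: 62,651,447 to 62,653,446 bp, then 62,652,447 to 62,654,446 bp. Adjacent windows therefore overlap by half. The sketch below generates such windows over a region and groups variant positions into them. The function names and the variant positions are hypothetical.

```python
def sliding_windows(start, end, width=2000, step=1000):
    """Inclusive (lo, hi) windows of `width` bp starting every `step` bp within [start, end]."""
    windows = []
    lo = start
    while lo + width - 1 <= end:
        windows.append((lo, lo + width - 1))
        lo += step
    return windows

def assign_variants(windows, positions):
    """Map each window to the variant positions it contains (a variant can fall in two windows)."""
    return {w: [pos for pos in positions if w[0] <= pos <= w[1]] for w in windows}

windows = sliding_windows(62_651_447, 62_654_446)
print(windows)
# [(62651447, 62653446), (62652447, 62654446)]

groups = assign_variants(windows, [62_651_500, 62_652_900, 62_654_000])
```

With 50% overlap, a signal that straddles a window boundary is still fully contained in some window. The cost is that neighbouring windows share variants and their tests are correlated. This sketch drops a trailing partial window at the end of a region.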

Summary
In summary, MultiSTAAR provides a powerful statistical framework and a computationally scalable analytical pipeline for multi-trait analysis of biobank-scale sequencing studies with complex study samples.

cellSTAAR: Incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of non-coding regions

Authors
Eric Van Buren, Yi Zhang, Xihao Li, Zilin Li, Hufeng Zhou, Gina M. Peloso, Jerome I. Rotter, Pradeep Natarajan, Xihong Lin, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group
Name and Date of Professional Meeting
ASHG 2023
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction

Whole genome sequencing (WGS) studies have cumulatively identified hundreds of millions of rare variants, the majority of which are in non-coding regions and of unknown function. Given this large number of genetic variants, existing methods for gene-centric Rare Variant Association Tests (RVATs) in WGS studies have identified relatively few associations between candidate Cis-Regulatory Elements (cCREs) and complex human diseases. Because the regulatory landscape of many cCREs varies across cell types, it is of substantial interest to incorporate single-cell sequencing data into RVATs to capture the functional variability that exists across cell types in the non-coding genome and boost statistical power in the process.

Methods

We propose cellSTAAR to address two opportunities to improve existing gene-centric RVAT methods as applied to genetic variants in cCREs. First, cellSTAAR integrates single-cell ATAC-seq data to capture variability in chromatin accessibility across cell types. It does so by constructing cell-type-specific variant sets and upweighting relevant variants using cell-type-specific functional annotations. Second, cellSTAAR links cCREs to their target genes using an omnibus framework that aggregates results from a variety of linking approaches, reflecting the uncertainty in element-gene linking. Each of these approaches uses different kinds of genomic data and computational methods. We applied cellSTAAR to Freeze 8 (N = 60,000) of the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium data for three quantitative lipid traits: LDL, HDL, and TG.
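The abstract does not state how the omnibus framework combines results across linking approaches. One common choice in the STAAR family is the aggregated Cauchy association test (ACAT), which can combine correlated p-values. The sketch below assumes ACAT with equal weights. It also shows, in simplified form, how a cell-type-specific variant set might be formed by keeping only variants inside that cell type's accessible peaks. The function names, peak intervals, linking-approach labels, and p-values are all hypothetical.

```python
import math

def acat(p_values, weights=None):
    """Cauchy combination (ACAT) of p-values; equal weights by default.

    For extremely small p-values, implementations typically replace
    tan((0.5 - p) * pi) with its approximation 1 / (p * pi) to avoid precision loss.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(stat / total) / math.pi

def cell_type_variant_set(variants, accessible_peaks):
    """Keep variant positions that fall inside any accessible (e.g., scATAC-seq peak) interval."""
    return [v for v in variants if any(lo <= v <= hi for lo, hi in accessible_peaks)]

# Hypothetical p-values for one enhancer-gene test under three linking approaches.
p_by_link = {"link_A": 2e-6, "link_B": 0.03, "link_C": 0.4}
p_omnibus = acat(list(p_by_link.values()))
print(f"omnibus p = {p_omnibus:.2e}")
```

ACAT is dominated by the smallest input p-value, here giving roughly three times the minimum for three inputs. It stays valid under arbitrary dependence between tests, which suits linking approaches that draw on overlapping data.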

Results

In at least one cell type, genome-wide significant promoter and enhancer associations were found in several known lipid loci, including APOE, APOA1, and CETP. Critically, unlike existing methods, cellSTAAR reveals variability in the significance of these loci across cell types, as well as uncertainty in the target gene of significant enhancers. For example, of the 19 cell types analyzed, the significant enhancer near APOE was found in only 6. Five of these 6 are cell types known a priori to be highly relevant to lipids: hepatocytes, fetal hepatoblasts, adipocytes, liver endothelial cells, and small-intestine enterocytes. Although the associated enhancer is contained within the APOE gene, 3D-based evidence from SCREEN suggests possible regulation of the nearby genes APOC2 and APOC4. Existing RVAT methods do not reflect this uncertainty in the target gene. Under a relaxed genome-wide significance threshold, the cell types with the most cellSTAAR discoveries are those most relevant to lipids, such as those mentioned above.

Conclusions

We propose a new statistical method, cellSTAAR, to integrate single-cell sequencing data into gene-centric RVATs of candidate enhancer and promoter regions. When applied to three quantitative lipid traits from the TOPMed consortium, cellSTAAR produces replicated discoveries in known genes and reveals variability in significance across cell types. It also allows us to investigate the impact of uncertain links between regulatory elements and their target genes.