Lipids

StocSum: stochastic summary statistics for whole genome sequencing studies

Authors
Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
Name and Date of Professional Meeting
ASHG Meeting (November 1-5, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to address different scientific questions in genetic and genomic research, such as meta-analysis, heritability estimation, conditional analysis, variant set and gene-based tests, multiple phenotype analysis, genetic correlation or co-heritability estimation. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. While these methods usually have good performance for common variants in populations of European ancestry, in practice, it is usually difficult to find suitable external reference panels that represent the LD structure for isolated, underrepresented and admixed populations, or rare genetic variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. We have developed StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of the complex sample correlation structure, StocSum always scales linearly with both the sample size and the number of genetic variants in computing stochastic summary statistics from individual-level data. We develop various downstream applications using StocSum including single-variant tests, conditional association tests, gene-environment interaction tests, variant set tests, as well as meta-analysis and LD score regression tools. The complexity of all these downstream applications does not depend on the sample size. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine Program. 
Specifically, we show that StocSum can be used to perform long-range variant set tests, expanding the aggregation units beyond genes or genomic regions in close proximity. We also show that for admixed populations, LD scores estimated by StocSum are much more accurate than those from external reference panels, even if all ancestry populations are included in those reference panels. In summary, as a reference-panel-free framework, StocSum will facilitate sharing and utilization of genomic summary statistics from WGS studies, especially for isolated, underrepresented and admixed populations.
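
The random-vector idea can be illustrated with a small sketch. This is not the StocSum implementation: it assumes unrelated samples, a simple centered genotype matrix, and standard normal random vectors, and every function name and the toy data are hypothetical. Random vectors are projected through the genotypes once, and the cross-products of those projections then approximate the variant-by-variant covariance (unscaled LD) without returning to individual-level data.

```python
import random


def center_columns(genotypes):
    """Subtract each variant's mean dosage (rows are samples, columns are variants)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def exact_covariance(g):
    """Exact G^T G; computing it requires individual-level data."""
    m = len(g[0])
    return [[sum(row[i] * row[j] for row in g) for j in range(m)] for i in range(m)]


def stochastic_statistics(g, n_vectors, rng):
    """Return U = G^T R (variants x n_vectors), with R drawn i.i.d. from N(0, 1)."""
    n, m = len(g), len(g[0])
    u = [[0.0] * n_vectors for _ in range(m)]
    for b in range(n_vectors):
        r = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for j in range(m):
            u[j][b] = sum(g[k][j] * r[k] for k in range(n))
    return u


def covariance_from_statistics(u):
    """Estimate G^T G as U U^T / B, using only the stochastic statistics."""
    m, b = len(u), len(u[0])
    return [[sum(u[i][k] * u[j][k] for k in range(b)) / b for j in range(m)]
            for i in range(m)]


def relative_error(estimate, truth):
    """Frobenius-norm error of the estimate relative to the exact matrix."""
    m = len(truth)
    num = sum((estimate[i][j] - truth[i][j]) ** 2 for i in range(m) for j in range(m))
    den = sum(truth[i][j] ** 2 for i in range(m) for j in range(m))
    return (num / den) ** 0.5


# Toy data (hypothetical): 200 samples, 4 variants, 2000 random vectors.
rng = random.Random(0)
n_samples, n_variants, n_vectors = 200, 4, 2000
allele_freqs = [0.1, 0.2, 0.3, 0.4]
raw = [[sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
       for _ in range(n_samples)]
g = center_columns(raw)

v_exact = exact_covariance(g)
u = stochastic_statistics(g, n_vectors, rng)
v_hat = covariance_from_statistics(u)
print(f"relative error: {relative_error(v_hat, v_exact):.3f}")
```

In this sketch, the cost of building `u` grows linearly with the number of samples and the number of variants, and the downstream covariance estimate works from `u` alone, so its cost does not depend on the sample size. The approximation tightens as the number of random vectors increases.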

StocSum: stochastic summary statistics for whole genome sequencing studies

Authors
Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
Name and Date of Professional Meeting
Joint Statistical Meetings (August 5-10, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to advance the genetics field in a wide range of applications. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. In practice, it is usually difficult to find suitable external reference panels that represent the LD structure for underrepresented and admixed populations, or rare variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. We develop StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of the complex sample correlation structure, StocSum always scales linearly with both the sample size and the number of genetic variants in computing stochastic summary statistics from individual-level data. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine WGS studies.

Whole genome sequence analysis of long non-coding RNAs for plasma lipid traits

Authors
Yuxuan Wang, Margaret Sunitha Selvaraj, Pradeep Natarajan, Gina M Peloso, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting
CHARGE Seattle Conference (Oct 12-14, 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background.
Elevated blood lipids are heritable risk factors and a major modifiable cause of cardiovascular disease. While long non-coding RNAs (lncRNAs) have important regulatory functions for lipid metabolism in model systems, the relationship between genetic variation in lncRNAs and blood lipid levels in humans is not well understood. We now utilize large-scale whole genome sequencing (WGS) studies and new statistical methods for variant set tests to assess the association between lncRNAs across the genome and plasma lipid traits.

Methods.
We analyzed 66,329 individuals with TOPMed freeze8 WGS data and lipid levels (LDL-C, HDL-C, TC and TG). We defined lncRNA testing units by integrating annotations from four different genome annotation projects: GENCODE (v38), FANTOM CAT (robust), NONCODE (v6), and lncRNAKB (v7). We aggregated rare (MAF < 1%) variants for each lncRNA based on the lncRNA genomic locations and conducted rare-variant aggregate tests using the STAAR framework, incorporating multiple functional annotations. We further performed conditional analyses adjusting for previously reported common variants associated with lipids. Because the lncRNAs overlap in some regions, we estimated the effective number of aggregate-based tests (Meff) for multiple testing correction.
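
The aggregation step described above can be sketched as follows. This is only an illustration of grouping rare variants by genomic interval, not the STAAR framework itself; the variant IDs, unit IDs, coordinates, and frequencies are hypothetical.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.01  # rare variants: MAF < 1%, as stated in the abstract


def assign_rare_variants(variants, units, maf_threshold=MAF_THRESHOLD):
    """Map each testing unit to the rare variants that fall inside its interval.

    variants: list of (variant_id, chrom, pos, maf)
    units:    list of (unit_id, chrom, start, end), 1-based inclusive coordinates
    """
    sets = defaultdict(list)
    for variant_id, chrom, pos, maf in variants:
        if not maf < maf_threshold:
            continue  # skip common variants
        for unit_id, unit_chrom, start, end in units:
            if chrom == unit_chrom and start <= pos <= end:
                sets[unit_id].append(variant_id)
    return dict(sets)


# Hypothetical toy data: two overlapping lncRNA units on the same chromosome.
units = [
    ("lnc_A", "chr1", 100, 200),
    ("lnc_B", "chr1", 150, 300),
]
variants = [
    ("v1", "chr1", 120, 0.002),
    ("v2", "chr1", 180, 0.005),  # inside both units
    ("v3", "chr1", 250, 0.200),  # common, excluded
    ("v4", "chr1", 290, 0.009),
    ("v5", "chr2", 180, 0.001),  # different chromosome
]

rare_sets = assign_rare_variants(variants, units)
print(rare_sets)  # {'lnc_A': ['v1', 'v2'], 'lnc_B': ['v2', 'v4']}
```

The variant shared by both units shows why tests on overlapping lncRNAs are correlated, which is the motivation for estimating an effective number of tests rather than counting every unit as independent.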

Results.
In total, we conducted rare-variant (RV) aggregate tests in 166k lncRNA regions, corresponding to an effective number of 113,587 aggregate-based tests. We identified 40, 31, 30, and 30 genome-wide significant (p < 0.05/113,587 = 4.4e-07) lncRNAs for LDL-C, HDL-C, TC and TG, respectively, in 16 loci. After conditioning on known lipid-associated variants, 21, 15, 16, and 11 associations remained significant. Of the significant lncRNAs in the conditional analysis, 16, 11, 14, and 10 associations were near at least one known Mendelian lipid gene, including ENSG00000233271.1 near PCSK9 associated with LDL-C, NONHSAG026009.2 near APOE associated with TC, NONHSAG108446.1 near CETP associated with HDL-C, and NONHSAG009700.3 near APOA5 associated with TG. The remaining associations were all in lipid GWAS regions, except ENSG00000260441.5, which is an antisense to PLA2G15 that is associated with HDL-C.
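
As a worked check of the significance threshold, the sketch below applies a Bonferroni correction using the reported effective number of tests (113,587). The function and constant names are illustrative only, and the method used to estimate Meff is not shown because the abstract does not describe it.

```python
ALPHA = 0.05
M_EFF = 113_587  # effective number of aggregate-based tests reported in the abstract


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def is_significant(p_value, alpha=ALPHA, n_tests=M_EFF):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(ALPHA, M_EFF)
print(f"{threshold:.1e}")  # prints 4.4e-07
```

Dividing by the effective number of tests rather than by the roughly 166k overlapping regions gives a somewhat less stringent threshold, reflecting that overlapping units are not independent.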

Conclusions.
We discovered several associations between lncRNAs and plasma lipid traits, which provide insights into potential lipid regulatory mechanisms of GWAS loci. We will further seek replications in UK Biobank WGS and investigate the effects of lncRNAs on gene expression.

Rare protein-truncating DNA variants in APOB or PCSK9, low-density lipoprotein cholesterol, and risk of coronary artery disease

Authors
Jacqueline S. Dron; Aniruddh P. Patel; Yiyi Zhang; Dimitri J. Maamari; Minxian Wang; Eric Boerwinkle; Alanna C. Morrison; Paul S. de Vries; Myriam Fornage; Lifang Hou; Donald M. Lloyd-Jones; Bruce M. Psaty; Russell P. Tracy; Joshua C. Bis; Ramachandran S. Vasan; Daniel Levy; Nancy Heard-Costa; Stephen S. Rich; Xiuqing Guo; Kent D. Taylor; Richard A. Gibbs; Jerome I. Rotter; Cristen J. Willer; Elizabeth C. Oelsner; Andrew E. Moran; Gina M. Peloso; Pradeep Natarajan; Amit V. Khera
Name and Date of Professional Meeting
ASHG Meeting (October 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Protein-truncating variants (PTVs) in either apolipoprotein B (APOB) or proprotein convertase subtilisin/kexin type 9 (PCSK9) are associated with significantly lower low-density lipoprotein (LDL) cholesterol concentrations. Using data from prospective cohort studies, we quantified the relationship between PTVs in APOB and PCSK9, LDL cholesterol concentrations and protection against coronary heart disease (CHD).
Methods
We considered participants in five prospective cohorts from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and the UK Biobank. PTVs were defined as nonsense, frameshift, and splice-site variants disrupting APOB or PCSK9. The impact of PTVs on LDL cholesterol levels was assessed using linear regression, and the hazard ratio (HR) for CHD between PTV carriers and non-carriers was estimated using a Cox proportional hazard model. Models were adjusted for age, sex and the first five principal components of ancestry. Age-dependent probabilities for cumulative incidence of CHD in carriers and non-carriers were determined by summing all events by age at most recent follow-up. The cumulative incidence of CHD by age 75 years between carriers and non-carriers was assessed using a standardized Cox proportional hazard model, adjusted for sex and the first five principal components of ancestry.
Results
From the NHLBI cohorts (N=19,073; 44.4% male; mean [SD] age of 52 [17] years; 67.0% white, 23.9% Black), PTVs were identified in 139 (0.7%) participants and were associated with a 49 mg/dL (95% CI 43-56) reduction in LDL cholesterol. Over a median follow-up of 21.5 years, incident CHD was observed in 12 carriers (8.6%) versus 3,029 non-carriers (16.0%), corresponding to an HR of 0.51 (95% CI 0.28-0.89). From the UK Biobank (N=190,464; 45.0% male; mean [SD] age of 58 [8] years; 93.9% white, 1.6% Black), PTVs were identified in 662 (0.4%) participants and were associated with a 45 mg/dL (95% CI 42-47) reduction in LDL cholesterol. By age 75, estimated cumulative exposure to LDL cholesterol was 31.6% lower in carriers, and the estimated CHD risk was 3.7% (95% CI 2.0%-5.3%) in carriers compared to 7.0% (95% CI 6.9%-7.2%) in non-carriers, corresponding to an HR of 0.51 (95% CI 0.32-0.81).
Conclusion
Results of this large-scale genetic association study confirm and extend prior cross-sectional analyses in identifying that a PTV in either APOB or PCSK9—owing to significantly decreased exposure to LDL cholesterol—is associated with a 49% reduction in risk of CHD.
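
The arithmetic behind the reported percentages for the NHLBI cohorts can be retraced with a short sketch. It assumes that the number of non-carriers equals the cohort total minus the carriers, which the abstract does not state explicitly; the function and variable names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def risk_reduction_from_hr(hazard_ratio):
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return 100.0 * (1.0 - hazard_ratio)


# Counts reported for the NHLBI cohorts.
n_total = 19_073
n_carriers = 139
chd_carriers = 12
chd_noncarriers = 3_029
n_noncarriers = n_total - n_carriers  # assumption: everyone else is a non-carrier

carrier_frequency = percent(n_carriers, n_total)
carrier_incidence = percent(chd_carriers, n_carriers)
noncarrier_incidence = percent(chd_noncarriers, n_noncarriers)
reduction = risk_reduction_from_hr(0.51)

print(f"carrier frequency: {carrier_frequency:.1f}%")
print(f"CHD in carriers: {carrier_incidence:.1f}%")
print(f"CHD in non-carriers: {noncarrier_incidence:.1f}%")
print(f"risk reduction implied by HR 0.51: {reduction:.0f}%")
```

The crude incidence proportions are descriptive only. The hazard ratio of 0.51 comes from an adjusted Cox model, so it need not equal the ratio of the two crude proportions.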

Whole genome sequence analysis of long non-coding RNAs for plasma lipid traits

Authors
Yuxuan Wang, Margaret Sunitha Selvaraj, Pradeep Natarajan, Gina M Peloso
Name and Date of Professional Meeting
ASHG (October 27, 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background.
Elevated blood lipids are heritable risk factors and a major modifiable cause of cardiovascular disease. While long non-coding RNAs (lncRNAs) have important regulatory functions for lipid metabolism in model systems, the relationship between genetic variation in lncRNAs and blood lipid levels in humans is not well understood. We now utilize large-scale whole genome sequencing (WGS) studies and new statistical methods for variant set tests to assess the association between lncRNAs across the genome and plasma lipid traits.

Methods.
We analyzed 66,329 individuals with TOPMed freeze8 WGS data and lipid levels (LDL-C, HDL-C, TC and TG). We defined lncRNA testing units by integrating annotations from four different genome annotation projects: GENCODE (v38), FANTOM CAT (robust), NONCODE (v6), and lncRNAKB (v7). We aggregated rare (MAF < 1%) variants for each lncRNA based on the lncRNA genomic locations and conducted rare-variant aggregate tests using the STAAR framework, incorporating multiple functional annotations. We further performed conditional analyses adjusting for previously reported common variants associated with lipids. Because the lncRNAs overlap in some regions, we estimated the effective number of aggregate-based tests (Meff) for multiple testing correction.

Results.
In total, we conducted rare-variant (RV) aggregate tests in 166k lncRNA regions, corresponding to an effective number of 113,587 aggregate-based tests. We identified 40, 31, 30, and 30 genome-wide significant (p < 0.05/113,587 = 4.4e-07) lncRNAs for LDL-C, HDL-C, TC and TG, respectively, in 16 loci. After conditioning on known lipid-associated variants, 21, 15, 16, and 11 associations remained significant. Of the significant lncRNAs in the conditional analysis, 16, 11, 14, and 10 associations were near at least one known Mendelian lipid gene, including ENSG00000233271.1 near PCSK9 associated with LDL-C, NONHSAG026009.2 near APOE associated with TC, NONHSAG108446.1 near CETP associated with HDL-C, and NONHSAG009700.3 near APOA5 associated with TG. The remaining associations were all in lipid GWAS regions, except ENSG00000260441.5, which is an antisense to PLA2G15 that is associated with HDL-C.

Conclusions.
We discovered several associations between lncRNAs and plasma lipid traits, which provide insights into potential lipid regulatory mechanisms of GWAS loci. We will further seek replications in UK Biobank WGS and investigate the effects of lncRNAs on gene expression.

Portability of a Multiethnic Polygenic Risk Score for Low-Density Lipoprotein Cholesterol in a Samoan Population

Authors
Jenna C. Carlson, Mohanraj Krishnan, Nicola L. Hawley, Hong Cheng, Take Naseri, Muagututi‘a Sefuiva Reupena, Satupa‘itea Viali, Ranjan Deka, Stephen T. McGarvey, Ryan L. Minster, Daniel E. Weeks
Name and Date of Professional Meeting
ASHG Annual Meeting 2022 (October 25-29, 2022)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Polygenic risk scores (PRS) are a promising tool for improving health outcomes through personalized medicine; however, there is a well-documented need to improve diversity in the populations from which PRS are derived so as not to further exacerbate health disparities among underrepresented populations. Specifically for low-density lipoprotein cholesterol (LDL-C), a known risk factor for cardiovascular disease, recent efforts have been made to diversify PRS to represent non-European ancestral groups, including African, East Asian, South Asian, and Hispanic populations. However, there are several other minority populations for which the transferability of PRS has not yet been examined. In particular, the transferability of these ‘multiethnic’ PRS has not been studied in Pacific Islanders, who have a disproportionate burden of cardiovascular disease and are underrepresented in health research.

Thus, we sought to assess the performance of a recent multiethnic PRS for LDL-C, constructed using over 1 million individuals of African, East Asian, European, Hispanic, and South Asian ancestry (Graham et al. 2021), in a cohort of n=2,816 Samoan adults.

The genetic variant information and corresponding weights for the multiethnic LDL-C PRS were downloaded from the Polygenic Score Catalog, and WGS data from 2,816 Samoans were aligned to the scoring file to assign individual-level risk scores. Of the 9,009 variants in the PRS, 8,653 (96%) were available in the Samoan samples, although 20% (1,747/8,653) of these variants were monomorphic in the Samoan samples. The distribution of the PRS in Samoans ranged from 43.6 to 49.3 risk alleles, with a mean of 46.8. The utility of the PRS was evaluated using a multivariable linear regression model adjusted for age, sex, and principal components of ancestry. The PRS was associated with higher LDL-C (β = 13.7 mg/dL, 95% CI 12.2 – 15.3 mg/dL, p = 4.3e-66). The partial r2 for the PRS was 10.17% (95% bootstrap CI 8.23 – 12.42%), a value similar to the published performance of the PRS in other non-European ancestral groups (r2 10-16%), despite there being no Pacific Islanders represented in the PRS construction. While there is still much work to do to improve the representation of minority populations in health research, these results show that, at least for LDL-C, a PRS derived from multiple diverse ancestries performs similarly in Samoans and in other minority populations. Further work is needed to characterize the performance of PRS for other traits in this population, to see whether they are equally transferable, and to compare performance with a PRS derived in a Pacific Islander population. Nevertheless, this work highlights the importance of including several diverse populations in the construction of PRS as a first step in improving accuracy and transferability to minority populations.
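
The scoring step described above amounts to a weighted sum of allele dosages over the variants available in the target data. The sketch below illustrates that calculation under simplifying assumptions: the variant IDs, weights, and dosages are hypothetical, and the data structures are not the Polygenic Score Catalog file format.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    weights: dict mapping variant_id -> per-allele weight from the scoring file
    dosages: dict mapping variant_id -> effect-allele dosage (0 to 2) for one person
    Returns (score, number of variants used).
    """
    shared = [v for v in weights if v in dosages]
    score = sum(weights[v] * dosages[v] for v in shared)
    return score, len(shared)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Hypothetical scoring file and one individual's dosages; "rsC" is unavailable.
weights = {"rsA": 0.5, "rsB": -0.25, "rsC": 1.0}
dosages = {"rsA": 2, "rsB": 1}

score, n_used = polygenic_score(weights, dosages)
print(score, n_used)  # 0.75 2

# Variant coverage reported in the abstract.
available = percent(8_653, 9_009)
monomorphic = percent(1_747, 8_653)
print(f"available: {available:.0f}%, monomorphic among available: {monomorphic:.0f}%")
```

Variants missing from the target data simply drop out of the sum here. A monomorphic variant contributes the same amount to every individual, so it shifts the score without helping to distinguish people.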