Lipids | NHLBI Trans-Omics for Precision Medicine

MultiSTAAR: Powerful rare variant multi-trait analysis incorporating functional annotations for large-scale whole genome sequencing studies, with application to TOPMed lipid data

Submitted by	Li, Xihao
Authors	Zhonghua Liu, Xihao Li, Zilin Li, Han Chen, Hufeng Zhou, Sheila M. Gaynor, Jerome I. Rotter, Cristen J. Willer, Gina M. Peloso, Pradeep Natarajan and Xihong Lin, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting	ASHG 2020 Annual Meeting (October 27-31, 2020)
Associated paper proposal(s)	Powerful rare variant multi-trait analysis for large-scale whole genome sequencing studies using MultiSTAAR, with application to TOPMed lipid data
Working Group(s)	Lipids
Abstract Text	Introduction Integrating association evidence across multiple traits can improve the power of rare variant (RV) association analysis in whole genome sequencing (WGS) studies. Commonly used RV multi-trait analysis approaches have several limitations when applied to WGS data, including limited computational scalability, lack of control for population structure and relatedness, and loss of power from not incorporating functional annotations. Methods We propose MultiSTAAR (variant-Set Test for Association using Annotation information on Multiple continuous phenotypes), a powerful and scalable rare variant multi-trait analysis method for large-scale sequencing association studies. MultiSTAAR is efficient and scalable for jointly analyzing multiple traits in large-scale WGS studies by using sparse Genetic Relatedness Matrices, and accounts for both relatedness and population structure using a multivariate linear mixed model framework. MultiSTAAR also empowers rare variant association analysis by incorporating multiple functional annotations through the STAAR framework. We provide two general strategies for WGS RV association analysis using MultiSTAAR: gene-centric analysis, which groups the variants of a gene into functional categories defined by variant effect predictors, and sliding window analysis. Results We applied MultiSTAAR to identify RV-sets jointly associated with three quantitative lipid traits (LDL-C, HDL-C and TG) in 12,316 discovery samples and 17,822 replication samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, which had 244 million rare variants in total. In RV gene-centric analysis, MultiSTAAR identified 7 significant associations with lipid traits.
After conditioning on known lipid-associated variants, 4 of the 7 associations remained significant and could be validated in the replication phase, including the association of ABCA1 missense RVs, which was not detected when analyzing each trait individually using STAAR. In RV sliding window analysis, MultiSTAAR detected 29 significant 2 kb sliding windows associated with lipid traits. Two sliding windows located in the APOC3 gene remained significant after conditioning on known variants and could be validated in the replication phase. Summary By jointly analyzing multiple correlated phenotypes and incorporating multiple functional annotations, MultiSTAAR empowers rare variant association analysis and detected novel rare variant associations with lipid traits using the TOPMed WGS data.
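The sliding window strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the MultiSTAAR implementation: the abstract states only the 2 kb window size, so the 1 kb step between window starts, the 1-based inclusive coordinates, and the function names `sliding_windows` and `variants_in_window` are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int]


def sliding_windows(start: int, end: int,
                    size: int = 2000, step: int = 1000) -> Iterator[Window]:
    """Yield 1-based inclusive (window_start, window_end) pairs covering [start, end].

    size=2000 matches the 2 kb windows in the abstract; step=1000 is an
    assumed overlap setting, not a value taken from the abstract.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    pos = start
    while pos <= end:
        window_end = min(pos + size - 1, end)
        yield pos, window_end
        if window_end >= end:  # the last window reached the region end
            break
        pos += step


def variants_in_window(positions: List[int], window: Window) -> List[int]:
    """Return the variant positions that fall inside one window (the RV-set)."""
    lo, hi = window
    return [p for p in positions if lo <= p <= hi]


if __name__ == "__main__":
    # Hypothetical rare variant positions in a 5 kb region.
    rare_variant_positions = [500, 1500, 2500, 4200]
    for w in sliding_windows(1, 5000):
        print(w, variants_in_window(rare_variant_positions, w))
```

Each window's variant list would then be passed to a set-based association test; overlapping windows mean that one variant can contribute to more than one test.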

Whole genome sequence analysis of plasma lipids in a multi-ethnic cohort of 66,329 individuals

Submitted by	Selvaraj, Margaret Sunitha
Authors	Margaret Sunitha Selvaraj, Akhil Pampana, Gina Peloso, Pradeep Natarajan, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting	ASHG (Oct 2020)
Associated paper proposal(s)	Whole genome sequence analysis of plasma lipids in TOPMed Freeze 9
Working Group(s)	Lipids
Abstract Text	BACKGROUND: Plasma lipid levels, which include low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides (TG), and total cholesterol (TC), are heritable risk factors and therapeutic targets for coronary heart disease. More than 350 loci have been associated with plasma lipid levels in genome-wide association studies. We now extend prior efforts to examine the full allelic spectrum of variants associated with plasma lipids using whole genome sequencing. METHODOLOGY: Whole genome sequencing (WGS) data (>30X coverage) and plasma lipid measurements from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program were used for the analysis. Single variants with MAF > 0.1% on the 22 autosomes were tested for association with each of the four lipid traits individually, adjusting for covariates (PCs, age, sex, age^2, cohort-race and kinship). Single variant GWAS was carried out using a fast linear mixed model with kinship adjustment (SAIGE-QT). Significant common variants were identified at p-value < 5e-9 and compared with previously reported GWAS summary data. RESULTS: We analyzed WGS data and plasma lipids from 66,329 samples in 21 studies (Amish, ARIC, BioMe, CARDIA, CFS, CHS, DHS, FHS, GeneSTAR, GENOA, GenSalt, GOLDN, HCHS_SOL, HyperGEN, JHS, MESA, MGH_AF, SAFS, SAS, THRV and WHI). The dataset was multi-ethnic and included samples from Asian (4,719; 7%), Black (16,983; 26%), Hispanic (13,943; 21%), Samoan (1,182; 2%) and White (29,502; 44%) ethnicities. Novel low-frequency variants included an intronic variant at 15q22.2 (MAF 0.1%) in the gene RNF111 associated with HDL-C; intergenic variants at 12q23.1 (MAF 0.3%) and 4q34.2 (MAF 0.1%) associated with LDL-C; and an intergenic variant at 11q13.3 (MAF 0.01%) and an intronic variant at 15q21.1 (MAF 0.3%) in the gene SPG11 associated with TG.
The low-frequency variants were found to be specific to certain ethnic groups: for example, the 15q22.2 variant was specific to the White population (MAF 0.1%), 12q23.1 to the Hispanic population (MAF 1%), and 4q34.2 to the African ethnic group (MAF 0.6%). Additionally, we found common lead variants including 11p15.4 (MAF 6%) with LDL-C, and 11q12.2 (MAF 49%), 13q34 (MAF 28%), and 20q13.12 (MAF 2%) with TC; these variants showed consistent associations in prior analyses of the Million Veteran Program. CONCLUSIONS: Whole genome sequence analysis of plasma lipids in diverse ethnicities provides a platform to identify novel genetic associations.
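As a quick arithmetic check, the per-group sample counts reported in the Results can be summed and converted to percentages. The counts below are copied from the abstract; the variable names are illustrative only, and rounding to the nearest whole percent is an assumption about how the reported percentages were derived.

```python
# Sample counts per group, as reported in the abstract.
counts = {
    "Asian": 4719,
    "Black": 16983,
    "Hispanic": 13943,
    "Samoan": 1182,
    "White": 29502,
}

# The total should match the 66,329 samples stated in the abstract.
total = sum(counts.values())

# Share of each group, rounded to the nearest whole percent (assumed rounding).
shares = {group: round(100 * n / total) for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: {n:,} ({shares[group]}%)")
print(f"Total: {total:,}")
```

Under these assumptions the counts sum to the stated total and the rounded shares agree with the percentages in the abstract.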

Powerful and resource-efficient rare variant meta-analysis for large-scale whole genome sequencing studies using summary statistics and functional annotations, with application to TOPMed lipid data

Submitted by	Li, Xihao
Authors	Xihao Li, Zilin Li, Corbin Quick, Hufeng Zhou, Sheila M. Gaynor, Han Chen, Jerome I. Rotter, Cristen J. Willer, Pradeep Natarajan, Gina M. Peloso, and Xihong Lin, on behalf of the TOPMed Lipids Working Group, BioData Catalyst Consortium
Name and Date of Professional Meeting	ASHG 2020 Annual Meeting (October 27-31, 2020)
Associated paper proposal(s)	Powerful and resource-efficient rare variant meta-analysis for large-scale whole genome sequencing studies using summary statistics and functional annotations, with application to TOPMed lipid data
Working Group(s)	Lipids
Abstract Text	Introduction Large-scale whole genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex human traits. Existing RV meta-analysis approaches are not scalable when applied to WGS data. Methods We propose MetaSTAAR (Meta-analysis of variant-Set Test for Association using Annotation infoRmation), a powerful and resource-efficient rare variant meta-analysis framework for large-scale whole genome sequencing association studies. MetaSTAAR accounts for population structure and relatedness for both continuous and dichotomous traits by fitting generalized linear mixed models using sparse genetic relatedness matrices. By storing LD information of RVs in sparse matrix format, the proposed workflow is highly storage efficient and computationally scalable for analyzing large-scale WGS data. Furthermore, the proposed meta-analysis framework builds upon the STAAR method, which dynamically incorporates multiple functional annotations to empower rare variant association analysis and allows for RV-set analysis, including gene-centric analysis by grouping variants into functional categories for each gene and genetic region analysis using sliding windows. MetaSTAAR also enables conditional analyses to identify RV-set signals independent of nearby common variants. Results We applied MetaSTAAR to identify RV-sets associated with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 30,138 related samples from the NHLBI Trans-Omics for Precision Medicine program Freeze 5 data, consisting of 14 ancestrally diverse study cohorts and 255 million variants in total. MetaSTAAR requires 520 GB to store the summary statistics and LD matrices across the whole genome, which is at least 100 times smaller than the storage required by the existing method RAREMETAL. In addition, MetaSTAAR is benchmarked to be at least 10 times faster than RAREMETAL. In RV gene-centric analysis, MetaSTAAR identified 70 significant associations with lipid traits.
In RV sliding window analysis, MetaSTAAR detected 257 significant 2 kb sliding windows associated with lipid traits. Compared to the joint analysis of pooled individual-level data using STAAR, the P-values from MetaSTAAR and STAAR are highly concordant, with correlation > 0.99 among significant regions. Conclusion We propose MetaSTAAR as a powerful and resource-efficient framework for meta-analysis of rare variant associations that incorporates multiple variant functional annotations to further improve power. Currently, the proposed framework is the only available solution to perform rare variant meta-analysis at the scale of large whole genome sequencing studies. Key Words: Genome sequencing; Genome-wide association; Methodology; Rare variants; Statistical genetics

Phenome-wide and molecular consequences of inbreeding

Submitted by	Manichaikul, Ani
Authors	Maria Murach, Center for Public Health Genomics, University of Virginia, USA; Zhennan Zhu, Center for Public Health Genomics, University of Virginia, USA; Mark O. Goodarzi, Division of Endocrinology, Diabetes, and Metabolism, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Gina M. Peloso, Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; Leslie A. Lange, Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado School of Medicine Anschutz Medical Campus, Aurora, Colorado, USA; Jingzhong Ding, Gerontology and Geriatric Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA; Francois Aguet, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Kristin G. Ardlie, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Robert E. Gerszten, Cardiovascular Research Center and Cardiology Division, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; The MESA TOPMed Multi-omics Team; Jerome I Rotter, The Institute for Translational Genomics and Population Sciences, The Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA; Stephen S. Rich, Center for Public Health Genomics, University of Virginia, USA; Ani Manichaikul, Center for Public Health Genomics, University of Virginia, USA; Wei-Min Chen, Center for Public Health Genomics, University of Virginia, USA
Name and Date of Professional Meeting	American Society of Human Genetics (Oct 27-31, 2020)
Associated paper proposal(s)	Phenome-wide and molecular consequences of inbreeding
Working Group(s)	Anthropometry - Adiposity (includes Physical Activity) Diabetes Lipids Multi-Omics PFT Lung Population Cohorts Population Genetics
Abstract Text	BACKGROUND: Runs of homozygosity (ROH) are long stretches of homozygous genotypes in a person’s genome that result from the individual inheriting identical haplotypes from each of their two parents, often reflecting some degree of inbreeding. Long ROHs can be accurately inferred from genome-wide SNP data and can be used to indicate the level of inbreeding. We performed a phenome-wide and comprehensive multi-omics analysis to identify the correlates of inbreeding. METHODS: We used our tool KING to estimate inbreeding coefficients (F_ROH) in UK Biobank and MESA, using long ROHs (>3 Mb), which may be indicative of identical by descent (IBD) rather than identical by state (IBS) sharing. We used data from the UK Biobank to perform association studies between the inbreeding coefficient and various cardiometabolic, pulmonary, body size, cognition, and socioeconomic traits using linear and logistic regression (1) across all participants (433,768) and (2) stratified by sex. Regression models were adjusted for sex, age, study site and principal components of ancestry. Data from the Multi-Ethnic Study of Atherosclerosis (MESA), which includes samples from people of diverse ancestries, were used to investigate the proteomic and transcriptomic consequences of inbreeding. Genes/proteins correlated with a higher inbreeding coefficient were found using a linear regression model and were then used to identify significant pathways that may help to explain the effects of ROH and the underlying mechanisms in the human genome. RESULTS: In the UK Biobank, at a Bonferroni-corrected P value cutoff of 0.00025, inbreeding showed significant associations with 23 of 200 traits (11.5%), including pulmonary traits (e.g., forced vital capacity, FVC), body size traits (e.g., height), socioeconomic traits (e.g., Townsend deprivation index), and cognitive traits (e.g., fluid intelligence score).
We identified sex differences in the effect of inbreeding on health outcomes, with generally larger effects in women than in men. For example, the effect of inbreeding on the risk of diabetes was much stronger in women (OR_1st-degree = 6.0, P = 0.0002) than in men (P = 0.54). In MESA, both the proteomic and transcriptomic analyses identified genes enriched for immune-related pathways, for example, the stimulatory C-type lectin receptor signaling pathway (P < 0.05), which may suggest that inbreeding has an impact on human immunology. CONCLUSION: Inbreeding significantly affects a large proportion of health outcomes and molecular traits, and the effects may differ by sex. Future work could perform homozygosity mapping to detect the genomic regions responsible for these effects on human phenotypes.
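The two quantities at the core of this abstract, the ROH-based inbreeding coefficient and the Bonferroni cutoff, can be sketched as follows. This is a minimal illustration under stated assumptions, not KING's estimator: it uses the common definition of F_ROH as the fraction of the genome covered by long ROH segments, and the function names, the segment format, and the genome length in the usage example are hypothetical. The significance level of 0.05 is inferred from the reported cutoff of 0.00025 across 200 traits and is not stated in the abstract.

```python
from typing import Iterable, Tuple


def f_roh(segments_bp: Iterable[Tuple[int, int]],
          genome_length_bp: int,
          min_length_bp: int = 3_000_000) -> float:
    """Fraction of the genome covered by ROH segments longer than min_length_bp.

    segments_bp holds (start, end) base-pair coordinates of each ROH.
    The 3 Mb default mirrors the ">3Mb" threshold in the abstract.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    long_total = sum(end - start
                     for start, end in segments_bp
                     if end - start > min_length_bp)
    return long_total / genome_length_bp


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical ROH segments; the 2 Mb segment is below the 3 Mb threshold.
    segments = [(0, 5_000_000), (10_000_000, 12_000_000), (20_000_000, 24_000_000)]
    # Hypothetical genome length, chosen only to keep the arithmetic simple.
    print(f_roh(segments, genome_length_bp=100_000_000))
    # alpha=0.05 is an assumed significance level; 200 traits is from the abstract.
    print(bonferroni_cutoff(0.05, 200))
```

With an assumed alpha of 0.05 and the 200 traits tested, the Bonferroni cutoff works out to the 0.00025 threshold quoted in the Results.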