TMF: Powerful and resource-efficient multi-trait analysis for large-scale multi-ethnic whole-genome sequencing studies
Principal Investigator: Xihao Li, PhD
Project Title: Powerful and resource-efficient multi-trait analysis for large-scale multi-ethnic whole-genome sequencing studies
Abstract: This proposal aims to develop powerful and resource-efficient statistical methods and a pipeline for the integrative analysis of multiple traits in large-scale multi-ethnic whole-genome sequencing studies, with applications to TOPMed lipids and metabolomics data. First, I propose to develop MultiSTAAR as a powerful and computationally scalable tool for jointly analyzing rare variant (RV) associations of multiple traits. Using a multivariate linear mixed model framework, MultiSTAAR leverages the correlation among multiple traits to improve power over single-trait analysis. MultiSTAAR accounts for relatedness and population structure, and further empowers RV multi-trait analysis by dynamically incorporating multiple functional annotations and multi-ethnic information. Second, I propose to develop STAARpipelinePheWAS as a resource-efficient tool for analyzing multiple traits in parallel in the context of phenome-wide association studies (PheWAS). STAARpipelinePheWAS requires extracting the genotype and functional annotation data only once when analyzing multiple traits, which reduces computational cost while producing exactly the same results as analyzing RV associations for each phenotype one at a time. Third, I will develop open-access statistical software that implements the proposed methods both offline and in the cloud-based BioData Catalyst ecosystem. I will apply the proposed methods to the TOPMed data, using MultiSTAAR to jointly analyze multiple TOPMed Freeze 8 lipids phenotypes and STAARpipelinePheWAS to analyze up to 1,666 TOPMed Freeze 8 metabolomics phenotypes.
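The "extract once, same results" idea behind the PheWAS aim can be illustrated with a minimal sketch. It is not the STAARpipelinePheWAS implementation: it assumes unrelated samples, an ordinary linear regression null model per trait, simulated data, and no functional annotations. All names in it (null_residuals, scores_shared, scores_one_at_a_time, load_genotypes) are hypothetical. It shows only that per-variant score statistics computed from a single genotype load agree, up to floating-point rounding, with those computed by reloading the genotypes for each trait.

```python
import numpy as np

def null_residuals(Y, X):
    """Residuals of each trait (column of Y) after regressing on covariates X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ beta

def scores_shared(G, Y, X):
    """Score statistics for all variants and traits from one genotype matrix G."""
    R = null_residuals(Y, X)   # n samples x k traits
    return G.T @ R             # m variants x k traits

def scores_one_at_a_time(load_genotypes, Y, X):
    """Same statistics, but genotypes are reloaded once per trait."""
    columns = []
    for j in range(Y.shape[1]):
        G = load_genotypes()                   # repeated extraction (the costly step)
        r = null_residuals(Y[:, [j]], X)       # n x 1 residuals for trait j
        columns.append(G.T @ r)                # m x 1 scores for trait j
    return np.hstack(columns)

# Simulated toy data (assumption: sizes and allele frequency are arbitrary).
rng = np.random.default_rng(0)
n, m, k = 200, 5, 3
G_store = rng.binomial(2, 0.02, size=(n, m)).astype(float)      # rare-variant genotypes
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])      # intercept + 2 covariates
Y = rng.normal(size=(n, k))                                     # k traits

calls = {"n": 0}
def load_genotypes():
    """Stand-in for reading genotypes from disk; counts how often it is called."""
    calls["n"] += 1
    return G_store

shared = scores_shared(load_genotypes(), Y, X)   # one extraction for all traits
loads_shared = calls["n"]

calls["n"] = 0
separate = scores_one_at_a_time(load_genotypes, Y, X)
loads_separate = calls["n"]

print(loads_shared, loads_separate, np.allclose(shared, separate))
```

In this toy setting the shared version loads the genotypes once and the per-trait version loads them k times, while the resulting score matrices match. In a real pipeline the saving would come from avoiding repeated reads of genotype and annotation files, and the null models would additionally account for relatedness and population structure.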