Anthropometry - Adiposity (includes Physical Activity) | NHLBI Trans-Omics for Precision Medicine

The extent to which augmenting extant reference panels with population-specific sequences improves imputation quality

Submitted by	Carlson, Jenna
Authors	Jenna C. Carlson, Mohanraj Krishnan, Shuwei Liu, Kevin Anderson, Jerry Z. Zhang, Hong Cheng, Take Naseri, Muagututi‘a Sefuiva Reupena, Satupa‘itea Viali, Ranjan Deka, Nicola L. Hawley, Stephen T. McGarvey, Daniel E. Weeks, Ryan L. Minster
Name and Date of Professional Meeting	American Society of Human Genetics Annual Meeting (November 1-5, 2023)
Associated paper proposal(s)	Biased Genotype Imputation in Samoans
Working Group(s)	Anthropometry - Adiposity (includes Physical Activity); Population Genetics
Abstract Text	Genotype imputation is fundamental to association studies, and even gold-standard panels like TOPMed are limited in the populations and variants for which they yield good imputation. To quantify the impact that varying the number of population-specific sequences in the reference panel has on imputation quality, we constructed 6 in-house reference panels from 2,504 1000G samples plus varying numbers of Samoan samples (4, 24, 48, 96, 384, and 1,285) from whole-genome sequencing and compared them to two gold-standard reference panels: 1000G Phase III and TOPMed. Each reference panel was used to impute genotype data for 1,897 Samoan participants who were not part of any reference panel. To assess performance, we examined average imputation quality (r2) and the number of well-imputed variants (r2 ≥ 0.8) on chromosomes 5 and 21. To further characterize variants that might gain the most in imputation accuracy, we also calculated LD scores and split them into low and high strata at the median value within MAF bins. The 1000G + 1285 Samoan panel yielded > 200,000 more high-quality variants on chromosome 5 than the TOPMed panel, with 48,374 of these having a MAF ≥ 0.01. The largest gains were seen for lower-frequency variants, with up to a 125% increase in well-imputed variants with MAF < 0.01 compared to the TOPMed imputation. Imputation quality increased as the number of Samoans represented in the panel increased. Panels that included 48 or more Samoans outperformed the TOPMed panel for all variants with MAF ≥ 0.001. The gains in imputation quality for the 1000G + 1285 Samoan reference panel compared to the TOPMed panel were greatest for variants with low LD scores. For rs200884524, a variant on chromosome 5 associated with dyslipidemia and enriched in Polynesians, the imputation quality was highest (r2 = 0.89-0.95) for the reference panels that included Samoan haplotypes.
Additionally, the imputed MAF from the reference panels with Samoans (0.207-0.222) was much closer to what is expected via targeted genotyping (0.202-0.233). While not necessarily prescriptive for future studies, in this study we showed that as few as 48 population-specific participants added to 1000G yielded superior imputation quality to TOPMed. Our findings also demonstrated that panels containing Samoan-specific haplotypes most improve the imputation of population-specific variants located in small LD blocks. These findings provide a framework to help future studies construct reference panels of their own to obtain high-quality imputation for genetic association studies.
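The stratified comparison described in this abstract (well-imputed variants counted within MAF bins, with LD scores split at the per-bin median) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the MAF bin edges, the record layout, the function names, and the handling of ties at the median are all assumptions made for illustration. Only the r2 ≥ 0.8 cutoff comes from the abstract.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

# Hypothetical MAF bin edges; the abstract does not list the bins it used.
MAF_EDGES = [0.001, 0.01, 0.05]
R2_THRESHOLD = 0.8  # "well-imputed" cutoff stated in the abstract


def maf_bin(maf):
    """Index of the MAF bin; bins are lower-inclusive, e.g. bin 2 is [0.01, 0.05)."""
    return bisect_right(MAF_EDGES, maf)


def summarize(variants):
    """Count well-imputed variants per (MAF bin, LD-score stratum).

    Each variant is a dict with keys "maf", "r2", and "ld_score".
    """
    by_bin = defaultdict(list)
    for v in variants:
        by_bin[maf_bin(v["maf"])].append(v)

    summary = {}
    for b, members in sorted(by_bin.items()):
        # Split at the median LD score computed within this MAF bin only.
        cut = median(v["ld_score"] for v in members)
        for v in members:
            # Assumption: ties at the median go to the "low" stratum.
            stratum = "low" if v["ld_score"] <= cut else "high"
            cell = summary.setdefault(
                (b, stratum), {"n": 0, "well_imputed": 0, "sum_r2": 0.0}
            )
            cell["n"] += 1
            cell["sum_r2"] += v["r2"]
            if v["r2"] >= R2_THRESHOLD:
                cell["well_imputed"] += 1

    # Replace the running sum with the mean r2 for each cell.
    for cell in summary.values():
        cell["mean_r2"] = cell.pop("sum_r2") / cell["n"]
    return summary
```

Running `summarize` once per reference panel on the same variant set and comparing the `well_imputed` counts cell by cell would give the kind of panel-versus-panel comparison the abstract reports.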

StocSum: stochastic summary statistics for whole genome sequencing studies

Submitted by	Chen, Han
Authors	Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
Name and Date of Professional Meeting	ASHG Meeting (November 1-5, 2023)
Associated paper proposal(s)	StocSum: Stochastic summary statistics for whole genome sequencing studies
Working Group(s)	Anthropometry - Adiposity (includes Physical Activity); Blood Pressure; Lipids; Analysis
Abstract Text	Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to address different scientific questions in genetic and genomic research, such as meta-analysis, heritability estimation, conditional analysis, variant set and gene-based tests, multiple phenotype analysis, and genetic correlation or co-heritability estimation. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. While these methods usually perform well for common variants in populations of European ancestry, in practice it is usually difficult to find suitable external reference panels that represent the LD structure for isolated, underrepresented and admixed populations, or for rare genetic variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. We have developed StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of the complex sample correlation structure, StocSum always scales linearly with both the sample size and the number of genetic variants in computing stochastic summary statistics from individual-level data. We have developed various downstream applications using StocSum, including single-variant tests, conditional association tests, gene-environment interaction tests, variant set tests, as well as meta-analysis and LD score regression tools. The complexity of all these downstream applications does not depend on the sample size. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine Program.
Specifically, we show that StocSum can be used to perform long-range variant set tests, expanding the aggregation units beyond genes or genomic regions in close proximity. We also show that for admixed populations, LD scores estimated by StocSum are much more accurate than those from external reference panels, even if all ancestry populations are included in those reference panels. In summary, as a reference-panel-free framework, StocSum will facilitate sharing and utilization of genomic summary statistics from WGS studies, especially for isolated, underrepresented and admixed populations.
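The abstract does not spell out the random vector algorithm, so the following Python sketch only illustrates the general idea of a stochastic summary statistic under simplifying assumptions (unrelated samples, no covariates, centered genotypes). It is not StocSum's implementation, and all function names are hypothetical. Each random vector r with independent standard normal entries is projected through the genotype matrix G to give u = Gᵀr. Because E[u uᵀ] = GᵀG, averaging products of these projections over B random vectors approximates the variant cross-products that LD calculations need, without an external reference panel.

```python
import random


def center_columns(genotypes):
    """Center each variant (column) of an n-samples x m-variants matrix."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def stochastic_summary(genotypes, n_vectors, seed=0):
    """Return U (m x B), where column b is G^T r_b for r_b ~ N(0, I_n).

    The cost is O(n * m * B): linear in both samples and variants.
    """
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    u = [[0.0] * n_vectors for _ in range(m)]
    for b in range(n_vectors):
        r = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for j in range(m):
            u[j][b] = sum(genotypes[i][j] * r[i] for i in range(n))
    return u


def approx_cross_product(u, j, k):
    """Estimate (G^T G)[j][k] from U alone; the cost does not depend on n."""
    return sum(x * y for x, y in zip(u[j], u[k])) / len(u[j])
```

In this toy setting the individual-level data are touched only when building `U`. Downstream estimates then use the m × B matrix alone, which is consistent with the abstract's statement that downstream complexity does not depend on sample size. The estimate is noisy, with a relative error that shrinks roughly as 1/√B.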

Multi-ancestry Whole Genome Sequencing (WGS) and Meta-analysis to Identify Loci Associated with Non-alcoholic Fatty Liver Disease (NAFLD)

Submitted by	Palmer, Nicholette
Authors	Chinmay Raut, Yanhua Chen, Antonino Oliveri, Mary Feitosa, Jeffrey R O’Connell, Kathleen A Ryan, Jerome I Rotter, Stephen S Rich, Kendra A Young, Aaron Hakim, Patricia A Peyser, Lawrence F Bielak, Michelle T Long, Ching-Ti Liu, Dr. Elizabeth K. Speliotes, Nicholette D Palmer, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium and the GOLD Consortium
Name and Date of Professional Meeting	American Association for the Study of Liver Diseases (AASLD; November 10-14, 2023)
Associated paper proposal(s)	Whole Genome Sequence Analysis of Non-Alcoholic Fatty Liver Disease in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program
Working Group(s)	Anthropometry - Adiposity (includes Physical Activity)
Abstract Text	Background: Non-alcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease in the US. Notably, disease prevalence differs greatly by race/ethnicity, with the highest prevalence in those of Hispanic and Asian ancestry, and the lowest prevalence in those of African ancestry. To date, studies have identified common variants associated with NAFLD in predominantly European or American populations. We have conducted the largest multi-ancestry whole genome sequencing (WGS) association study to date to identify rare variants that promote NAFLD. Methods: Study- and ethnic/race-stratified association analyses were conducted in six cohorts with imaging-measured hepatic steatosis using SAIGEgds, adjusted for age, sex, alcoholic drinks per week, and principal component estimates of admixture. Stratified results were meta-analyzed for Hispanic Ancestry, non-Hispanic European Ancestry, non-Hispanic African Ancestry, and non-Hispanic Chinese Ancestry individuals and for an overall analysis using a fixed-effects meta-analysis in METAL. Cochran’s Q test and the I² metric were used to identify and quantify heterogeneity. Results: The meta-analysis included 16,664 individuals with imaging-measured hepatic steatosis. Of these, 9,443 were of European Ancestry, 5,918 were of African Ancestry, 937 were of Hispanic Ancestry and 366 were of Chinese Ancestry. The ethnic/race-stratified meta-analysis identified six variants significantly associated (P ≤ 5E-08) with NAFLD, i.e., European Ancestry (n=2) and African Ancestry (n=4), including variants in/near PNPLA3, TM6SF2, PPP1R3B, LINC01684, and SLC2A1. An additional 15 variants trended toward association with NAFLD (P ≤ 5E-07), i.e., European Ancestry (n=1), African Ancestry (n=11), Hispanic Ancestry (n=2), and Chinese Ancestry (n=1).
Conclusion: In a large, multiethnic analysis of imaging-measured hepatic steatosis, we replicated loci previously associated with NAFLD and identified possible new race-specific loci. Several variants were trending toward association and will benefit from ongoing analyses to include 6,492 additional samples.
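For readers unfamiliar with the heterogeneity statistics named in the Methods, here is a minimal Python sketch of a fixed-effects meta-analysis for a single variant, with Cochran's Q and I². It assumes inverse-variance weighting of per-stratum effect estimates; the abstract does not say which METAL weighting scheme was used, and this is not METAL's code. The function name and return layout are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one variant.

    betas, ses: per-stratum effect estimates and their standard errors.
    Assumption: inverse-variance weights (not sample-size weights).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")

    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)

    # Pooled effect and its standard error.
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Two-sided p-value from the normal approximation.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I^2: share of variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "df": df, "i2": i2}
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with `df` degrees of freedom. That p-value is omitted here because the Python standard library has no chi-square survival function.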

StocSum: stochastic summary statistics for whole genome sequencing studies

Submitted by	Chen, Han
Authors	Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
Name and Date of Professional Meeting	Joint Statistical Meetings (August 5-10, 2023)
Associated paper proposal(s)	StocSum: Stochastic summary statistics for whole genome sequencing studies
Working Group(s)	Anthropometry - Adiposity (includes Physical Activity); Blood Pressure; Lipids; Analysis
Abstract Text	Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to advance the genetics field in a wide range of applications. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. In practice, it is usually difficult to find suitable external reference panels that represent the LD structure for underrepresented and admixed populations, or rare variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. We develop StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of the complex sample correlation structure, StocSum always scales linearly with both the sample size and the number of genetic variants in computing stochastic summary statistics from individual-level data. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine WGS studies.