
Kidney Function

AI-STAAR: An ancestry-informed association analysis framework for large-scale multi-ancestry whole genome sequencing studies

Authors
Wenbo Wang, Laura Y. Zhou, Diptavo Dutta, Yun Li, Tamar Sofer, Nora Franceschini, Zilin Li, Joseph G. Ibrahim, Xihao Li, on behalf of the TOPMed Kidney Function Working Group
Name and Date of Professional Meeting
ASHG Annual Meeting (November 5-9, 2024)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Large-scale whole genome sequencing (WGS) studies enable the detection of common and rare variants (RVs) associated with complex diseases or traits. With the increasing availability of WGS data representing participants from diverse populations, it is of interest to address heterogeneity in allelic effect sizes across ancestries to improve statistical power of association analyses and detect complex trait loci when the underlying causal variants are shared between ancestry groups with heterogeneous effects. Existing association analysis methods are limited in leveraging multi-ancestry variant effect heterogeneity, especially for under-represented ancestry populations.

Methods
We propose AI (Ancestry-Informed)-STAAR, a powerful and scalable association analysis framework for ancestry- and functionally-informed genetic association analysis in biobank-scale multi-ancestry sequencing studies. AI-STAAR performs ancestry-informed association analysis to improve the power of single variant analysis for common variants and variant-set analysis for rare variants by modeling the potential heterogeneity through ensemble weighting informed by ancestry-specific variant allele frequencies and effect sizes, while accounting for population stratification and relatedness within and across ancestries. AI-STAAR further facilitates functionally-informed association analysis of both coding and noncoding RVs by incorporating multiple categorical and quantitative functional annotations for variant grouping and weighting.

Results
We applied AI-STAAR to perform WGS common and rare variant analysis of derived kidney function traits, estimated glomerular filtration rate (eGFR) and urine albumin-creatinine ratio (UACR), from the NHLBI TOPMed consortium. Among 45,090 and 18,869 participants from diverse ancestries with eGFR and UACR, respectively, AI-STAAR detected single variants 22-40220108-G-A for eGFR and 1-231196875-C-A for UACR, as well as RVs in BAZ2A enhancer regions and the CIR1 UTR for UACR. These were missed by methods that do not account for heterogeneous ancestry effects. In addition to improving power to detect associations by accounting for effect size heterogeneity, AI-STAAR identifies the ancestry group(s) with the strongest variant associations: 22-40220108-G-A for eGFR and 1-231196875-C-A for UACR were driven by East Asian and European ancestries, respectively; the RV associations of BAZ2A and CIR1 with UACR were driven by African ancestry.

Summary
AI-STAAR is a powerful and computationally scalable framework that leverages allelic heterogeneity by ancestry for genetic association analysis in multi-ancestry sequencing studies.

Key Words:
Genome-sequencing; Genome-wide association; Statistical genetics; Rare variants; Genetic diversity

Use of Polygenic Risk Scores to Improve GFR Estimating Equation in CRIC and MESA

Authors
Laura Zhou, Quan Sun, Josyf Mychaleckyj, Holly Kramer, Stephen Rich, Jerome Rotter, Megan Shuey, Nancy Cox, NHLBI Trans-Omics for Precision Medicine (TOPMed) Kidney Function Working Group, Chronic Renal Insufficiency Cohort (CRIC), Lesley Inker, Nora Franceschini, Yun Li
Name and Date of Professional Meeting
ASHG Conference (Nov 1-5 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Glomerular filtration rate (GFR), a measure of kidney function, is usually not directly measured in clinical practice. Instead, predictive equations were developed to estimate GFR (eGFR). The 2021 race-free Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equations relate measured GFR (mGFR) to age, sex, and (1) serum creatinine or (2) serum creatinine and cystatin-C. Serum creatinine and cystatin-C enter in spline form to account for the changing linear trend (creatinine: knot at 0.7 mg/dL for women and 0.9 mg/dL for men; cystatin-C: knot at 0.8 mg/L). We hypothesized that accounting for genetic variation in eGFR through a polygenic risk score (PRS) would improve the eGFR prediction of mGFR. Using the 2021 CKD-EPI forms for serum creatinine, cystatin-C, age, and sex, we added the eGFR PRS as a covariate to the model. We fit a linear model including the PRS for the (1) creatinine-only and (2) creatinine and cystatin-C equations, where the eGFR PRS was computed with PRS-CS using European summary statistics from the Chronic Kidney Disease Genetics Consortium (SNPs = 8,834,748; minor allele count > 10). Using cross-validation, we compared our PRS estimating equations with the (1) CKD-EPI 2021 creatinine-only and (2) CKD-EPI 2021 creatinine and cystatin-C equations in the Chronic Renal Insufficiency Cohort (CRIC) study of 1327 African American and White individuals. Performance measures included bias (median of the difference mGFR - eGFR), precision (interquartile range of the difference), accuracy (median of the absolute difference, RMSE relative to mGFR, and percent of estimates within 30% of mGFR), and ROC. Chronic kidney disease, defined as GFR < 60 mL/min/1.73 m², is a highly prevalent disease that affects many clinical decisions in the US. Thus, we were also interested in the performance of the eGFR equations in the cohort with mGFR < 60 versus the cohort with mGFR > 60. All performance measures improved or were comparable to CKD-EPI 2021 for predicting mGFR.
For the creatinine-only equations, our PRS eGFR creatinine equation improved bias by 86% compared to the 2021 CKD-EPI creatinine-only equation (0.50 vs 3.66). Our creatinine and cystatin-C equation with the eGFR PRS improved bias by 92% compared to the CKD-EPI 2021 creatinine and cystatin-C equation (0.10 vs -2.02). For individuals with mGFR < 60, our PRS eGFR equation improved accuracy by 73% (median absolute difference 1.51 vs 5.63 for CKD-EPI 2021) in the creatinine-only equation and by 7% (11.4 vs 14.1) in the creatinine and cystatin-C equation. Further exploration of PRS forms derived from diverse populations should further increase performance. Additionally, we will replicate and validate our results in the MESA cohort.
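
The performance measures named above can be illustrated with a minimal sketch. It follows the definitions given in the abstract (bias as the median of mGFR - eGFR, precision as the interquartile range of that difference, accuracy as the median absolute difference and the percent of estimates within 30% of mGFR). The function names, the quartile interpolation method, and the relative-improvement formula are assumptions made for illustration, not the authors' code; RMSE and ROC are omitted because the abstract does not specify their exact form.

```python
import statistics


def performance(mgfr, egfr):
    """Summarise agreement between measured and estimated GFR.

    mgfr, egfr: equal-length sequences of positive numbers.
    """
    diff = [m - e for m, e in zip(mgfr, egfr)]
    abs_diff = [abs(d) for d in diff]
    # Quartiles by linear interpolation (an assumed convention).
    q1, _, q3 = statistics.quantiles(diff, n=4, method="inclusive")
    within_30 = [a <= 0.30 * m for a, m in zip(abs_diff, mgfr)]
    return {
        "bias": statistics.median(diff),
        "precision_iqr": q3 - q1,
        "median_abs_diff": statistics.median(abs_diff),
        "p30": 100.0 * sum(within_30) / len(within_30),
    }


def relative_improvement(new, old):
    """Percent reduction in magnitude of a metric (one plausible definition)."""
    return 100.0 * (abs(old) - abs(new)) / abs(old)


# Toy values, not study data.
summary = performance([50, 60, 90, 100], [48, 66, 80, 100])
print(summary)
print(round(relative_improvement(0.50, 3.66)))
```

Under this definition, the creatinine-only bias values quoted above (0.50 vs 3.66) correspond to a reduction of about 86%; because the abstract reports rounded values, other quoted percentages may not be reproduced exactly from the printed numbers.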

Whole Genome Sequencing Analyses of 45,090 Individuals Reveal Rare Coding and Noncoding Variants Associated with Kidney Function

Authors
Zilin Li, Xihao Li, Bridget Lin, Holly Kramer, Nora Franceschini, and Xihong Lin, on behalf of the TOPMed Kidney Function Working Group
Name and Date of Professional Meeting
ASHG 2022 (Oct 25-Oct 29)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Chronic kidney disease (CKD) affects over 850 million adults worldwide and is projected to be the 5th most common cause of years of life lost by 2040. CKD is defined by low estimated glomerular filtration rate (eGFR) and/or increased urine albumin to creatinine ratio (UACR). Genome-wide association studies (GWAS) have reported associations of eGFR and UACR with thousands of common and low-frequency variants, but these variants account for only a small fraction of heritability. Rare variants (RVs) may account for some of the unexplained heritability. Large-scale whole-genome sequencing (WGS) studies, such as the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, provide the opportunity to assess associations of eGFR and UACR with rare variants across the genome, especially in noncoding regions.

Hypothesis
Rare variant aggregations are associated with eGFR and UACR.

Methods
We applied our newly developed STAARpipeline to detect rare variants (MAF ≤ 0.01) associated with eGFR and UACR in 45,090 and 18,869 individuals, respectively, from TOPMed Freeze 8 WGS data. STAARpipeline provides gene-centric and non-gene-centric analyses using a variety of coding and noncoding masks. The gene-centric analysis covers five coding and eight noncoding functional categories. The non-gene-centric analysis includes sliding window analysis with fixed window sizes and dynamic window analysis with data-adaptive sizes.
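
As a rough illustration of the rare-variant filter and fixed-size sliding windows described above, the sketch below computes minor allele frequency from genotype dosages and groups rare variants into overlapping windows. It is not STAARpipeline code: the function names are hypothetical, the 2-kb window length is taken from the Results, and the 1-kb step is an assumption made only for this example.

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0, 1 or 2) of diploid individuals.

    Assumes no missing genotypes.
    """
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def rare_variant_windows(positions, mafs, window=2000, step=1000, maf_max=0.01):
    """Group rare variants (0 < MAF <= maf_max) into fixed-size windows.

    Returns a list of (start, end, member_positions) for non-empty windows,
    where each window covers the half-open interval [start, end).
    """
    rare = sorted(p for p, f in zip(positions, mafs) if 0 < f <= maf_max)
    if not rare:
        return []
    windows = []
    start = rare[0]
    while start <= rare[-1]:
        members = [p for p in rare if start <= p < start + window]
        if members:
            windows.append((start, start + window, members))
        start += step
    return windows


# Toy positions and frequencies, not study data.
for w in rare_variant_windows([100, 1500, 2600, 9000], [0.005, 0.2, 0.01, 0.001]):
    print(w)
```

Each resulting window would then be tested as one variant set; the association test itself is outside the scope of this sketch.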

Results
For eGFR, the gene-centric analysis identified a genome-wide significant association of missense RVs in SLC47A1 at the Bonferroni-corrected level 5.00E-07 (=0.05/20,000/5). After conditioning on known eGFR-associated variants, the strength of the association was attenuated but it remained significant at level 2.50E-06. For UACR, the 2-kb sliding window procedure identified a genome-wide significant association of RVs in an intergenic region near ASB1 at the Bonferroni-corrected level 1.88E-08 (=0.05/2.66E06). After conditioning on known UACR-associated variants, the association remained significant at the same level 1.88E-08. The dynamic window procedure additionally detected two significant associations at the genome-wide error rate 0.05 level, including intronic RVs of TRIM67 and ERCC6L2. These two associations remained significant at level 1.88E-08 in conditional analysis.
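
The Bonferroni thresholds quoted above follow from dividing the 0.05 error rate by the number of tests, as this short check shows. Reading 20,000 as an approximate gene count and 5 as the five coding categories is an interpretation of the expression 0.05/20,000/5; the abstract does not state how the 2.50E-06 conditional level was derived, so treating it as 0.05/20,000 is an assumption.

```python
ALPHA = 0.05  # genome-wide error rate used in the abstract


def bonferroni(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Gene-centric coding analysis: 0.05 / 20,000 / 5.
gene_centric = bonferroni(ALPHA, 20_000 * 5)
# Assumed derivation of the conditional-analysis level: 0.05 / 20,000.
conditional = bonferroni(ALPHA, 20_000)
# 2-kb sliding windows: 0.05 / 2.66E+06.
sliding_window = bonferroni(ALPHA, 2.66e6)

print(f"{gene_centric:.2e}")    # 5.00e-07
print(f"{conditional:.2e}")     # 2.50e-06
print(f"{sliding_window:.2e}")  # 1.88e-08
```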

Conclusions
Four new RV associations, including missense RVs in SLC47A1 with eGFR, RVs in an intergenic region near ASB1 with UACR, and intronic RVs of TRIM67 and ERCC6L2 with UACR, were identified using the TOPMed WGS Freeze 8 data through STAARpipeline. These findings suggest a role of rare variants in kidney traits.

Key Words:
Genome sequencing; Genome-wide association; Rare variants; Statistical Genetics; Complex Traits