
Analysis

Effects of reference panel composition on genotype imputation quality in founder samples.

Authors
K. Ryan, C. Van Hout, J. O'Connell, J. Perry, B. Gaynor, H. Xu, M. Montasser, T. O'Connor, N. Gosalia, A. Shuldiner, B. Mitchell
Name and Date of Professional Meeting
ASHG 2019 Meeting (October 17, 2019)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Genotype imputation, the process of leveraging haplotypes observed in a reference panel to infer unobserved genotypes in other samples, is widely used in genetic studies. The performance of imputation depends on the representativeness of haplotypes present in the reference panel, which is a function of the number and diversity of haplotypes in the reference panel. In this study we evaluate imputation quality for DNA samples from a founder population (Amish) as a function of the relative mix of founder and nonfounder haplotypes in the reference panel.
We evaluated imputation performance using 3 different imputation reference panels for 6,153 Amish individuals genotyped on the Illumina Global Screening Array. Genotypes were imputed using the full TOPMed reference panel, which included 1,025 Amish whole genome sequences (n=54,035 subjects; TOPMed-global); the TOPMed panel excluding the Amish (n=53,010 subjects; TOPMed-noAmish); or a panel of the Amish sequences only (n=1,025; Amish-only). We assessed imputation performance and quality across the 3 reference panels by comparing the number of imputed SNPs and by measuring the concordance of imputed genotypes with genotypes obtained using whole exome sequencing (n=5,317).
All 3 panels imputed approximately the same number of common (MAF>5%) SNPs. Far more rare SNPs (MAF<0.5%) were imputed using the TOPMed-noAmish reference panel (n=1,929,020) than using the Amish-only (n=126,510) or TOPMed-global (n=392,323) panels. Imputed genotypes from the TOPMed-noAmish panel also had much lower non-reference allele concordance rates than those from the Amish-only and TOPMed-global panels (88.6% vs. 95.7% and 94.9%, respectively, for MAF 0.5 to 5%). These results suggest that using a cosmopolitan panel without representation of the founder population could produce false-positive rare variant calls (i.e., impute genotypes that may not exist), and they support the further development of whole genome sequence reference panels for founder populations.
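The abstract does not spell out how non-reference allele concordance was computed. As an illustration only, the following minimal Python sketch uses one common definition: among sample-by-variant pairs where at least one of the two calls (imputed or sequenced) carries a non-reference allele, the fraction in which the two genotypes agree exactly. The function name, the 0/1/2 alternate-allele coding, and the use of None for missing calls are assumptions made for this sketch, not details taken from the study.

```python
import math


def non_reference_concordance(imputed, sequenced):
    """Fraction of matching genotypes among pairs with a non-reference call.

    Both arguments are equal-length sequences of alternate-allele counts
    (0, 1, or 2), with None marking a missing genotype. This coding is an
    assumption of the sketch, not the study's actual data format.
    """
    compared = 0
    matched = 0
    for imp, seq in zip(imputed, sequenced):
        # Skip pairs where either call is missing.
        if imp is None or seq is None:
            continue
        # Skip pairs where both calls are homozygous reference; these
        # dominate at low MAF and would inflate overall concordance.
        if imp == 0 and seq == 0:
            continue
        compared += 1
        if imp == seq:
            matched += 1
    # No informative pairs: concordance is undefined.
    return matched / compared if compared else math.nan


# Toy data: four informative pairs, two of which agree.
imputed_calls = [0, 1, 2, 0, 1, None, 0]
sequenced_calls = [0, 1, 1, 1, 1, 2, 0]
print(non_reference_concordance(imputed_calls, sequenced_calls))  # 0.5
```

Excluding pairs that are homozygous reference in both call sets matters for rare variants, where nearly all genotypes are reference and plain concordance would sit close to 100% regardless of imputation quality.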