Authors |
Madeline Kowalski, Huijun Qian, Ziyi Hou, Jonathan D. Rosen, Laura M. Raffield, Robert Kaplan, Eric Boerwinkle, Kari E. North, Charles Kooperberg, James G. Wilson, Alex P. Reiner, Yun Li, on behalf of the TOPMed Hematology and Hemostasis Working Group
|
Abstract Text |
Background: The NIH/NHLBI Trans-Omics for Precision Medicine (TOPMed) Project generated deep-coverage whole genome sequencing (WGS) on >50,000 individuals from diverse ancestral backgrounds. We anticipated TOPMed sequencing data would improve genotype imputation, particularly for rarer variants and in minority populations.
Methods: We performed imputation with minimac4, using TOPMed data as the reference, for individuals from the Jackson Heart Study (JHS, all African Americans [AA]) and the Hispanic Community Health Study/Study of Latinos (HCHS/SOL, all Hispanic/Latino [HL]). For imputation into JHS, we excluded JHS subjects from the TOPMed data and used the remaining subjects as the reference. Imputation quality was evaluated in 3082 JHS participants at all TOPMed variants not overlapping those on Affymetrix 6.0, and in 12,803 SOL individuals at all imputed MegaArray markers. We used estimated r2 for post-imputation quality control (QC) and dosage r2, also called true r2 (the squared Pearson correlation between imputed dosages and true genotypes), for quality assessment. We compared performance against imputation using the Haplotype Reference Consortium (HRC) or the 1000 Genomes phase 3 (1000G) alone as reference.
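As an illustration of the dosage/true r2 metric defined above, a minimal sketch follows. It assumes that imputed dosages and true genotypes for one marker are available as equal-length numeric arrays with values in [0, 2]; the function name `dosage_r2` is hypothetical and is not taken from the analysis pipeline used in this work.

```python
import numpy as np


def dosage_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Hypothetical helper for illustration only. Inputs are per-sample values
    for a single marker (dosages in [0, 2], genotypes coded 0/1/2).
    """
    x = np.asarray(imputed_dosages, dtype=float)
    y = np.asarray(true_genotypes, dtype=float)
    if x.shape != y.shape:
        raise ValueError("dosages and genotypes must have the same length")
    # The correlation is undefined when either vector has no variation
    # (for example, a marker that is monomorphic in the evaluated sample).
    if x.std() == 0 or y.std() == 0:
        return float("nan")
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)
```

Averaging this per-marker value within a MAF or MAC bin would give a summary comparable in spirit to the average dosage r2 figures reported here, although the exact binning and filtering used in this work are not shown in this sketch.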
Results: In JHS, 51 million (M) markers were well-imputed with standard/lenient QC, including 13.1M with sample minor allele frequency (MAF) <0.05%; in SOL, 60M markers were well-imputed (28M with MAF <0.05%). In contrast, approximately 25M (7M with MAF <0.05%) and 30M (8M with MAF <0.05%) markers were well-imputed with HRC and 1000G, respectively.
The average dosage r2 for markers with sample MAF <0.05% exceeded 82% (JHS) and 66% (SOL) with standard/lenient QC, and exceeded 87% (JHS) and 78% (SOL) with an estimated r2 threshold of 0.8. Towards the rare extreme, in JHS, 39% of markers with TOPMed minor allele count (MAC) of 10-20 could be well-imputed, with an average true r2 of 77% for sample/JHS singletons and >80% (80-97%) when JHS MAC >1.
Compared with standard reference panels, TOPMed yielded many more well-imputed rare variants and higher imputation quality for these rare variants. For example, TOPMed increased the number of well-imputed variants with sample MAF <0.05% by >3x and 6x, with 17-20% and 16-24% improvement in average dosage r2 for markers imputed by both panels, compared to 1000G and HRC, respectively.
Conclusion: TOPMed proved to be a much better imputation reference panel for minority populations, in terms of both the number of imputable variants and the quality of the imputed variants.
|