
Anthropometry - Adiposity (includes Physical Activity)

TOPMed: Early Insights from Sequencing and Analysis of 45,934 Deep Human Genomes

Authors
Goncalo Abecasis; Pradeep Natarajan; Shawn Lee;
Name and Date of Professional Meeting
ASHG 2016
If not associated with a paper proposal
Not associated with a proposal, but exemption has been approved by P&P co-chairs and this approval has been emailed to ACC.
Working Group(s)
Abstract Text
TOPMed aims to discover mechanisms underlying heart, lung, blood and sleep disorders by whole-genome sequencing of high-value samples. In a collaboration of diverse scientists, genome sequencing, data coordination and informatics resource centers, and staff at the National Heart, Lung, and Blood Institute, we deeply sequenced 45,934 genomes (majority non-European, including large numbers of African Americans and Latinos).

These 45,934 samples from 26 studies were sequenced at a mean depth of 37.8X and met high quality standards. The rate of newly discovered variants in sequenced individuals has decreased only slowly throughout the project, resulting in a current estimate of ~300 million SNPs and indels (>1 variant per 10 base pairs).

Analysis of 18,877 individuals in the first 10 studies yielded 183 million SNPs and 10 million indels. Among discovered variants, 43.5% were singletons, each observed in only one individual. Deviations from this fraction can identify functional regions targeted by natural selection. Regions >100 kb with an extremely low fraction of singletons (<38%) include loci encoding HLA class I and class II genes and ABO. Conversely, large fractions of singletons are observed in >100 kb windows around TP53BP1 (>48% singletons), among missense variants and inframe indels (48%), and among truncating mutations and frameshift indels (57%).
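
As a rough illustration of the windowed singleton comparison described above, the following Python sketch bins variants into windows and flags those whose singleton fraction falls below 38% or above 48%. It is a minimal sketch under stated assumptions, not the analysis pipeline used in the study: the function names, the input format of (position, allele count) pairs, the use of fixed non-overlapping 100 kb windows, and the treatment of an allele count of 1 as a singleton are all hypothetical choices made for the example.

```python
from collections import defaultdict


def singleton_fraction_by_window(variants, window_size=100_000):
    """Return {window_start: singleton fraction} for one chromosome.

    `variants` is an iterable of (position, allele_count) pairs.
    Assumption: a variant with allele_count == 1 is a singleton.
    Assumption: windows are fixed, non-overlapping bins of `window_size` bp.
    """
    totals = defaultdict(int)
    singletons = defaultdict(int)
    for position, allele_count in variants:
        start = (position // window_size) * window_size
        totals[start] += 1
        if allele_count == 1:
            singletons[start] += 1
    return {start: singletons[start] / totals[start] for start in totals}


def flag_windows(fractions, low=0.38, high=0.48):
    """Label windows using the <38% and >48% thresholds quoted in the text."""
    flags = {}
    for start, fraction in fractions.items():
        if fraction < low:
            flags[start] = "low"
        elif fraction > high:
            flags[start] = "high"
    return flags


# Toy data, invented purely to exercise the functions.
toy_variants = [
    (10, 1), (20, 5), (30, 7), (40, 12),  # window 0: 1 of 4 are singletons
    (150_000, 1), (160_000, 1),           # window 100000: 2 of 2 are singletons
]
toy_fractions = singleton_fraction_by_window(toy_variants)
toy_flags = flag_windows(toy_fractions)
```

A real analysis would also need enough variants per window for the fraction to be stable, which this sketch does not check.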

Early results identify association signals across the frequency spectrum. For example, we recapitulate associations with LDL cholesterol at p < 5×10⁻⁸ for a common non-coding variant at SORT1 and for rare coding variants in PCSK9 (p.R46L, p.C679X). For LDL, additional signals include a haplotype of two African ancestry-specific variants (both with frequency 1.1%, r² = 1) associated with a 28 mg/dl decrease in LDL (rs17249141 in the LDLR promoter and rs114197570 in an enhancer 4 kb upstream). For BMI, we recapitulate association at p < 5×10⁻⁸ with common variants near FTO and identify new signals at low-frequency variants near DNAH5 (rs76221701).

In October 2016, TOPMed will release ~10,000 genomes through dbGaP. A new public variant server will allow easy browsing of all catalogued variants. Preliminary analyses suggest that haplotype imputation based on TOPMed will exceed all current panels in accuracy in individuals of European, African-American or Latino ancestry. In summary, we illustrate the importance of deep genome sequencing for understanding human genetic variation and its association with disease, and announce the release of the first TOPMed results and data.