Abstract Text |
Background: Genome-wide association studies have identified common variants associated with chronic obstructive pulmonary disease (COPD). Rare variants can also increase COPD susceptibility but have been challenging to identify. While exome sequencing has traditionally been performed to identify rare disease variants, whole-genome sequencing (WGS) offers several advantages, including potentially improved accuracy in coding regions and interrogation of non-coding regions. We hypothesize that WGS in subjects with severe COPD and smoking controls with preserved pulmonary function will allow us to identify novel genetic determinants for COPD.
Methods: As part of the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, we submitted 2000 subjects for WGS. COPDGene non-Hispanic white (NHW, 500) and African-American (AA, 330) cases had severe (GOLD spirometry grade 3-4) COPD with a median forced expiratory volume in 1 second (FEV1) of 35% and 39% predicted, respectively. Current and ex-smoker controls (500 NHW and 570 AA) had FEV1 > 85% predicted. One hundred probands from the Boston Early-Onset COPD Study (EOCOPD) had a median FEV1 of 20% predicted. Samples were sequenced at > 30x coverage on the Illumina X10 through the University of Washington using PCR-free libraries, with centralized mapping and variant calling by the University of Michigan Informatics Research Core. We assessed characteristics of variants in pre-defined regions using SnpEff and the Combined Annotation Dependent Depletion (CADD) score.
Results: To date, we have received whole-genome sequencing data on 692 samples. The overall concordance rate with existing array genotyping exceeded 99%. After removing 3 mismatched subjects, 689 samples remained: 46 EOCOPD cases, 144 COPDGene NHW cases and 151 NHW controls, and 112 AA cases and 236 AA controls. In these samples we identified a total of 41,819,809 single-nucleotide variants, of which 10,010,421 were novel (not previously described in dbSNP build 144). Within a set of 129 genes from Mendelian diseases associated with COPD or emphysema, or at loci from genome-wide association studies of COPD or lung function, we identified 1870 nonsynonymous, stop, or splice variants, including 50 putative loss-of-function variants. In 500 kb regions around each of six genome-wide significant regions from a recent study of COPD, we identified 813 putatively deleterious non-coding variants, defined by a CADD score > 15.
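As an illustration of the non-coding filter described above, the following is a minimal Python sketch and not the pipeline used in this study. The `Variant` record, the function name, and the toy data are all hypothetical. It also assumes that the 500 kb region is centered on a lead position (250 kb on each side), which the abstract does not specify, and that the CADD score is the scaled (PHRED-like) value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a real library type)."""
    chrom: str
    pos: int            # 1-based genomic position
    cadd_phred: float   # scaled CADD score (assumed PHRED-like)
    is_coding: bool     # True if annotated as coding


def select_deleterious_noncoding(
    variants: Iterable[Variant],
    lead_sites: List[Tuple[str, int]],
    flank_bp: int = 250_000,       # assumption: 500 kb window centered on the lead site
    cadd_threshold: float = 15.0,  # threshold stated in the abstract (strictly greater)
) -> List[Variant]:
    """Keep non-coding variants with CADD > threshold near any lead site."""
    selected = []
    for v in variants:
        if v.is_coding or v.cadd_phred <= cadd_threshold:
            continue
        # A variant qualifies if it lies within flank_bp of a lead site
        # on the same chromosome.
        near_lead = any(
            v.chrom == chrom and abs(v.pos - pos) <= flank_bp
            for chrom, pos in lead_sites
        )
        if near_lead:
            selected.append(v)
    return selected


# Toy data for illustration only; these are not variants from the study.
toy_variants = [
    Variant("chr4", 1_000_100, 18.2, False),   # kept
    Variant("chr4", 1_000_200, 12.0, False),   # CADD too low
    Variant("chr4", 1_000_300, 25.0, True),    # coding
    Variant("chr4", 2_000_000, 30.0, False),   # outside window
    Variant("chr15", 1_000_100, 16.0, False),  # different chromosome
    Variant("chr4", 1_000_400, 15.0, False),   # equal to threshold, not above it
]
toy_leads = [("chr4", 1_000_000)]
kept = select_deleterious_noncoding(toy_variants, toy_leads)
```

In practice the scores and coding annotations would come from annotated variant files rather than in-memory records; the sketch only makes the three filter conditions (non-coding, score above threshold, within the window) explicit.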
Conclusions: Whole-genome sequencing can identify large numbers of potentially functional and deleterious variants, and will serve as an important resource for identifying causal variants at known and novel loci. Future plans include association tests for affection status and for secondary phenotypes, including imaging and transcriptomic data.
Funding: This work was supported by NHLBI R01 HL084323, P01 HL083069, P01 HL105339 and R01 HL089856 (E.K.S.); R01 HL113264 (M.H.C. and E.K.S.), and R01 HL089897 (J.D.C.). The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, GlaxoSmithKline, Siemens and Sunovion.
|