
Analysis

We didn't see this in GWAS: Understanding and fixing unfamiliar problems in association analyses, when pooling whole genome sequence data from multiple studies.

Authors
K. Rice; X. Zheng; S. Gogarten; T. Sofer; C.A. Laurie; C.C. Laurie; B. Weir; T. Thornton; A. Szpiro; J. Bis; J. Brody
Name and Date of Professional Meeting
ASHG (October, 2017)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Large-scale association analyses are now underway, using whole genome sequence (WGS) data on thousands of participants. Unlike earlier GWAS work, where data were combined using meta-analyses of summary statistics, in WGS work participant-level data from multiple studies are typically pooled, and results are obtained from a single analysis. While there are good reasons for this approach to WGS, not least computational tractability, we describe how it can lead to false-positive results when the equivalent GWAS-style approach would not.

Specifically, we consider the impact of differential phenotype variances by study (due to e.g. different measurement protocols) and their interplay with adjustment for relatedness across studies (e.g. allowing random variability proportional to a kinship matrix, or other genetic relatedness matrix). As well as explaining why these issues lead to difficulties in WGS where they did not for GWAS, we describe methods suitable for WGS work – available in straightforward and freely available software – that use pooled data and provide appropriate control of false-positive results. For both single-SNP and region-based analyses, the problems and their solutions are illustrated with several examples from the NHLBI Trans-Omics for Precision Medicine Whole Genome Sequencing Program, TOPMed.
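
The first issue, differing phenotype variances by study, can be illustrated with a small null simulation. The sketch below is not the authors' method or software, and it ignores relatedness entirely. It assumes two unrelated-sample studies with hypothetical sizes, residual standard deviations, and allele frequencies, and a variant with no true effect that is more common in the noisier study. It compares a pooled single-SNP test that assumes one residual variance with one that weights each participant by an estimated study-specific variance. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def wald_z(y, X, w):
    """z-statistic for the last column of X under weighted least squares,
    where w holds each participant's inverse residual variance."""
    XtW = X.T * w
    cov = np.linalg.inv(XtW @ X)
    beta = cov @ (XtW @ y)
    return beta[-1] / np.sqrt(cov[-1, -1])


def simulate_type1_error(n_reps=2000, n_per_study=1000, sds=(1.0, 3.0),
                         mafs=(0.005, 0.05), z_crit=1.96, seed=0):
    """Return (pooled, weighted) rejection rates under the null.

    sds and mafs are hypothetical per-study residual SDs and minor
    allele frequencies; the variant has no effect on the phenotype.
    """
    rng = np.random.default_rng(seed)
    study = np.repeat([0, 1], n_per_study)
    sds = np.asarray(sds)
    mafs = np.asarray(mafs)
    n = study.size
    reject_pooled = 0
    reject_weighted = 0
    for _ in range(n_reps):
        g = rng.binomial(2, mafs[study]).astype(float)  # genotype dosage
        y = rng.normal(0.0, sds[study])                 # null phenotype
        X = np.column_stack([np.ones(n), study, g])     # study fixed effect + SNP

        # Residuals from the null model (study-specific means only).
        means = np.array([y[study == s].mean() for s in (0, 1)])
        resid = y - means[study]

        # Pooled analysis: one residual variance for everyone.
        w_pooled = np.full(n, 1.0 / resid.var())
        # Alternative: a separate residual variance for each study.
        var_by_study = np.array([resid[study == s].var() for s in (0, 1)])
        w_study = 1.0 / var_by_study[study]

        reject_pooled += abs(wald_z(y, X, w_pooled)) > z_crit
        reject_weighted += abs(wald_z(y, X, w_study)) > z_crit
    return reject_pooled / n_reps, reject_weighted / n_reps


if __name__ == "__main__":
    pooled, weighted = simulate_type1_error()
    print(f"pooled, common variance:   {pooled:.3f}")
    print(f"study-specific variances:  {weighted:.3f}")
```

Under these assumptions the common-variance test rejects more often than the nominal 5% level, because most carriers sit in the high-variance study while the pooled variance estimate averages over both. A per-study meta-analysis avoids this automatically, since each study's standard error uses only its own residual variance.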