2017 TOPMed DCC Analysis Workshop
August 7 – 9, 2017, Seattle, WA
Thank you for your participation. This event is now closed. The workshop materials remain available via the links below.
Session 0. Introduction and Logistics
Session 1. Introduction to TOPMed data
This session reviews what data is available, how to access it, and some of the QC steps that the DCC implements for TOPMed data:
- Introduction to TOPMed and Data Sharing [.pptx]
- QC steps for Freeze 4 data, including variant, sample, and pedigree-based checks [.pdf]
- Introduction to the Genomic Data Structure (GDS) format and tools for creating and using files in this format [.pdf]
- Worked example
- Exercises
Session 2a. Population structure and relatedness
This session describes how TOPMed data can be used to illustrate and estimate measures of relatedness, either at the population level or between specific individuals.
- Review of what population structure is, and how related quantities can be estimated using TOPMed data [.pdf]
- Estimating relatedness for participants in TOPMed [.pdf]
- R packages used in both tasks [.pdf]
- Worked examples
- Exercises
Session 2b. Phenotypes
This session describes how to access, examine and harmonize phenotypes from multiple TOPMed studies
- Guidelines for harmonizing phenotypes within TOPMed [.pdf]
- Accessing and using unharmonized TOPMed phenotypes via dbGaP [.pdf]
- Worked examples
Session 3. Association tests
This session reviews why association tests are a useful way to analyze TOPMed data, describes how single-variant association tests work – and why this can be challenging within TOPMed – before briefly summarizing some multiple-variant tests.
- Slides describing the methods and their motivation, strengths, and weaknesses [.pdf]
- R functions from the DCC pipeline that implement widely-used association tests [.pdf]
- Worked examples
- Exercises for single-variant tests
- Exercise for multiple-variant tests
Session 4. Variant annotation
This session introduces variant annotation for TOPMed, including how to define and filter aggregation units using variant annotations
Session 5a. DCC pipeline
This session introduces the DCC and Analysis Commons pipeline, emphasizing how cluster and cloud computing can be used to implement high-throughput TOPMed analyses efficiently
- Introduction [.pdf]
- Multi-threaded versions of some analyses seen in earlier sessions [.pdf]
- Running the same analyses on the cloud via Amazon Web Services [.pdf]
- Wrap-up, discussing approximate costs and where to learn more [.pdf]
- Worked examples
Session 5b. Analysis Commons
This session describes the Analysis Commons, a DNAnexus-based system for cloud computing with TOPMed and similar data.