All relevant forms of data (aligned sequence data, genotype call sets and phenotype data) may be stored in the TOPMed Exchange Area (EA), a temporary holding area at dbGaP that gives TOPMed investigators pre-release access to data (i.e., before release to the general scientific community). The EA consists of multiple components: a combined EA for cross-study genotype call sets, and study-specific EAs that contain phenotypic and other data types. Each study-specific EA includes a link to the aligned sequence data files.
TOPMed studies are also being released in dbGaP for access by the scientific community. Following release of a major data freeze by the IRC, the DCC works with study investigators to complete study-specific QC and pre-curation of dbGaP files to facilitate timely release on dbGaP. Search the dbGaP site for “TOPMed” to identify currently released accessions.
Phenotypes harmonized centrally by the DCC are added to the study-specific EAs as they are generated. In addition, many working groups are exchanging harmonized phenotype data through the study-specific EAs. Data in the EAs are accessible only to TOPMed investigators (as designated by TOPMed study PIs), and access is obtained by application to dbGaP; see TOPMed Data Sharing Policies for more information.
The ACC suggests two approaches to obtaining and sharing TOPMed data:
- Approach 1: Files may be uploaded to a study-specific EA by each study that has completed dbGaP registration. Individuals designated as eligible by study PIs may apply for access to the study-specific EAs; this access also covers the combined EA containing genotype call sets. These applications are reviewed by the NHLBI Data Access Committee. Approved applicants may download data from the Exchange Areas to their local institutional servers. This approach is outlined under Data sharing through the dbGaP Exchange Areas.
- Approach 2: This is similar to Approach 1, but the application process is coordinated among a group of investigators who intend to share data with one another, generally in a cloud computing environment. This approach is described under Sharing dbGaP Data in a Cloud Environment.
Investigators should note that:
- Investigators named on Data Access Requests are responsible for ensuring that all analyses performed with the data they download comply with data-use limitations.
- These procedures for sharing individual-level data have more administrative overhead than familiar GWAS results-sharing approaches, which rely on meta-analysis of study-specific results. However, this form of meta-analysis is not recommended for WGS data: not only would it require sharing impractically large files, but it would also severely limit the forms of statistical analysis that can be performed successfully. Once access to the data is obtained, performing the analysis in one step should also be much faster and less error-prone than creating, sharing and meta-analyzing results files from individual studies.
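
For readers less familiar with the results-sharing approach mentioned above, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis for a single variant. It is an illustration only, not TOPMed code: the function name `inverse_variance_meta` and the per-study effect estimates and standard errors are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Combine per-study effect estimates for ONE variant.

    betas: per-study effect estimates (hypothetical values)
    ses:   per-study standard errors, in the same order
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal in length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical summary statistics from three studies for the same variant.
study_betas = [0.12, 0.08, 0.15]
study_ses = [0.05, 0.04, 0.10]

beta, se, z = inverse_variance_meta(study_betas, study_ses)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

Note that the calculation sees only one estimate and one standard error per study and per variant. Every study must therefore produce and share such a row for every variant tested, and any analysis that needs individual-level genotypes or phenotypes across studies cannot be expressed this way.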