
TOPMed Data Sharing Policies

To facilitate the access, sharing, and broad usage of data across TOPMed, NHLBI and the TOPMed Administrative Coordinating Center (ACC) have, in consultation with various TOPMed stakeholders, established the following Data Sharing Policy. The processes and procedures outlined herein are designed to promote scientific efficiency and collaboration while protecting participants’ privacy and respecting consent. 

This policy parallels the TOPMed publication guidelines for requesting data use. Refer to this FAQ for a more explicit distinction between data access/sharing and data use. 

Notably, it is incumbent upon TOPMed investigators to be aware of and follow additional policies that apply to their activities, whether within or beyond the scope of this TOPMed Data Sharing Policy. Such additional policies include the National Institutes of Health Genomic Data Sharing Policy (GDS), dbGaP policies, the Genomic Data User Code of Conduct, and any other local or institution-specific policies.

Pre-release TOPMed genotype and phenotype data are stored in dbGaP Exchange Areas (EAs). Data in the EAs will be accessible only by TOPMed investigators (i.e. individuals designated by project PIs) via application to dbGaP through a TOPMed-specific Data Access Request (DAR).

There is a separate EA for each study, containing phenotype and other study-specific data, such as prior SNP microarray data, TOPMed omics data, pedigree structures and related meta-data. 

Note that for some studies with prior, non-TOPMed accessions, phenotype and pedigree files may reside with those prior accessions and are accessible only as released data, not through the TOPMed EA.

In contrast, the TOPMed genotypes from the joint-calling of sequence data reside together in one file per chromosome (i.e. not split by study or consent) in a combined EA directory. These TOPMed cross-study genotype files are accessible by investigators who have approved access to at least one study-specific EA. The genotype data are kept together to avoid the considerable effort required to split and re-join the very large VCF files. 

Derived TOPMed data (e.g. harmonized phenotypes) can be uploaded, by an approved investigator, to the EA for a given study and thereby be made available to other investigators who have approved EA access for that study. Note that study approval is needed for these derived data to be released on dbGaP. DCC-harmonized phenotypes are also available in study-specific EAs. 

When study investigators apply for access to the EAs, they will certify that all data uses will comply with the Data Use Limitations specified for each study and consent group. This certification is critical because all TOPMed joint-called genotype data will be made available to each approved applicant via the combined EA, in addition to the requested study-specific phenotype data. Restrictions on data use in Working Groups will be made through paper proposals and accompanying dataset selection requests (see Data Use Guidelines).

TOPMed sequence and new or harmonized phenotype data for each study will be released approximately six months after genotype call sets have been posted to the EA by the Informatics Research Center (IRC). Once released, data will be accessible by the general scientific community through standard dbGaP Data Access Requests, which are reviewed by the NHLBI Data Access Committee (DAC).

Prerequisites

The following section outlines the prerequisites for initiating a DAR application for the TOPMed Exchange Areas:

  1. The applicant’s name must be included in the list of TOPMed investigators eligible to submit DARs (also referred to as the “DAC List”) for the TOPMed Exchange Areas. See the section on ‘Eligible investigators’ for more information on the DAC List.
  2. The applicant must have an eRA Commons account.
  3. The applicant must have a dbGaP account in good standing.
  4. The applicant’s TOPMed study must have completed dbGaP registration and submission of required phenotype data. Required phenotypes include those that the study intends to analyze with their own WGS data, but the NHLBI program strongly encourages submission of a broad range of phenotypes for cross-study analyses. 
  5. The applicant must have a list of dbGaP phs accession numbers for the EAs to be requested.
  6. The applicant must have obtained local Institutional Review Board (IRB) approval for the proposed TOPMed analyses.
    1. It must be clear that the IRB approval is for receiving data for TOPMed analyses. It is suggested to use the word “TOPMed” in the title.
    2. We suggest requesting blanket approval for multiple cross-study analyses to avoid repeated IRB modification requests.
    3. The type of IRB approval required is ‘full’ or ‘expedited’. An IRB determination that your proposed research is 'exempt' or 'not human subjects research' does not fulfill this requirement.
    4. The IRB approval must have more than 3 months remaining before its expiration date for the dbGaP project request to be considered.
    5. Although some data sets do not require IRB approval, access to any study-specific EA also provides access to the joint genotype call set in the combined EA. Therefore, IRB approval is strongly suggested for all requests.
    6. Under the 2019 Final Revisions to the Common Rule (see associated FAQ), continuing review (e.g., annual renewal) of IRB approval may or may not be required. Per NHLBI, when requesting studies registered or initiated prior to January 2020, annual IRB renewal will continue to be required. For studies initiated or registered after January 2020, the submitting institution must provide justification to require annual review of an applicant’s IRB approval; otherwise, annual review will not be required.
  7. If requesting datasets that have a consent group that contains a Collaboration Required (-COL) modifier, the applicant must provide a letter of collaboration with the primary study investigator(s), as described in the Instructions for Online Data Access Request (DAR) for TOPMed dbGaP Exchange Areas (see “Letters of collaboration” sub-bullets under “Review DUC” section). The letter of collaboration must be renewed every year.
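
The prerequisites above can be read as a checklist. The following Python sketch is illustrative only: the `Applicant` fields, the `unmet_prerequisites` function, and the placeholder accession number are hypothetical names invented for this example, not part of any dbGaP or TOPMed system. It also assumes that "more than 3 months" means whole calendar months, and that an approval with no expiration date (for example, when continuing review is not required) passes the expiration check.

```python
import calendar
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class Applicant:
    # Hypothetical fields that mirror the numbered prerequisites.
    on_dac_list: bool
    has_era_commons_account: bool
    dbgap_account_in_good_standing: bool
    study_registered_with_phenotypes: bool
    requested_accessions: List[str] = field(default_factory=list)
    irb_type: Optional[str] = None         # e.g. "full", "expedited", "exempt"
    irb_expiration: Optional[date] = None  # None: no expiration date (assumption)
    requests_col_consent_group: bool = False
    has_current_collaboration_letter: bool = False


def unmet_prerequisites(a: Applicant, today: date) -> List[str]:
    """Return a description of every prerequisite the applicant does not meet."""
    problems = []
    if not a.on_dac_list:
        problems.append("not on the DAC List")
    if not a.has_era_commons_account:
        problems.append("no eRA Commons account")
    if not a.dbgap_account_in_good_standing:
        problems.append("dbGaP account not in good standing")
    if not a.study_registered_with_phenotypes:
        problems.append("study registration or phenotype submission incomplete")
    if not a.requested_accessions:
        problems.append("no phs accession numbers listed")
    if a.irb_type not in ("full", "expedited"):
        problems.append("IRB approval must be full or expedited")
    elif a.irb_expiration is not None and a.irb_expiration <= add_months(today, 3):
        problems.append("IRB approval must have more than 3 months remaining")
    if a.requests_col_consent_group and not a.has_current_collaboration_letter:
        problems.append("letter of collaboration required for -COL consent groups")
    return problems
```

An empty list means every check in this sketch passed; it does not replace the review performed by the NHLBI DAC.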

 

NHLBI Clinical Data Science IRB

The NHLBI established the Clinical Data Science IRB (CDS-IRB) to provide a useful resource for the research community by offering, at no cost, central review of secondary research proposals utilizing NHLBI datasets for which IRB approval is required. The CDS-IRB is an option for TOPMed studies completing data access renewals for dbGaP. Additional information can be found on the CDS-IRB website: https://www.nhlbi.nih.gov/review-boards-and-committees/clinical-data-science-irb. To begin using this resource, please contact Julie Mikulla.

 

Eligible investigators

TOPMed Project or Center PIs will name TOPMed collaborators who are eligible to request access to the TOPMed EAs. These individuals are referred to as ‘eligible investigators’ and may be chosen based on the following scenarios:

  1. Each TOPMed Project or Center can name up to 12 eligible investigators — or one per study if there are more than eight studies comprising a Project. 
  2. Eligible investigators must meet the minimal qualifications specified by dbGaP to submit a dbGaP project request.
  3. Awardees of R01, U01, and other TOPMed-specific funding opportunities may each name up to three eligible investigators to apply for EA access.
  4. TOPMed investigators who receive NIH-sponsored, investigator-initiated awards centered on TOPMed data may name up to three eligible investigators to apply for EA access. (See https://topmed.nhlbi.nih.gov/awards/NIHIIA for examples.)

The list of eligible study investigators (DAC List) will be regularly given to the NHLBI DAC. Only individuals on this list may apply for access to the TOPMed EAs. 

Note that TOPMed investigators not on the DAC List may apply for released TOPMed data per the standard dbGaP application process (see TOPMed Data Access for the Scientific Community).

 

Online DAR application

Once all prerequisites are met for initiating a TOPMed DAR application, investigators may apply for access to specific study EAs (including their own study’s EA) by following the given instructions for the online application. Approval is expected if the TOPMed-specific process is followed and the applicant is on the TOPMed eligible study investigator list.

External collaborators (i.e. collaborators at a different institution than the applicant) are required to submit separate applications for data access. Separate access requests must be filed per institution, even if collaborators are within the same funded study within TOPMed.  

When a study investigator moves to a new institution, they must submit a new request from that institution. Data obtained via application from one institution may not be transferred to another institution.

Application for data sharing in a cloud environment

A group of eligible investigators at different institutions may mutually agree to form a ‘data sharing group’ to share data across those institutions. As mentioned earlier, these external collaborators must submit separate access requests according to the outlined instructions with the following additional requirements:

  1. The External Collaborators list must include everyone in the data sharing group.
  2. The requested datasets must be in common.
  3. A Cloud Use Statement must also be provided along with descriptions of any collaborator roles.
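
As a rough illustration, the three additional requirements above could be checked across a group's applications as follows. This is a sketch under stated assumptions: `GroupApplication` and `group_application_issues` are hypothetical names, "in common" is read here as every application requesting the same set of datasets, and the descriptions of collaborator roles are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class GroupApplication:
    # Hypothetical record of one institution's DAR within a data sharing group.
    applicant: str
    external_collaborators: FrozenSet[str]
    requested_datasets: FrozenSet[str]
    has_cloud_use_statement: bool


def group_application_issues(apps: List[GroupApplication]) -> List[str]:
    """List the ways a set of applications falls short of the group requirements."""
    members = {a.applicant for a in apps}
    issues = []
    for a in apps:
        # Requirement 1: every other group member is named as an external collaborator.
        missing = members - {a.applicant} - a.external_collaborators
        if missing:
            issues.append(f"{a.applicant}: External Collaborators list omits {sorted(missing)}")
        # Requirement 3: a Cloud Use Statement accompanies each application.
        if not a.has_cloud_use_statement:
            issues.append(f"{a.applicant}: no Cloud Use Statement")
    # Requirement 2 (as read here): all applications request the same datasets.
    if len({a.requested_datasets for a in apps}) > 1:
        issues.append("requested datasets differ across the group")
    return issues
```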

Internal collaborators named in a given application for an institution may share data downloaded by the approved applicant. For example, analysts and trainees gain access to data through the PI (or authorized collaborator thereof) of a given study at their own institution. In that case, the approved applicant is responsible for ensuring that subsequent data use stays within the scope of the approved Research Use Statement and is compliant with Data Use Limitations. Therefore, individuals at the same institution (and thus potential internal collaborators) may wish to submit separate DAR applications.

TOPMed data may be shared between two or more different studies, by uploading and downloading from their studies’ EAs, provided that: 1) each study is fully registered, 2) each study has an active study-specific EA, and 3) there is an approved DAR for the other study’s/studies’ data. See instructions for Data sharing through the dbGaP Exchange Areas for information on how to upload and download data from the EAs.
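
The three conditions for sharing between studies through their EAs can be sketched as a simple predicate. The `Study` record and `can_share_via_ea` function are hypothetical, and the third condition is interpreted here as each study holding an approved DAR for every other study in the exchange; other readings are possible.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Study:
    # Hypothetical record; field names are chosen for this example only.
    name: str
    fully_registered: bool
    has_active_ea: bool
    # Names of other studies for whose data this study has an approved DAR.
    approved_dars_for: Set[str] = field(default_factory=set)


def can_share_via_ea(studies: List[Study]) -> bool:
    """Check the three stated conditions for a set of two or more studies."""
    if len(studies) < 2:
        return False
    for s in studies:
        # Conditions 1 and 2: full registration and an active study-specific EA.
        if not (s.fully_registered and s.has_active_ea):
            return False
        # Condition 3 (as read here): an approved DAR for each other study's data.
        others = {o.name for o in studies if o.name != s.name}
        if not others <= s.approved_dars_for:
            return False
    return True
```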

Data downloaded from an EA may not be transferred outside of the applicant’s group and institution, with the following exceptions:

  1. The data were generated by the applicant’s own study (i.e. study-specific phenotypes, genotypes and sequence data). Studies are encouraged to discuss this type of sharing with their NHLBI Program Officer.
  2. The investigators involved have formed a ‘data sharing group’ and comply with the conditions outlined in the next section (‘Data sharing amongst TOPMed investigators at different institutions’).

Study data NOT obtained from dbGaP may be shared directly between two studies, given the necessary Data Transfer Agreements between the participating academic institutions.

 

Data sharing amongst TOPMed investigators at different institutions

Data sharing across institutions may be accomplished through the TOPMed EAs, as described in the Data sharing through the dbGaP exchange areas instructions. Data may also be shared locally with individuals listed as “Internal Collaborators” in the application.

Additionally, investigators in a ‘data sharing group’ may share data in a cloud environment with other collaborators in the group, pending compliance with the data access request instructions mentioned in the previous section on the DAR process (‘Application for data sharing in a cloud environment’). Once multiple investigators in the group obtain approval for their ‘common data sets,’ “[d]ata may be encrypted and mailed to approved [external] collaborators on a hard drive, or shared with approved collaborators over a virtual private network or in a cloud environment,” according to the NIH Genomic Data Sharing Policy (GDS). If sharing in a cloud environment, a management plan should be provided to the ACC.

Data use regulation and governance in TOPMed

Data sharing in TOPMed will occur primarily through the dbGaP TOPMed Exchange Area (EA). For data obtained through dbGaP (from either the EA or a released study), data usage is regulated by the NHLBI according to the study’s Institutional Certification, which specifies Data Use Limitations. TOPMed Principal Investigators (PIs) are responsible for submitting the Institutional Certification at the time of TOPMed study registration.

Data uses are governed by the participant’s informed consent, as interpreted by each study’s PI(s) and their institutional IRB. The TOPMed PI(s) associated with each study are responsible for making consent types and data use restrictions known to the TOPMed collaborators with whom they share data directly (i.e. outside of dbGaP).

 

Data use in TOPMed manuscripts

Cross-study paper proposals originate in the TOPMed Working Groups, which may produce manuscripts for publication using data shared in the TOPMed EAs or released in dbGaP. To ensure transparency of data usage and acknowledgment of any Data Use Limitations, authors must obtain permission from each study PI to include their study’s data, as well as submit a paper proposal for their manuscript following specific paper proposal instructions, prior to starting work on their manuscript. 

The paper proposal instructions have two parts:

  1. Basic information, scientific proposal and selection of TOPMed projects to include, via the Paper Proposal submission form; and
  2. Selection of specific study-consent groups to be used, via the Request data sets form.

Study-consent groups will not be available for selection until the study has been registered in dbGaP. Therefore, the proposer may need to update their dataset selection when additional study-consent groups become available. When the proposer selects a study-consent group, they agree that its Data Use Limitations will be respected.

Study dataset contacts will be notified when one of their study’s consent groups is selected for a given proposal and will have two weeks to approve or request modification. Reasons for requesting modification might include:

  • the study wants to perform study-specific preliminary analyses before joining cross-study analyses;
  • the PI was not contacted previously and needs further information from the proposer; or
  • issues related to consent, potential stigmatization or harm to participants.

Modification requests at the stage of study-consent selection should be infrequent, since proposers should have already obtained permission from the study PI to include their study prior to proposal submission. Refer to the TOPMed Publications Policy for further information on the publications process, including specific information on “single PI” proposals and manuscripts.

 

Additional data use considerations

This section applies to data obtained from dbGaP. Different rules may apply to data obtained by other mechanisms (e.g., directly from study investigators).

NIH provides consent group titles and their associated standard Data Use Limitations, accompanied by further interpretation. Note that “General Research Use” permits “research relating to population structure,” while “Health/Medical/Biomedical” excludes “the study of population origins or ancestry” and “Disease specific” includes only “research on a specific disease or related condition.”

The TOPMed ELSI Committee can advise study PIs on approaches to provide evidence to their IRBs or Institutional Certification boards regarding support for broad use of TOPMed data.

Regarding indirect uses of data, such as imputation reference panels, common controls for association studies, and variant summary statistics: 

  1. Data from an individual with a disease-specific consent will not be used in analyses outside of that restriction, unless specifically allowed by the Institutional Certification.
  2. Contribution of data to public (non-controlled) access servers is not allowed for individual-level data or for data summaries that could be used for individual identification. Contribution to public servers that protect individual-level data is allowed if specified in the Institutional Certification.
  3. NHLBI urges all TOPMed investigators to update their Institutional Certification with Data Use Limitations that specify whether data may be used for the following:
    1. Contribution of variant summary statistics to public variant servers.
    2. Inclusion of individual-level data as reference samples in public imputation servers that protect the individual-level data (i.e. where the reference samples are not accessed by the server’s users).
    3. Inclusion of individual-level data as common controls in association studies involving cases with diseases that are outside of the participants’ disease-specific consent.
  4. Effective November 2018, Genomic Summary Results (GSR) such as allele frequencies and association results are to be publicly available unless a study is designated as “sensitive” and its Institutional Certification is updated accordingly to indicate that GSR should remain under controlled access.
    1. TOPMed studies with questions about this should contact the NHLBI DAC and/or their GPA (see Key Contacts).

For current information on NIH-wide data sharing policies, including GSR, please refer to the NIH Genomic Data Sharing website.

Please direct questions regarding data-sharing policies, Data Access Requests, and dbGaP data submission as indicated on the Key Contacts page.
