Uploading to the dbGaP Exchange Area
Who is an uploader and how do I assign a new uploader?
- Ask the dbGaP study PI to log in to the dbGaP Submission Portal.
- Click on your TOPMed study’s name. Make sure it is the TOPMed phs number.
- Near the top of the study page, click on “Add submitter”, then add the new submitter’s (uploader's) email.
- By default, the next screen will show “Submitter: Can modify study data” as checked. That’s good. Optionally, you can instead check “Manager: Can add/remove users from study” if you wish to delegate someone else to manage your uploaders/submitters.
- Note: Submitters will then be able to upload to your TOPMed study, both to regular dbGaP (for eventual release) and to the Exchange Area (for sharing with approved TOPMed users).
- If this does not work, email your dbGaP curator with the names and email addresses of the people you wish to add as submitters. Your dbGaP curator is the person who communicates any issues in the submitted data files. If you are unsure who your dbGaP curator is, search your email for the TOPMed phs number. Alternatively, if you can access your study’s Submission Portal page, click on “Send e-mail to dbGaP curator” in the upper right corner of the webpage.
How do I upload to the Exchange Area?
Steps to upload
- Uploads are done through the dbGaP Submission Portal by the same people who submit the standard dbGaP files. See the previous question, “Who is an uploader…?”
- Once you are an uploader, log in to dbGaP with your eRA or myNCBI ID (or click 3rd party for a university or institutional login).
- Click on your TOPMed study’s name. Make sure it is the TOPMed phs number.
- When your study’s page opens, scroll to the bottom tab labeled “Other files”, click on the “Select file type” menu, and select “Exchange area files”.
- Please use a consistent file-naming convention for uploaded files, such as “StudyName_DomainOrSubdomain_YYYYMMDD_initials.csv”. Please keep in mind uploads will be coming from many WGs, so meaningful filenames will be appreciated by the file recipients.
- A similarly named documentation or readme file should accompany every upload. If you zip your data and documentation files together, please use the same file-naming recommendation as above for the .zip filename as well.
- Recipients may wish to ask for the upload date along with each study phs number to help find the upload. Uploads are automatically placed in dated subfolders in the study’s Exchange Area (EA).
- More info on data sharing through dbGaP exchange areas
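The file-naming recommendation in the steps above can be sketched as a small helper. This is only an illustration: the function name `upload_names`, the `_README.txt` suffix for the documentation file, and the rule that name parts must not contain whitespace are assumptions, not dbGaP or TOPMed requirements.

```python
from datetime import date


def upload_names(study, domain, initials, upload_date):
    """Build matching data, readme, and zip names for one upload.

    Follows the suggested pattern StudyName_DomainOrSubdomain_YYYYMMDD_initials.
    The readme suffix and the whitespace check are assumptions for this sketch.
    """
    for part in (study, domain, initials):
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"name part must be non-empty without whitespace: {part!r}")
    stem = "_".join([study, domain, upload_date.strftime("%Y%m%d"), initials])
    return {
        "data": stem + ".csv",
        "readme": stem + "_README.txt",
        "zip": stem + ".zip",
    }


names = upload_names("StudyName", "Lipids", "ab", date(2020, 1, 31))
print(names["data"])
```

Deriving all three names from one stem keeps the data file, its documentation, and any .zip archive easy to match up for the recipient.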
IDs to use in uploaded files
The TOPMed DCC strongly encourages using, in all phenotype files, the SUBJECT_IDs that studies use in their submitted dbGaP files. We do not encourage use of NWD_IDs within phenotype files, because ongoing genotype QC efforts identify sample swaps and other sample identity issues. We feel strongly that phenotype files should be indexed by SUBJECT_ID, while VCF and other sequencing data should be indexed by NWD_ID (the SAMPLE_ID submitted to dbGaP). This way both file types remain robust to QC issues that arise.
Samples and subjects are easily mapped to each other using the TOPMed-wide Sample Annotation. If QC issues are identified, we simply update the TOPMed-wide Sample Annotation file, with no need to update the more numerous and unwieldy genotype and phenotype files.
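As a rough illustration of why this separation is robust, the sketch below looks up a phenotype record for a sample through an annotation table, before and after a QC correction. The column names (`NWD_ID`, `study`, `SUBJECT_ID`), the IDs, and the phenotype values are all made up for the example; the real Sample Annotation file may be laid out differently.

```python
import csv
import io

# Hypothetical annotation excerpt; column names and IDs are invented.
ANNOTATION_V1 = """NWD_ID,study,SUBJECT_ID
NWD000001,studyA,123
NWD000002,studyA,456
"""

# The same annotation after QC finds that the two samples were swapped.
ANNOTATION_V2 = """NWD_ID,study,SUBJECT_ID
NWD000001,studyA,456
NWD000002,studyA,123
"""

# Phenotypes are keyed by (study, SUBJECT_ID) and are not touched by the fix.
PHENOTYPES = {
    ("studyA", "123"): {"height_cm": 170},
    ("studyA", "456"): {"height_cm": 182},
}


def load_sample_map(annotation_text):
    """Map each NWD_ID to its (study, SUBJECT_ID) pair."""
    reader = csv.DictReader(io.StringIO(annotation_text))
    return {row["NWD_ID"]: (row["study"], row["SUBJECT_ID"]) for row in reader}


def phenotype_for_sample(nwd_id, sample_map, phenotypes=PHENOTYPES):
    """Find the phenotype record for a sequenced sample via the annotation."""
    return phenotypes[sample_map[nwd_id]]


before = phenotype_for_sample("NWD000001", load_sample_map(ANNOTATION_V1))
after = phenotype_for_sample("NWD000001", load_sample_map(ANNOTATION_V2))
```

Only the annotation text changes between the two lookups; the phenotype table and the sample-indexed data stay as they were.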
Note to cross-study analysts: When bringing together individual-level phenotype data from multiple phs numbers, you should always prepend a study identifier such as the study acronym (e.g. JHS, FHS, …) to the SUBJECT_IDs in order to avoid clashes. For example, suppose study A and study B both use SUBJECT_ID “123”, but for different participants. In a pooled phenotype file, the SUBJECT_IDs “studyA_123” and “studyB_123” then identify them as distinct people.
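A minimal sketch of this pooling step follows, assuming each study's phenotype rows are held as dictionaries with a `SUBJECT_ID` key. The function name `pool_phenotypes`, the added `study` column, and the `height_cm` values are hypothetical.

```python
def pool_phenotypes(rows_by_study):
    """Combine per-study rows, prefixing each SUBJECT_ID with its study name.

    rows_by_study maps a study identifier to a list of row dictionaries.
    Raises ValueError if a prefixed ID still collides within the pool.
    """
    pooled = []
    seen = set()
    for study, rows in rows_by_study.items():
        for row in rows:
            pooled_id = f"{study}_{row['SUBJECT_ID']}"
            if pooled_id in seen:
                raise ValueError(f"duplicate pooled SUBJECT_ID: {pooled_id}")
            seen.add(pooled_id)
            # Copy the row so the per-study input is left unchanged.
            pooled.append({**row, "study": study, "SUBJECT_ID": pooled_id})
    return pooled


pooled = pool_phenotypes({
    "studyA": [{"SUBJECT_ID": "123", "height_cm": 170}],
    "studyB": [{"SUBJECT_ID": "123", "height_cm": 165}],
})
pooled_ids = [row["SUBJECT_ID"] for row in pooled]
```

Keeping the study name in its own column as well makes it straightforward to recover the original per-study SUBJECT_ID later.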