Genetic Data and Related Resources

Genotype Data Freezes
| Freeze | Variant Discovery Date | BAM date [6] | Genome Build | n_variants | n_samples Used for Variant Discovery | n_samples Registered in dbGaP [1] | n_studies Registered in dbGaP | Phases |
|---|---|---|---|---|---|---|---|---|
| Freeze 1 | Oct 2015 | 2015-08-12 | 37 | 87 M | 4,317 | 2,643 | 8 | 1 |
| Freeze 2a | Jan 2016 | 2015-11-10 | 37 | 141 M | 10,597 | 9,109 | 15 | 1 |
| Freeze 3a [2] | Jul 2016 | 2016-03-25 | 37 | 199 M | 18,877 | 16,558 | 16 | 1 |
| Phased 3a | Dec 2016 | 2016-03-25 | 37 | 207 M | 18,877 | 18,258 | 19 | 1 |
| Freeze 4 | Mar 2017 | 2016-08-10 | 37 | 219 M | 19,044 | 18,526 | 19 | 1 |
| Freeze 5b | Sep 2017 | 2017-09-20 | 38 | 582 M | 64,960 | 54,499 | 32 | 1, 2 |
| Freeze 6a | Aug 2018 | 2018-01-08 | 38 | 721 M | 122K [3] | 107,047 | 64 | 1, 2, 3 |
| Freeze 8 | Feb 2019 | 2018-09-14 | 38 | 1.02 B | 186K [4] | 140,306 | 72 | 1, 2, 3, 4 |
| Freeze 9b | Feb 2020 | 2019-09-15 | 38 | 947 M | 161K [5] | 158,470 | 76 | 1, 2, 3, 4 |
| Freeze 10a, 10b | Apr 2021 | 2020-11-23 | 38 | 1.074 B | 184,878 [5] | 180,852 | 81 | 1, 2, 3, 4, 5, 6 |

[1] With genotypes included in the exchange area subset.
[2] Between July and December 2016, additional studies were registered in dbGaP, so the exchange area subset for the phased freeze 3a genotypes includes both more individuals and more sites than the earlier, unphased version of the same genotypes.
[3] See the note in the freeze.6a Notes under the Exchange Area Locations and Resources section. 122K includes samples from programs besides TOPMed (e.g., CCDG).
[4] See the note in the freeze.8 Notes under the Exchange Area Locations and Resources section. 186K includes samples from programs besides TOPMed (e.g., CCDG).
[5] Includes TOPMed and 1000 Genomes samples. Does not include samples from other programs (e.g., CCDG, inPsych) used in variant discovery, because variants present only in those other programs are not included in the Exchange Area VCFs (freeze 9b and later). Note, however, that a subset of CCDG samples from studies participating in both the TOPMed and CCDG programs is included in TOPMed freezes, as described in this summary of joint-membership studies and samples.
[6] The cutoff date that defines a WGS freeze. The binary alignment and map (BAM) date reflects when the IRC's intake quality control process was complete, a few days after the BAM receipt date.

See genotype data freeze availability by dbGaP study accession.
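
The counts in the freeze table mix plain integers (4,317) with abbreviated values (87 M, 1.02 B, 122K), which makes them awkward to compare programmatically. The following is a minimal sketch, not an official TOPMed tool, of one way to normalize such strings to integers. The function name `parse_count` and the suffix table are assumptions introduced here for illustration; the sample values are copied from the table above.

```python
import re
from decimal import Decimal

# Multipliers for the suffixes that appear in the freeze table (assumed meaning:
# K = thousand, M = million, B = billion).
_MULTIPLIER = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9}
_COUNT = re.compile(r"([\d,]*\.?\d+)\s*([KMB]?)")


def parse_count(text: str) -> int:
    """Convert a string such as '87 M', '1.02 B', '122K', or '4,317' to an int."""
    match = _COUNT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number, suffix = match.groups()
    # Decimal avoids floating-point rounding when scaling values like 1.074 B.
    return int(Decimal(number.replace(",", "")) * _MULTIPLIER[suffix])


# n_variants values as listed for a few freezes in the table.
n_variants = {
    "Freeze 1": "87 M",
    "Freeze 8": "1.02 B",
    "Freeze 9b": "947 M",
    "Freeze 10a, 10b": "1.074 B",
}

for freeze, raw in n_variants.items():
    print(f"{freeze}: {parse_count(raw):,} variants")
```

Abbreviated values such as 122K are rounded in the source, so the parsed integers are approximations and should not be treated as exact sample or variant counts.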

Exchange Area Locations and Resources
| Structural Variant freeze.1 | Exchange Area location or URL |
|---|---|
| Genotype calls | Combined_Study_Data/Genotypes/structural.variant.freeze.1 |
| Readme files | sv.freeze.1.readme.pdf, sv.freeze.1.readme.txt |
| Slides | overview (Baylor/UTHealth) |

| WGS freeze.10a, freeze.10b | Exchange Area location or URL |
|---|---|
| Sample annotation | TBA |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.10a, Combined_Study_Data/Genotypes/freeze.10b |
| Readme files | freeze.10a.readme.txt, freeze.10b.readme.txt |
| Phased genotype calls | TBA |
| Slides | preview (IRC) |
| Notes | Please be aware that TOPMed WGS freeze.6a and forward include CCDG samples from joint-membership TOPMed-CCDG studies. A summary of these joint-membership studies and samples is available. |

| WGS freeze.9b | Exchange Area location or URL |
|---|---|
| Sample annotation and duplicates | Combined_Study_Data/Genotypes/freeze.9b/sample_annotation † |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.9b |
| Readme files | freeze.9b.readme.txt, freeze.9b.replacement.chrX.readme |
| Phased genotype calls | Combined_Study_Data/Genotypes/freeze.9b/phased |
| Kinship coefficients and PCs | Combined_Study_Data/Genotypes/freeze.9b/relatedness |
| Methods document | freeze.9 sequencing and data processing methods |
| Slides | summary (IRC) |
| Notes | Please be aware that TOPMed WGS freeze.6a and forward include CCDG samples from joint-membership TOPMed-CCDG studies. A summary of these joint-membership studies and samples is available. |

| WGS freeze.8 | Exchange Area location or URL |
|---|---|
| Sample annotation | Combined_Study_Data/Genotypes/freeze.8/sample_annotation † |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.8 |
| Phased genotype calls | Combined_Study_Data/Genotypes/freeze.8/phased |
| Local ancestry estimates | Combined_Study_Data/Genotypes/freeze.8.autosome.local.ancestry, Combined_Study_Data/Genotypes/freeze.8.chrX.local.ancestry |
| Local ancestry readme files | Freeze.8.autosome.local.ancestry.readme.txt, freeze.8.chrX.local.ancestry.readme.txt |
| Dec 2020 Virtual F2F presentation | slides (IRC), video (IRC) |
| Kinship coefficients, PCs, and duplicates | Combined_Study_Data/Genotypes/freeze.8/relatedness |
| Methods document | freeze.8 sequencing and data processing methods |
| Slides | sex, relatedness, PCA (DCC) |
| Notes | Please be aware that TOPMed WGS freeze.6a and forward include CCDG samples from joint-membership TOPMed-CCDG studies. A summary of these joint-membership studies and samples is available. |

| WGS freeze.6a | Exchange Area location or URL |
|---|---|
| Sample annotation | Combined_Study_Data/Genotypes/freeze.6a/sample_annotation † |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.6a |
| Phased genotype calls | Combined_Study_Data/Genotypes/freeze.6a/phased |
| Local ancestry estimates | TBA |
| Kinship coefficients, PCs, and duplicates | Combined_Study_Data/Genotypes/freeze.6a/relatedness |
| Methods document | freeze.6a sequencing and data processing methods |
| Slides | summary (IRC); sex, relatedness, PCA (DCC) |
| Notes | Please be aware that TOPMed WGS freeze.6a and forward include CCDG samples from joint-membership TOPMed-CCDG studies. A summary of these joint-membership studies and samples is available. |

| WGS freeze.5b | Exchange Area location or URL |
|---|---|
| Sample annotation | Combined_Study_Data/Genotypes/freeze.5b/sample_annotation † |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.5b |
| Phased genotype calls | Combined_Study_Data/Genotypes/freeze.5b/phased.minDP0 |
| Local ancestry estimates | Combined_Study_Data/Genotypes/freeze.5b.local.ancestry |
| Kinship coefficients, PCs, and duplicates | Combined_Study_Data/Genotypes/freeze.5b/relatedness |
| Methods document | freeze.5b sequencing and data processing methods |
| Slides | preview (IRC); overview (IRC); duplicates analysis (DCC); sex, relatedness, PCA (DCC) |
| Notes | Approximately 10% of samples from WGS freeze.4 did not complete remapping to build 38 and are not included in WGS freeze.5b. |

| WGS freeze.4 | Exchange Area location or URL |
|---|---|
| Sample annotation | Combined_Study_Data/Genotypes/freeze.4/sample_sets_* † |
| Unphased genotype calls | |
| Kinship coefficients, PCs, and duplicates | Combined_Study_Data/Genotypes/freeze.4/relatedness |
| Methods document | freeze.4 sequencing and data processing methods |
| Slides | sex, relatedness, PCA (DCC) |

| WGS freeze.3a | Exchange Area location or URL |
|---|---|
| Sample annotation | Combined_Study_Data/Genotypes/freeze.3a/sample_sets_* † |
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.3a |
| Kinship coefficients, PCs, and duplicates | Combined_Study_Data/Genotypes/freeze.3a/relatedness |
| Methods document | freeze.3a sequencing and data processing methods |

| WGS freeze.2a | Exchange Area location or URL |
|---|---|
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.2a |

| WGS freeze.1c | Exchange Area location or URL |
|---|---|
| Unphased genotype calls | Combined_Study_Data/Genotypes/freeze.1c |
| Slides | preview (IRC) |

† Look for the files with the most recent date.
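
Picking "the most recent date" by eye is error-prone when a directory holds several dated versions of the same file. The sketch below shows one way to automate that choice. It assumes, purely for illustration, that each file name embeds an ISO-style date (YYYY-MM-DD); the actual naming scheme in the Exchange Area is not stated here, and the file names used in the example are hypothetical.

```python
import re
from datetime import date

# Assumption: file names contain a date written as YYYY-MM-DD.
_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def most_recent(filenames):
    """Return the file name carrying the latest embedded date."""
    dated = []
    for name in filenames:
        match = _DATE.search(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    if not dated:
        raise ValueError("no file name contains a YYYY-MM-DD date")
    # Tuples compare by date first, so max() picks the newest file.
    return max(dated)[1]


# Hypothetical directory listing, for illustration only.
listing = [
    "sample_annotation_2019-03-01.txt",
    "sample_annotation_2020-11-23.txt",
    "readme.txt",
]
print(most_recent(listing))
```

Files without a recognizable date are ignored rather than treated as errors, so readme files can sit in the same listing. If the real file names use a different date format, the regular expression would need to change accordingly.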

WGSA Variant Annotation and Related Resources

Please cite (1) below if you use raw WGSA variant annotations; cite both (1) and (2) if you use parsed WGSA annotations. DCC representatives can be invited as co-authors if you use the variant grouping files provided below.
Citations:

  1. WGSA: Liu X, White S, Peng B, Johnson AD, Brody JA, Li AH, Huang Z, Carroll A, Wei P, Gibbs R, Klein RJ and Boerwinkle E. (2016) WGSA: an annotation pipeline for human genome sequencing studies. Journal of Medical Genetics 53:111-112.
  2. WGSAParsr: Heavner BD, Jain D. WGSAParsr. https://github.com/UW-GAC/wgsaparsr/

| WGS freeze.8 | Exchange Area location |
|---|---|
| Raw WGSA variant annotations | Combined_Study_Data/Genotypes/freeze.8_annotation/WGSA/ |
| WGSA parsed annotations | Combined_Study_Data/Genotypes/freeze.8_annotation/WGSA_parsed/provisional ‡ |
| Variant grouping files for aggregate tests | Combined_Study_Data/Genotypes/freeze.8_annotation/var_grouping |

| WGS freeze.6 | Exchange Area location |
|---|---|
| Raw WGSA variant annotations | Combined_Study_Data/Genotypes/freeze.6_annotation/WGSA |
| WGSA parsed annotations | Combined_Study_Data/Genotypes/freeze.6_annotation/WGSA_parsed/provisional ‡ |
| Variant grouping files for aggregate tests | Combined_Study_Data/Genotypes/freeze.6_annotation/var_grouping |

| WGS freeze.5 | Exchange Area location |
|---|---|
| Raw WGSA variant annotations | Combined_Study_Data/Genotypes/freeze.5_annotation/WGSA |
| WGSA parsed annotations | Combined_Study_Data/Genotypes/freeze.5_annotation/WGSA/WGSA_parsed ‡ |
| Variant grouping files for aggregate tests | Combined_Study_Data/Genotypes/freeze.5_annotation/WGSA/var_grouping |

| WGS freeze.2a, freeze.3a, freeze.4 | Exchange Area location |
|---|---|
| Raw WGSA variant annotations | Combined_Study_Data/Genotypes/freeze.3a_annotation/freezes_2a_3a_4_annot |
| WGSA parsed annotations | Will not be generated |
| Variant grouping files for aggregate tests | Will not be generated |

‡ Look for the files with the most recent version.

WGS Read Alignment Data/CRAMs

A limited number of TOPMed Phase 1 CRAMs aligned to build 37 are available directly through the dbGaP Sequence Read Archive (SRA). These are accessible via their corresponding TOPMed accessions with dbGaP approval. All other CRAMs, including build 38 alignments for all TOPMed WGS samples, are hosted in NHLBI cloud buckets and accessed using the “Fusera” software.

Instructions for controlled access to TOPMed sequence data on the cloud 

Further documentation on dbGaP cloud access

A mapping of TOPMed NWD IDs to SRR IDs for build 38-aligned CRAMs is available on the exchange area: Combined_Study_Data/SRA_ID_mapping/
Files are listed by date and may not reflect all CRAMs available after the listed date. 
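
Once a mapping file has been downloaded, looking up the SRR run IDs for a given NWD ID is a simple table lookup. The sketch below illustrates this under stated assumptions: the mapping is taken to be tab-delimited with a header row, and the column names `NWD_ID` and `SRR_ID` are hypothetical, as are the IDs in the example. Check the actual files under Combined_Study_Data/SRA_ID_mapping/ for their real layout before adapting it.

```python
import csv
import io


def load_nwd_to_srr(handle, nwd_col="NWD_ID", srr_col="SRR_ID"):
    """Read a tab-delimited mapping and return {NWD ID: [SRR IDs]}.

    The column names are assumptions; pass the real ones if they differ.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        # A list is kept per sample in case one NWD ID maps to several runs.
        mapping.setdefault(row[nwd_col], []).append(row[srr_col])
    return mapping


# Placeholder content standing in for a downloaded mapping file.
example = io.StringIO(
    "NWD_ID\tSRR_ID\n"
    "NWD000001\tSRR0000001\n"
    "NWD000002\tSRR0000002\n"
)
mapping = load_nwd_to_srr(example)
print(mapping.get("NWD000001", []))
```

Because the files are listed by date and may not cover CRAMs added later, a missing NWD ID in the mapping does not necessarily mean that no CRAM exists for that sample.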

Variant Summary Data and Imputation Reference Panel

Variant Summary Data

The following repositories contain variant summary information for TOPMed studies that granted explicit permission. These resources are publicly available.

Imputation Reference Panel and Server

A TOPMed reference panel and imputation server are provided through NHLBI BioData Catalyst®. The resource is available to the scientific community at https://imputation.biodatacatalyst.nhlbi.nih.gov/.
