TOPMed

Harmonized Phenotypes

TOPMed Phenotype Harmonization Project

The main goal of the TOPMed harmonization project is to provide harmonized phenotypes that are well-documented, reproducible, and homogeneous across studies. In harmonized datasets and documentation, “phenotype” refers to the observable characteristic (e.g., diastolic blood pressure) and “variable” refers to the specific vector of data values for a given phenotype (e.g., bp_diastolic_1). To enable reproducibility, all study data were acquired from dbGaP.

Datasets and documentation of the harmonized variables were submitted to two repositories: dbGaP and BioData Catalyst. Full documentation for each harmonized variable is provided in a GitHub repository. The documentation for each harmonized variable includes the identifiers of the original dbGaP study variables used in harmonization as well as the code that was used to transform them into the harmonized variable. This repository also includes a reproducible example that instructs users how to use the documentation to reproduce a simulated harmonized variable.

TOPMed Phenotype Tagging Project

Over 16,000 dbGaP study variables were tagged with 65 phenotype concepts from the heart, lung, blood, and sleep domains. These tags enable researchers to identify variables of interest that can be used in future harmonization efforts. The results of the tagging project are available in the dbGaP user interface. All tags are mapped to a UMLS Concept Unique Identifier (CUI), which is required for identifying the tagged variables on dbGaP.

Instructions for Identifying Tagged Variables on dbGaP

Two example methods for finding tagged variables are described below: Entrez search and faceted search.

Entrez search

  • In your web browser, visit the dbGaP Entrez advanced search page.
  • In the search builder, select Common Data Element Resource and enter “umls” into the associated text box, or type “umls[Common Data Element Resource]” directly into the search box. Alternatively, select Common Data Element Term and enter the CUI of a UMLS term into the associated text box, or type a query such as “C0005890[Common Data Element Term]” directly into the search box.
  • The Studies tab of the search results displays all of the studies that contain tagged variables.
  • The Variables tab of the search results displays all of the dbGaP variables that are tagged with at least one UMLS term. Click on a variable name to see more information on the variable page.
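
The search strings used in the steps above can also be assembled in a script and pasted into the search box. The following is a minimal Python sketch, not part of the TOPMed or dbGaP tooling: the helper names `cde_resource_query` and `cde_term_query` are hypothetical, and the CUI format check (a "C" followed by seven digits) is an assumption based on the CUIs listed on this page. The code only builds query strings and does not contact dbGaP.

```python
import re

# Assumption: a CUI is "C" followed by seven digits, matching the CUIs on this page.
_CUI_PATTERN = re.compile(r"^C\d{7}$")


def cde_resource_query(resource: str = "umls") -> str:
    """Build a query for all items tagged from a Common Data Element Resource."""
    return f"{resource}[Common Data Element Resource]"


def cde_term_query(cui: str) -> str:
    """Build a query for items tagged with one specific UMLS CUI."""
    cui = cui.strip().upper()
    if not _CUI_PATTERN.match(cui):
        raise ValueError(f"not a well-formed CUI: {cui!r}")
    return f"{cui}[Common Data Element Term]"


if __name__ == "__main__":
    print(cde_resource_query())        # umls[Common Data Element Resource]
    print(cde_term_query("C0005890"))  # C0005890[Common Data Element Term]
```

Validating the CUI before building the query catches the common mistake of entering a tag name (such as "Height") where the search field expects the identifier.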

Faceted search

  • In your web browser, visit the dbGaP faceted search page.
  • Click on the Variables tab.
  • Under the Common Data Elements filter, check UMLS. This will display all of the dbGaP study variables that are tagged with a UMLS term.
  • For a given variable listed on the right, you can click on the UMLS link to go directly to the variable’s information page with the full UMLS term name.
  • To search for variables tagged with a specific UMLS term, search for the term’s CUI in the search box in the upper left corner of the page.

Mapped Phenotype Tags

| Phenotype Domain | Description | UMLS CUI | UMLS Term | Tag Name (phenotype concept) |
|---|---|---|---|---|
| Blood pressure | Qualitative indicator of hypertension status | C3843080 | Hypertension or high blood pressure | Hypertension |
| Blood pressure | Quantitative measure of resting diastolic blood pressure | C2183311 | Diastolic blood pressure at rest | Resting arm diastolic BP |
| Blood pressure | Quantitative measure of resting systolic blood pressure | C2039694 | Systolic blood pressure at rest | Resting arm systolic BP |
| Diabetes | Quantitative measure of the concentration of glucose in blood | C1320980 | Blood glucose status | Blood glucose |
| Diabetes | Qualitative indicator of diabetes mellitus status | C0011849 | Diabetes Mellitus | Diabetes |
| Diabetes | Quantitative measure of the concentration of glycated hemoglobin (hemoglobin A1c, or HbA1c) in blood | C1261236 | Hemoglobin A1c level result | HbA1c |
| Diabetes | Quantitative measure of the concentration of insulin in blood | C0428405 | Insulin level result | Insulin in blood |
| Anthropometry | Body mass index (weight divided by the square of height) | C1305855 | Body mass index | BMI |
| Anthropometry | Standing body height measurement | C0005890 | Body height | Height |
| Anthropometry | Hip circumference measurement | C0562350 | Hip circumference | Hip circumference |
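
For scripted lookups, the tag-to-CUI mapping listed above can be carried as a small dictionary. This is a minimal Python sketch under stated assumptions: the names `TAG_TO_CUI` and `cui_for_tag` are hypothetical, the entries cover only the ten tags shown on this page, and the tag names are taken from the last column as read from this page.

```python
# Tag name -> UMLS CUI, copied from the rows listed on this page.
TAG_TO_CUI = {
    "Hypertension": "C3843080",
    "Resting arm diastolic BP": "C2183311",
    "Resting arm systolic BP": "C2039694",
    "Blood glucose": "C1320980",
    "Diabetes": "C0011849",
    "HbA1c": "C1261236",
    "Insulin in blood": "C0428405",
    "BMI": "C1305855",
    "Height": "C0005890",
    "Hip circumference": "C0562350",
}


def cui_for_tag(tag_name: str) -> str:
    """Return the CUI for a tag name, ignoring case and surrounding whitespace."""
    lookup = {name.lower(): cui for name, cui in TAG_TO_CUI.items()}
    key = tag_name.strip().lower()
    if key not in lookup:
        raise KeyError(f"unknown tag name: {tag_name!r}")
    return lookup[key]


if __name__ == "__main__":
    print(cui_for_tag("Height"))  # C0005890
    print(cui_for_tag("hba1c"))   # C1261236
```

The returned CUI is the value to enter in the search box when looking for variables tagged with that phenotype concept.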

Citation

Information about these projects is available in a published manuscript. If you use the datasets described on this page, please cite the following paper:

Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D’Augustine CM, Heard-Costa NL, Hohensee CR, Johnson WC, Juarez LD, Liu J, Mutalik KM, Raffield LM, Wiggins KL, de Vries PS, Kelly TN, Kooperberg C, Natarajan P, Peloso GM, Peyser PA, Reiner AP, Arnett DK, Aslibekyan S, Barnes KC, Bielak LF, Bis JC, Cade BE, Chen MH, Correa A, Cupples LA, de Andrade M, Ellinor PT, Fornage M, Franceschini N, Gan W, Ganesh SK, Graffelman J, Grove ML, Guo X, Hawley NL, Hsu WL, Jackson RD, Jaquish CE, Johnson AD, Kardia SLR, Kelly S, Lee J, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, North KE, Nouraie SM, Oelsner EC, Pankratz N, Rich SS, Rotter JI, Smith JA, Taylor KD, Vasan RS, Weeks DE, Weiss ST, Wilson CG, Yanek LR, Psaty BM, Heckbert SR, Laurie CC. A System for Phenotype Harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) Program. Am J Epidemiol. 2021 Oct 1;190(10):1977-1992. doi: 10.1093/aje/kwab115. PMID: 33861317; PMCID: PMC8485147.
