TOPMed

Harmonized Phenotypes

TOPMed Phenotype Harmonization Project

The main goal of the TOPMed harmonization project is to provide harmonized phenotypes that are well-documented, reproducible, and homogeneous across studies. In harmonized datasets and documentation, “phenotype” refers to the observable characteristic (e.g., diastolic blood pressure) and “variable” refers to the specific data vector of values for a given phenotype (e.g., bp_diastolic_1). To enable reproducibility, all study data were acquired from dbGaP.

Datasets and documentation of the harmonized variables were submitted to two repositories: dbGaP and BioData Catalyst. Full documentation for each harmonized variable is provided in a GitHub repository. The documentation for each harmonized variable includes the identifiers of the original dbGaP study variables used in harmonization as well as the code that was used to transform them into the harmonized variable. This repository also includes a reproducible example that instructs users how to use the documentation to reproduce a simulated harmonized variable.

TOPMed Phenotype Tagging Project

Over 16,000 dbGaP study variables were tagged with 65 phenotype concepts from the heart, lung, blood, and sleep domains. These tags enable researchers to identify variables of interest that can be used in future harmonization efforts. The results of the tagging project are available in the dbGaP user interface. All tags are mapped to a UMLS Concept Unique Identifier (CUI), which is required for identifying the tagged variables on dbGaP.

Instructions for Identifying Tagged Variables on dbGaP

The following are examples of different methods to search for tagged variables: Entrez search and faceted search.

Entrez search

  • In your web browser, visit the dbGaP Entrez advanced search page.
  • In the search builder, select Common Data Element Resource and enter “umls” into the associated text box, or add “umls[Common Data Element Resource]” to the search box. Alternatively, select Common Data Element Term and enter the CUI of a UMLS term into the associated text box, or add the term directly to the search box (e.g., “C0005890[Common Data Element Term]”).
  • The Studies tab of the search results displays all of the studies that contain tagged variables.
  • The Variables tab of the search results displays all of the dbGaP variables that are tagged with at least one UMLS term. Click on a variable name to see more information on the variable page.
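
The two search-box queries described in the steps above can also be assembled programmatically. The following is a minimal sketch, not part of the TOPMed documentation: the helper names are hypothetical, and it assumes that the NCBI E-utilities `esearch` endpoint with `db=gap` accepts the same field tags as the advanced search page, which this page does not confirm. The code only builds the query string and URL; it does not send a request.

```python
import re
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities esearch endpoint, with "gap" as the
# dbGaP database name. Whether the field tags below behave identically
# there and in the web search builder is not confirmed by this page.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged_variable_term(cui=None):
    """Build the search-box text for tagged variables (hypothetical helper).

    With no CUI, return the query for every variable tagged with any UMLS
    term; with a CUI, return the query for that one term.
    """
    if cui is None:
        return "umls[Common Data Element Resource]"
    # UMLS CUIs are the letter C followed by seven digits.
    if not re.fullmatch(r"C\d{7}", cui):
        raise ValueError(f"not a valid UMLS CUI: {cui!r}")
    return f"{cui}[Common Data Element Term]"


def esearch_url(term, retmax=20):
    """Return a URL-encoded esearch URL for the given term (no request sent)."""
    params = {"db": "gap", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(tagged_variable_term())
print(tagged_variable_term("C0005890"))
print(esearch_url(tagged_variable_term("C0005890")))
```

The strings returned by `tagged_variable_term` can be pasted directly into the search box on the advanced search page, which is the route this page documents.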

Faceted search

  • In your web browser, visit the dbGaP faceted search page.
  • Click on the Variables tab.
  • Under the Common Data Elements filter, check UMLS. This will display all of the dbGaP study variables that are tagged with a UMLS term.
  • For a given variable listed on the right, you can click on the UMLS link to go directly to the variable’s information page with the full UMLS term name.
  • To search for variables tagged with a specific UMLS term, search for the term’s CUI in the search box in the upper left corner of the page.

Mapped Phenotype Tags

| Phenotype Domain | Description | UMLS CUI | UMLS Term | Tag Name (phenotype concept) |
|---|---|---|---|---|
| Hematology & Hemostasis | Quantitative measure of platelet cell number per volume of blood | C1287267 | Finding of platelet count | Platelet count |
| Hematology & Hemostasis | Quantitative measure of red blood cell number per volume of blood | C1287262 | Finding of red blood cell count | Red blood cell count |
| Hematology & Hemostasis | Quantitative measure of von Willebrand factor (vWF) activity or concentration in blood | C0427585 | vWF - von Willebrand factor level result | von Willebrand factor |
| Hematology & Hemostasis | Qualitative indicator of venous thromboembolism (VTE) status | C1861172 | venous thromboembolism | VTE |
| Hematology & Hemostasis | Quantitative measure of white blood cell number per volume of blood | C0427512 | White blood cell count laboratory result | White blood cell count |
| Inflammation | Quantitative measure of C-reactive protein (CRP) concentration in blood | C0428528 | C-reactive protein level | CRP in blood |
| Inflammation | Quantitative measure of Interleukin 6 (IL-6) concentration in blood | C0366888 | Interleukin6:ACnc:Pt:Ser/Plas:Qn | Interleukin 6 in blood |
| EKG/Arrhythmia | Qualitative indicator of atrial fibrillation or atrial flutter status | C0155709 | Atrial fibrillation and flutter | Atrial fibrillation/flutter |
| EKG/Arrhythmia | Quantitative index of left ventricular hypertrophy (LVH) calculated from electrocardiogram (EKG) data | C0232306 | Electrocardiogram: left ventricle hypertrophy (finding) | LVH from EKG |
| EKG/Arrhythmia | Qualitative indicator of pacemaker implant status | C1533090 | Pacemaker observable | Pacemaker |
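
For scripted lookups, the tag-to-CUI mapping listed above can be kept as a small dictionary. The sketch below is illustrative only: the names `TAG_TO_CUI` and `search_term_for_tag` are hypothetical, the dictionary covers only the rows shown on this page rather than all 65 tags, and the query format is the “Common Data Element Term” form given in the Entrez search instructions.

```python
# CUIs copied from the rows listed on this page (a subset of the 65 tags).
TAG_TO_CUI = {
    "Platelet count": "C1287267",
    "Red blood cell count": "C1287262",
    "von Willebrand factor": "C0427585",
    "VTE": "C1861172",
    "White blood cell count": "C0427512",
    "CRP in blood": "C0428528",
    "Interleukin 6 in blood": "C0366888",
    "Atrial fibrillation/flutter": "C0155709",
    "LVH from EKG": "C0232306",
    "Pacemaker": "C1533090",
}


def search_term_for_tag(tag_name):
    """Return the dbGaP search-box text for a tag name (hypothetical helper).

    Raises KeyError if the tag is not one of the rows copied above.
    """
    cui = TAG_TO_CUI[tag_name]
    return f"{cui}[Common Data Element Term]"


# Print one search term per tag, ready to paste into the search box.
for tag in TAG_TO_CUI:
    print(f"{tag}: {search_term_for_tag(tag)}")
```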

Citation

Information about these projects is available in a published manuscript. If you use the datasets described on this page, please cite the following paper:

Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D’Augustine CM, Heard-Costa NL, Hohensee CR, Johnson WC, Juarez LD, Liu J, Mutalik KM, Raffield LM, Wiggins KL, de Vries PS, Kelly TN, Kooperberg C, Natarajan P, Peloso GM, Peyser PA, Reiner AP, Arnett DK, Aslibekyan S, Barnes KC, Bielak LF, Bis JC, Cade BE, Chen MH, Correa A, Cupples LA, de Andrade M, Ellinor PT, Fornage M, Franceschini N, Gan W, Ganesh SK, Graffelman J, Grove ML, Guo X, Hawley NL, Hsu WL, Jackson RD, Jaquish CE, Johnson AD, Kardia SLR, Kelly S, Lee J, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, North KE, Nouraie SM, Oelsner EC, Pankratz N, Rich SS, Rotter JI, Smith JA, Taylor KD, Vasan RS, Weeks DE, Weiss ST, Wilson CG, Yanek LR, Psaty BM, Heckbert SR, Laurie CC. A System for Phenotype Harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) Program. Am J Epidemiol. 2021 Oct 1;190(10):1977-1992. doi: 10.1093/aje/kwab115. PMID: 33861317; PMCID: PMC8485147.
