Hematology and Hemostasis

A mixed model approach to testing multiple correlated traits in large samples: An application to the Trans-Omics for Precision Medicine (TOPMed) program hematology phenotypes.

Authors
Matthew Conomos, Stephanie Gogarten, Alex Reiner, BioData Catalyst Consortium, TOPMed Hematology and Hemostasis Working Group
Name and Date of Professional Meeting
ASHG Conference, October 27, 2020
Associated paper proposal(s)
Working Group(s)
Abstract Text
Large-scale consortia amass numerous phenotypes, many of which are correlated. Compared with testing phenotypes independently, testing correlated phenotypes jointly for association with genetic variants yields higher power and the ability to identify pleiotropy. Additionally, genetic studies with many participants typically exhibit population structure and relatedness. We therefore need efficient models that test multiple correlated phenotypes while accurately modeling sample correlation. Methods have been developed for genetic association testing with multiple phenotypes, but their limitations include allowing only one variance component in the model. Along with relatedness, it may be appropriate to include additional variance components to model, for example, a shared environment or household, or a relatedness matrix estimated specifically from X chromosome genetic markers. We will extend existing linear mixed model algorithms for multiple phenotype association testing to allow for more than one variance component. Further, running a multiple phenotype test often requires a different software package from the one used for single phenotype association testing. The multivariate mixed model will therefore be implemented in the GENESIS software package, an established suite of genetic analysis functions whose association testing functions allow for heterogeneous variance and use sparse matrices for large sample sizes. We will also create a workflow for the BioData Catalyst platform powered by Seven Bridges that will be available to all users, providing a seamless path from single variant association testing to multiple phenotype testing. In the BioData Catalyst ecosystem, users can execute existing workflows in a high performance cloud computing environment, removing the need to transfer data between collaborators or to access an on-site computing cluster.
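The model described above can be written in the standard Kronecker-product form: with n samples and k traits stacked trait by trait, Cov(vec(Y)) is a sum over variance components of Sigma_m ⊗ K_m, where each K_m is a sample correlation matrix (kinship, household, X chromosome relatedness, identity) and each Sigma_m is a between-trait covariance. The NumPy sketch below is a minimal illustration under stated assumptions: the variance components are treated as known (in practice they would be estimated under the null, for example by REML), the kinship and household matrices are toy stand-ins, and the function names are hypothetical. It is not the GENESIS implementation.

```python
import numpy as np


def build_covariance(sample_mats, trait_covs):
    """Cov(vec(Y)) for an n x k phenotype matrix Y stacked trait by trait.

    Each variance component m adds kron(Sigma_m, K_m): K_m is an n x n sample
    correlation matrix (e.g. autosomal kinship, an X-chromosome relatedness
    matrix, a household indicator, or the identity for residual error) and
    Sigma_m is the k x k between-trait covariance for that component.
    """
    n = sample_mats[0].shape[0]
    k = trait_covs[0].shape[0]
    V = np.zeros((n * k, n * k))
    for K, S in zip(sample_mats, trait_covs):
        V += np.kron(S, K)
    return V


def joint_wald_test(Y, X, g, V):
    """k-df GLS Wald test that variant g is associated with none of the k traits.

    Y: n x k phenotypes, X: n x p covariates, g: length-n genotype dosages,
    V: (n*k) x (n*k) covariance of vec(Y), treated as KNOWN in this sketch.
    Returns the k per-trait genotype effects and the Wald statistic, which is
    compared with a chi-square distribution on k df.
    """
    k = Y.shape[1]
    y = Y.reshape(-1, order="F")                    # vec(Y): trait columns stacked
    Ik = np.eye(k)
    # Trait-specific covariate effects plus one genotype effect per trait
    W = np.hstack([np.kron(Ik, X), np.kron(Ik, g.reshape(-1, 1))])
    Vinv_W = np.linalg.solve(V, W)                  # V^{-1} W
    info = W.T @ Vinv_W                             # W' V^{-1} W
    coef = np.linalg.solve(info, Vinv_W.T @ y)      # GLS estimates
    cov_coef = np.linalg.inv(info)                  # their covariance
    beta = coef[-k:]                                # genotype effect for each trait
    stat = float(beta @ np.linalg.solve(cov_coef[-k:, -k:], beta))
    return beta, stat


# Toy data (hypothetical): 60 samples, 2 traits, three variance components
rng = np.random.default_rng(0)
n, k = 60, 2
A = rng.normal(size=(n, n))
kinship = A @ A.T / n                                  # PSD stand-in for a kinship matrix
household = np.kron(np.eye(n // 3), np.ones((3, 3)))   # households of three
V = build_covariance(
    [kinship, household, np.eye(n)],
    [np.array([[1.0, 0.5], [0.5, 1.0]]), 0.3 * np.eye(k), np.eye(k)],
)
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
g = rng.binomial(2, 0.3, size=n).astype(float)         # additive genotype dosages
# Phenotypes simulated under the null (no variant effect)
Y = (np.linalg.cholesky(V) @ rng.normal(size=n * k)).reshape(n, k, order="F")
beta, stat = joint_wald_test(Y, X, g, V)
```

Adding a variance component only adds one more Kronecker term to V, which is why the extension is natural in this formulation. The dense (nk) x (nk) solve is for clarity only; at TOPMed sample sizes an implementation would need to exploit sparsity or structure rather than form V explicitly.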
The utility of the multiple phenotype association method will be demonstrated using whole genome sequence data from the NHLBI TOPMed Program, a large, multi-ethnic sample set. Multiple trait association testing will be performed in participants from 13 studies in which seven red blood cell traits were measured: hemoglobin, hematocrit, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, red blood cell count, and red cell distribution width. The multiple phenotype association results will be compared with the results of testing each phenotype individually to identify pleiotropy and any novel associations.
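Why a joint test can find signals that single-trait tests miss can be illustrated with a simple summary-statistic version of the comparison. This is a swapped-in technique, not the mixed model method described in the abstract: it assumes that, under the null, the per-trait z-scores at a variant are correlated like the traits themselves (correlation matrix R), so that z' R^{-1} z is approximately chi-square with k degrees of freedom. The function names and the toy z-scores are hypothetical.

```python
import math

import numpy as np


def joint_chi2(z, R):
    """Joint statistic z' R^{-1} z for k per-trait z-scores at one variant.

    ASSUMPTION: under the null the z-scores have correlation matrix R, so the
    statistic is approximately chi-square with k df.
    """
    z = np.asarray(z, dtype=float)
    return float(z @ np.linalg.solve(R, z))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        p, k = math.exp(-h), 2                 # P(chi2_2 > x) = exp(-x/2)
    else:
        p, k = math.erfc(math.sqrt(h)), 1      # P(chi2_1 > x) = erfc(sqrt(x/2))
    while k < df:
        # Recurrence: Q(x; k+2) = Q(x; k) + h^(k/2) * exp(-h) / Gamma(k/2 + 1)
        p += math.exp((k / 2.0) * math.log(h) - h - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return p


# Toy example: two positively correlated traits, effects in opposite directions
z = np.array([2.0, -2.0])
R = np.array([[1.0, 0.8], [0.8, 1.0]])
T = joint_chi2(z, R)
p_joint = chi2_sf(T, df=len(z))
p_single = [chi2_sf(zi * zi, df=1) for zi in z]
```

For these toy values T = 40, so the joint 2-df p-value is exp(-20) (about 2 × 10^-9), while each trait tested alone gives p ≈ 0.046. The gain comes from the effect pattern running against the trait correlation, which is one way pleiotropic variants can be missed by single-trait scans.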

eSCAN: Scan regulatory regions for aggregate association testing using whole genome sequencing data

Authors
Yingxi Yang, Yuchen Yang, Le Huang, Adolfo Correa, Alexander Reiner, Laura Raffield, Yun Li
Name and Date of Professional Meeting
American Society of Human Genetics, October 27-31, 2020
Associated paper proposal(s)
Working Group(s)
Abstract Text
New technologies (such as promoter capture Hi-C) have helped elucidate the three-dimensional structure of the genome and can help link enhancer regions to the genes they affect. Leveraging these data may improve understanding of how low frequency and rare noncoding genetic variation affects phenotypes of interest and help link this variation to its target gene(s). In current whole genome sequencing analyses, a variety of genome annotation information (for example, chromatin conformation data, histone marks, and measures of open chromatin across cell types) is often used to predefine variant sets for aggregate testing. Other existing approaches, such as SCANG, select analysis windows dynamically; they are less reliant on prior knowledge of the appropriate variant sets to test and have greater power than standard sliding window approaches. However, most regions identified by these approaches span several enhancer regions and are not clearly linked to genes, which limits biological interpretation, and false positives remain a problem. Here we propose eSCAN (“Scan the Enhancers”, with “enhancers” as shorthand for any potential regulatory region in the genome), a method for genome-wide assessment of regulatory regions in sequencing studies that combines the advantages of dynamic window selection with those of integrating various types of functional information, including chromatin accessibility, histone marks, and 3D chromatin conformation. eSCAN searches windows that mark putative regulatory regions. This addresses the cross-boundary issue of methods like SCANG, lowers the false positive rate, and further increases power, as demonstrated by simulation studies under a wide range of scenarios.
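The idea of restricting windows to putative regulatory regions can be illustrated with a minimal sketch, assuming per-variant score statistics are already available and that candidate windows are exactly the annotated regulatory regions, each tagged with a linked target gene. The function name, the simple burden statistic, the treatment of variants as independent, and the gene labels are all illustrative assumptions; this is not eSCAN's actual test statistic, dynamic window search, or significance threshold.

```python
import bisect
import math


def scan_regulatory_regions(positions, scores, score_vars, regions, min_variants=2):
    """Hypothetical burden-style aggregate test within each annotated region.

    positions  : sorted variant positions on one chromosome
    scores     : per-variant score statistics U_j (same order as positions)
    score_vars : variances of U_j; variants are treated as independent (no LD),
                 a simplification made only to keep this sketch short
    regions    : (start, end, target_gene) tuples for putative regulatory regions,
                 e.g. enhancers linked to genes by chromatin conformation data
    Returns one (start, end, target_gene, n_variants, chi2, p) tuple per tested region.
    """
    results = []
    for start, end, gene in regions:
        lo = bisect.bisect_left(positions, start)   # first variant with pos >= start
        hi = bisect.bisect_right(positions, end)    # one past last variant with pos <= end
        if hi - lo < min_variants:
            continue                                # too few variants to aggregate
        u = sum(scores[lo:hi])
        v = sum(score_vars[lo:hi])
        chi2 = u * u / v                            # 1-df burden statistic
        p = math.erfc(math.sqrt(chi2 / 2.0))        # chi-square(1) upper tail
        results.append((start, end, gene, hi - lo, chi2, p))
    return results


# Toy inputs; gene labels are placeholders, not real annotations
positions = [100, 150, 220, 400, 410, 900]
scores = [1.0, 2.0, -0.5, 3.0, 2.5, 0.1]
score_vars = [1.0] * len(positions)
regions = [(90, 200, "GENE_A"), (380, 420, "GENE_B"), (850, 950, "GENE_C")]
hits = scan_regulatory_regions(positions, scores, score_vars, regions)
```

Because every window is bounded by an annotated region, each result maps to one regulatory element and its linked gene rather than spanning several enhancers. The trade-off is visible in the toy data: the variant at position 220 lies outside every region and is never tested, and GENE_C's region is skipped because it contains a single variant.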
We applied eSCAN to blood cell traits in the Jackson Heart Study (JHS) and the Women’s Health Initiative (WHI), identifying regulatory regions that lie within known association signals, including CXCR2 for white blood cell count and EPO for hematocrit and hemoglobin.