
Analysis

Multi-omic association studies identify novel genes and proteins regulating cellular sensitivity to chemotherapy in diverse populations

Authors
Ashley Mulford, Claudia Wing, Ryan Schubert, TopMed Consortium, Ani Manichaikul, Hae Kyung Im, M. Eileen Dolan, Heather E. Wheeler
Name and Date of Professional Meeting
Great Lakes Bioinformatics Meeting; May 10-13, 2021
Associated paper proposal(s)
Working Group(s)
Abstract Text
The development of effective treatments is vital in the fight against cancer, the second leading cause of death globally. Most cancer chemotherapeutic agents are ineffective in a subset of patients; thus, it is important to consider the role of genetic variation in drug response. One useful model to determine how genetic variation contributes to differing drug cytotoxicity is HapMap lymphoblastoid cell lines (LCLs).
In our study, LCLs from 1000 Genomes Project populations of diverse ancestries had previously been treated with increasing concentrations of eight chemotherapeutic drugs: cytarabine (cytosine arabinoside), capecitabine, carboplatin, cisplatin, daunorubicin, etoposide, paclitaxel, and pemetrexed. Cell growth inhibition was measured at each dose after 72 hours of exposure, and we used either the half-maximal inhibitory concentration (IC50) or the area under the dose-response curve (AUC) as the phenotype for each drug; all phenotypic data were rank-normalized for use in subsequent analyses. Depending on the drug, the populations analyzed included up to 168 individuals of European (CEU), 177 of Yoruba (YRI), and 90 of East Asian (ASN) ancestry. Including diverse populations is vital to advancing our understanding of the factors that affect treatment effectiveness: some variants are unique to specific ancestral populations, and some ancestral populations, particularly those of African ancestries, contain greater genetic variation than the more widely studied populations of European ancestries.
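The abstract does not say which rank normalization was applied. As a minimal sketch, assuming a rank-based inverse normal transform with the Blom offset (0.375), the step could look like the following. The function names and the IC50 values are hypothetical and serve only as illustration.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_normalize(values, offset=0.375):
    """Map values to standard normal quantiles via their ranks.

    The offset (Blom, 0.375) is an assumption, not taken from the abstract.
    """
    n = len(values)
    normal = NormalDist()
    return [
        normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
        for r in average_ranks(values)
    ]


ic50 = [12.1, 0.8, 3.4, 150.0, 3.4]  # made-up IC50 values for five cell lines
z = rank_normalize(ic50)
```

The transform keeps the ordering of the cell lines but discards the raw scale, so a single extreme IC50 cannot dominate the downstream association tests.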
We performed genome- and transcriptome-wide association studies (GWAS/TWAS) and protein-based association studies (PAS) within each population and in all three populations combined (ALL). We conducted GWAS using GEMMA, a software toolkit for fast application of linear mixed models (LMMs) that accounts for relatedness among individuals, because the CEU and YRI ancestral populations contain parent-child trios. Additionally, we performed genotypic principal component analysis to account for population stratification within the ALL population. We conducted TWAS and PAS using PrediXcan and GEMMA. We used PrediXcan, which utilizes prediction models, to calculate predicted gene expression and protein levels based on genotypic data. We then used GEMMA to identify associations between the predicted levels derived by PrediXcan and chemotherapy-induced cytotoxicity. When conducting TWAS, we used the previously trained tissue-based GTEx (Genotype-Tissue Expression) Project version 7 and population-based MESA (Multi-ethnic Study of Atherosclerosis) prediction models available in PredictDB.
To conduct PAS, we trained population-based prediction models using genotype and plasma protein data from an aptamer-based assay of 1335 proteins from individuals of African (AFA, n=183), European (EUR, n=416), Chinese (CHN, n=71), and Hispanic/Latino (HIS, n=301) ancestries in the TOPMed (Trans-omics for Precision Medicine) MESA multi-omics pilot study. We used cross-validated elastic net regularization (alpha mixing parameter=0.5) with genetic variants within 1Mb of the gene encoding each protein as predictors of protein levels. We carried protein models with Spearman correlation > 0.1 between predicted and observed levels forward to our PAS. Thus, depending on population, we tested between 253 and 416 proteins across all models for association with chemotherapy-induced cytotoxicity.
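The model-selection rule above (keep a protein model only if the Spearman correlation between predicted and observed levels exceeds 0.1) can be sketched as follows. This is an illustrative stdlib implementation, not the authors' pipeline; the protein names and values are made up, and the cross-validation that would produce the predictions is assumed to have happened upstream.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def keep_models(results, threshold=0.1):
    """Return names of models whose predicted-vs-observed Spearman rho > threshold."""
    return sorted(
        name for name, (pred, obs) in results.items()
        if spearman(pred, obs) > threshold
    )


# Hypothetical predicted and observed protein levels for two models.
results = {
    "PROT_A": ([0.1, 0.4, 0.35, 0.8, 0.9], [1.0, 1.9, 2.5, 3.1, 4.2]),
    "PROT_B": ([0.1, 0.4, 0.35, 0.8, 0.9], [4.0, 1.0, 3.0, 2.0, 2.5]),
}
kept = keep_models(results)
```

In this toy input only PROT_A passes the filter, because its predictions track the observed ranks while those of PROT_B run in the opposite direction.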
Through these multi-omics association studies, we identified twelve SNPs, two genes, and seven proteins significantly associated with cellular sensitivity to chemotherapeutic drugs within and across diverse populations after Bonferroni correction for multiple testing. Our TWAS found that increased predicted STARD5 gene expression is associated with decreased etoposide IC50 in ALL (P=8.49e-08). Functional studies in A549, a lung cancer cell line, revealed that knockdown of STARD5 expression results in decreased sensitivity to etoposide following exposure for 72 hours (P=0.033) and 96 hours (P=0.0001). By identifying variants, transcripts, and proteins associated with cytotoxicity across diverse ancestral populations, we strive to understand the factors that affect the effectiveness of chemotherapy drugs and to contribute to the development of future precision cancer treatments.
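For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch follows. The abstract does not state the family-wise error rate or how the number of tests was counted in each analysis, so the alpha of 0.05, the gene names, and the P-values here are all assumptions for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def significant(pvalues, alpha=0.05):
    """Return the tests whose P-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(len(pvalues), alpha)
    return {name: p for name, p in pvalues.items() if p < cutoff}


# Made-up P-values for four hypothetical genes; the cutoff is 0.05 / 4 = 0.0125.
pvalues = {"GENE_A": 1e-6, "GENE_B": 0.02, "GENE_C": 0.3, "GENE_D": 0.04}
hits = significant(pvalues)
```

GENE_B and GENE_D would pass an uncorrected 0.05 threshold but not the corrected one, which is the point of the adjustment: the more tests are run, the stricter each individual test becomes.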

A generalizable protocol for leveraging whole genome sequenced cohorts as population controls for new genome wide association studies

Authors
Ravi Mathur, Fang Fang, Nathan Gaddis, Dana B. Hancock, Edwin K. Silverman, Michael H. Cho, Laura Bierut, Sharon Lutz, John E. Hokanson, Grier P. Page, Eric O. Johnson
Name and Date of Professional Meeting
ASHG Annual Meeting (October 27-30, 2020)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Data sharing and reuse are key to enhancing discovery in modern genetics research. In one data reuse approach, population controls are matched to case-only data, allowing for new genome-wide association studies (GWAS). We previously developed an unbiased GWAS approach that matches case-only cohorts with population controls, but it depends on substantial overlap in array-genotyped variants (e.g., Affymetrix and Illumina array data have little overlap in variants), which is a limitation especially for minority ancestries with less genotype data available. This limitation may be addressed using whole-genome sequencing (WGS) data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) initiative.
We developed a practical protocol, involving rigorous quality control, phasing, imputation, and filtering, to appropriately integrate WGS and array genotyping data. We tested our protocol using European- and African-American participants from TOPMed’s Genetic Epidemiology of Chronic Obstructive Pulmonary Disease (COPDGene) and Collaborative Genetic Study of Nicotine Dependence (COGEND) cohorts. First, we applied our protocol to test for bias by genotyping technology. Comparing the array vs. WGS genotyping data from the same COPDGene participants, we found highly consistent results that produced no false positive signals. Second, we confirmed valid results under the null hypothesis with an independent cigarette smoking cohort, COGEND, combined with the cigarette smoking COPDGene cohort. Association analysis was conducted with array genotyping data in COGEND as cases vs. WGS data in COPDGene as controls. Because both cohorts are composed of cigarette smokers, no true associations were expected, and we observed no genome-wide significant results (i.e., no false positives). Third, we tested whether the protocol recovers true positive associations by comparing COPD-diagnosed cases with array genotyping data vs. COPDGene controls with WGS data. Our protocol successfully confirmed known genome-wide significant loci for COPD (FAM13A and HHIP-AS1, CHRNA3, and CYP2A6 on chromosomes 4, 15, and 19, respectively). The protocol also controlled batch and lab effects (i.e., sequencing phase and center) that have been observed within TOPMed.
Our protocol, developed for combining WGS and array genotyping data, is valid and provides robust results for European- and African-American populations. The protocol enables resourceful use of existing data including TOPMed WGS data, reduces new sample collection requirements and genotyping costs, and enables new ancestry-specific GWAS for case-only studies where limited genotype data exist.

Multi-ethnic fine mapping optimizes proteome association studies

Authors
Ryan Schubert, Ashley Mulford, Tuuli Lappalainen, Anya Mikhaylova, Timothy Thornton, Chris Gignoux, Stephen Montgomery, Leslie Lange, Ethan Lange, Stephen S. Rich, Jerome I. Rotter, Robert Gerszten, Michael Cho, Ani Manichaikul, Hae Kyung Im, Heather E. Wheeler; on behalf of the NHLBI TOPMed Consortium
Name and Date of Professional Meeting
American Society of Human Genetics (October 27-30, 2020)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Leveraging transcriptomic information in association studies has helped elucidate the biological mechanisms underlying complex traits. However, genetically regulated transcriptome variation may not fully capture genetically regulated proteome variation. In addition, while some progress has been made in increasing transcriptome data across populations, increased diversity in human genetics is needed. Here, we used the Trans-omics for Precision Medicine (TOPMed) pilot study, which comprises data from the Multi-ethnic Study of Atherosclerosis (MESA) cohort, to optimize genetic predictors of the plasma proteome for genetically regulated proteome association studies.

Genotype and plasma proteome data (from an aptamer-based assay of 1335 proteins) in TOPMed MESA were available for individuals of African (AFA, n=183), European (CAU, n=416), Chinese (CHN, n=71), and Hispanic (HIS, n=301) genetic ancestries. Using SNPs within 1Mb of the gene encoding each protein, we performed fine-mapping using DAP-G within each population and in the combined multi-ethnic population (ALL, n=971). We then trained protein prediction models using cross-validated elastic net, including only SNPs with non-negligible posterior inclusion probabilities (PIPs), and tested a range of PIP thresholds for SNP inclusion. Fine-mapping (PIP>0.001) yielded more significant models (R>0.1) than the baseline (using all SNPs): 579 vs. 199 proteins in AFA, 470 vs. 300 in ALL, 548 vs. 292 in CAU, 307 vs. 124 in CHN, and 602 vs. 234 in HIS. Importantly, fine-mapped models do not perform significantly worse than baseline models when tested between populations.
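The SNP pre-selection step described above can be sketched as a simple filter on posterior inclusion probabilities. This is only an illustration of the thresholding idea: the SNP identifiers and PIP values are made up, and of the thresholds swept below only 0.001 appears in the abstract. In the study the PIPs come from DAP-G, and the retained SNPs feed the elastic net training.

```python
def select_snps(pips, threshold=0.001):
    """Keep SNP IDs whose posterior inclusion probability exceeds the threshold."""
    return [snp for snp, pip in pips.items() if pip > threshold]


# Hypothetical fine-mapping output for one protein's cis window.
pips = {
    "snp_1": 0.62,
    "snp_2": 0.0004,
    "snp_3": 0.015,
    "snp_4": 0.0,
    "snp_5": 0.21,
}

# Sweep a few candidate thresholds (0.01 and 0.1 are assumed values)
# and count how many SNPs would remain as predictors at each one.
counts = {t: len(select_snps(pips, t)) for t in (0.001, 0.01, 0.1)}
```

A looser threshold keeps more candidate predictors for the elastic net to weigh, while a stricter one passes only the SNPs that fine-mapping considers most likely to be causal.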

We then surveyed existing prediction models of RNA expression from PredictDB to estimate what percentage of protein models cover previously unmodeled genes. Depending on population, 71-79% of proteins do not have significant RNA prediction models in the MESA monocyte transcriptome. In PredictDB whole blood mashr models, which include shared tissue effects across GTEx v8, 34% of ALL protein models are new with a mean R=0.36 (max=0.84).

When we tested predicted protein levels for association with lung function traits using publicly available GWAS summary statistics in an S-PrediXcan framework, we detected 43% more Bonferroni-significant associations when using the fine-mapped models compared to baseline models trained in CAU, and 66% more associations when using models trained in ALL. Thus, fine-mapping in multi-ethnic cohorts prior to protein prediction model building pre-selects the likely causal SNPs, improving power to detect associations with complex traits, and new proteins not modeled in transcriptome studies may reveal new underlying biology.