Practical recommendations for TOPMed metabolomics data

Submitted by	Ockerman, Franklin
Authors	Franklin Ockerman, Laura Zhou, Emily Drzymalla, Taryn Alkis, Megan Grove, Bing Yu, Laura Raffield
Name and Date of Professional Meeting	American Society of Human Genetics Annual Meeting (November 1-5, 2023)
Associated paper proposal(s)	A Pilot Cross-Cohort Metabolomics Proposal for TOPMed Metabolomics Data: Metabolites Associated with Sex and Age
Working Group(s)	Metabolomics and Proteomics
Abstract Text	The Trans-Omics for Precision Medicine (TOPMed) program expects to release broad-spectrum metabolomic data for over 90,000 samples soon, representing over a dozen studies. However, investigators using this resource face potential challenges in pre-processing and integrating data across studies. Differing metabolomic platforms and analysis centers may cause technical variation. Likewise, missing metabolite values may vary in their distribution and source between studies. Consistent protocols for pre-processing and integration are thus necessary to unlock the potential of this rich resource. We compare several strategies and offer recommendations for the TOPMed community, with the goal of guiding and facilitating future genetic and phenotype-specific analyses. As a pilot phase, we are currently analyzing data from 25,058 participants from diverse case-control and population-based cohort studies, including 15,633 participants from 3 cohort studies on the Metabolon platform and 9,425 participants from 5 cohort studies on the Broad/BIDMC platform. This dataset includes 1,730 named metabolites, of which 364 were measured in at least some cohorts on both platforms. After within-study rank-based inverse normal transformation, estimates of age-metabolite associations are highly concordant between pooled and inverse-variance meta-analyzed data (r > 0.999) and generally consistent with the existing literature, although 36 metabolites are significant only in the meta-analysis. Most named metabolites had very low missingness in our dataset, and we found that metabolite associations with age and sex were highly consistent across all imputation strategies for missing values (zero, min, half-min, k-nearest neighbors, random forest, quantile regression imputation of left-censored data). We recommend replacing missing values with zero in metabolites characterized as xenobiotics.
For other metabolites, we will compare imputation strategies with an analysis of metabolite quantitative trait loci (mQTLs). In summary, we find largely consistent results between pooled analysis and inverse-variance meta-analysis. We recommend inverse normal transformation to enable integration between studies. We recommend left-censored imputation for xenobiotics and will soon release recommendations for imputation of other metabolites. To aid investigators, we will release scripts for implementing these recommendations. Such pre-processing steps are necessary to optimize power in cross-cohort metabolomic analyses, including planned QTL studies.
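The pre-processing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code; the scripts mentioned above are not reproduced here. All function names, column names, and the Blom offset (c = 3/8) are assumptions made for illustration, since the abstract does not specify them. The sketch covers three pieces: simple single-value imputation (zero, min, half-min), a rank-based inverse normal transformation applied within each study, and a fixed-effect inverse-variance meta-analysis of per-study effect estimates.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm, rankdata


def impute_simple(values: pd.Series, method: str = "half-min") -> pd.Series:
    """Fill missing values with zero, the observed minimum, or half of it.

    Hypothetical helper: covers only the three single-value strategies
    listed in the abstract (zero, min, half-min).
    """
    observed_min = values.min(skipna=True)
    fill_values = {
        "zero": 0.0,
        "min": observed_min,
        "half-min": observed_min / 2.0,
    }
    return values.fillna(fill_values[method])


def rank_inverse_normal(values: pd.Series, c: float = 3.0 / 8.0) -> pd.Series:
    """Rank-based inverse normal transformation; missing values stay missing.

    The offset c = 3/8 (Blom) is an assumption, not taken from the abstract.
    """
    out = pd.Series(np.nan, index=values.index, dtype=float)
    mask = values.notna()
    n = int(mask.sum())
    if n == 0:
        return out
    # Average ranks for ties, then map rank quantiles to normal quantiles.
    ranks = rankdata(values[mask].to_numpy(), method="average")
    out[mask] = norm.ppf((ranks - c) / (n - 2.0 * c + 1.0))
    return out


def transform_within_study(df: pd.DataFrame, value_col: str, study_col: str) -> pd.Series:
    """Apply the transformation separately within each study."""
    return df.groupby(study_col)[value_col].transform(rank_inverse_normal)


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    betas = np.asarray(betas, dtype=float)
    weights = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    pooled_beta = float(np.sum(weights * betas) / np.sum(weights))
    pooled_se = float(np.sqrt(1.0 / np.sum(weights)))
    return pooled_beta, pooled_se
```

In use, one would typically impute and transform each metabolite within each study, fit the age or sex association model per study, and then pass the per-study coefficients and standard errors to inverse_variance_meta. Transforming within study rather than across the pooled data puts every study on the same scale before the results are combined.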