
Hematology and Hemostasis

Telomere length estimation and analysis on large scale whole-genome sequencing data

Authors
Margaret A. Taub, Kruthika Iyer, Joshua Weinstock, Lisa R. Yanek, Hyun M. Kang, Tom Blackwell, Adolfo Correa, Steven L. Salzberg, Dhananjay Vaidya, Diane M. Becker, James G. Wilson, Lewis C. Becker, Gonçalo Abecasis, Alex Reiner, Rasika A. Mathias
Name and Date of Professional Meeting
ASHG Conference (Oct 18, 2017)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Telomere length (TL) is a hallmark of aging and has been associated with a range of diseases. Variation in TL is linked to exposures such as smoking and stress, as well as to genetic factors. TL is also considered a measure of “biological age”, a concept that distinguishes an individual's physiological age from their calendar age. While some prior studies have assessed these relationships, the advent of large-scale whole-genome sequencing (WGS) studies in diverse populations has created new resources for expanding knowledge of the role of TL in aging and disease.

The optimal method for estimating TL from WGS data is still unknown. In a pilot analysis of 574 samples from the GeneSTAR study, we compared three methods (TelSeq, Computel, and a hexamer counting method) and observed strong correlation among them (0.94 or higher within a batch), but wide differences in computational efficiency (from a few hours to a few days to process one sample on one core). To date, only small-scale comparisons between WGS-based estimates and low-throughput measurements, such as flow-FISH or qPCR, have been performed. A preliminary analysis of 19 samples with flow-FISH measurements in two cell types (lymphocytes and granulocytes) shows correlations consistently between 0.7 and 0.8 for all methods considered. Because the methods otherwise perform similarly, computational efficiency is likely to be the more important consideration when choosing among them.
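The abstract does not describe how the hexamer counting method works. As a rough illustration of the general idea behind read-based TL estimation, the Python sketch below flags a read as telomeric when it contains at least k copies of the TTAGGG repeat (or its reverse complement, CCCTAA) and reports the fraction of such reads. The threshold k = 7 and the function names are assumptions chosen for illustration; this is not the TelSeq, Computel, or authors' implementation, and converting the fraction into a length in kb would need further normalization and calibration steps that are not shown here.

```python
from typing import Iterable

# Canonical human telomeric repeat and its reverse complement
# (reads from the opposite strand carry CCCTAA instead of TTAGGG).
TELOMERE_REPEAT = "TTAGGG"
TELOMERE_REPEAT_RC = "CCCTAA"


def is_telomeric(read: str, k: int = 7) -> bool:
    """Return True if the read holds at least k non-overlapping copies of the
    repeat on either strand. k = 7 is an assumed threshold for illustration."""
    seq = read.upper()
    copies = max(seq.count(TELOMERE_REPEAT), seq.count(TELOMERE_REPEAT_RC))
    return copies >= k


def telomeric_read_fraction(reads: Iterable[str], k: int = 7) -> float:
    """Fraction of reads classified as telomeric (0.0 for an empty input)."""
    total = 0
    telomeric = 0
    for read in reads:
        total += 1
        if is_telomeric(read, k):
            telomeric += 1
    return telomeric / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    reads = [
        "TTAGGG" * 8 + "ACGT" * 3,  # 8 forward-strand repeats -> telomeric
        "ACGT" * 15,                # no repeats
        "CCCTAA" * 10,              # 10 reverse-strand repeats -> telomeric
        "GATTACA" * 10,             # no repeats
    ]
    print(telomeric_read_fraction(reads))  # 0.5
```

Counting both strands matters because reads from either end of a chromosome, or from either strand of the library, present the repeat in one orientation or the other.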

Samples from the pilot were processed in two batches at two different sequencing centers, to similar depth (~30x). We saw striking differences in the distributions of TL estimates between these batches: TelSeq estimates for the first batch had a median of around 5kb (interquartile range [IQR] 4.2kb to 5.5kb), whereas those for the second batch had a median of around 2.8kb (IQR 2.5kb to 3.1kb). Because samples in large-scale sequencing studies are likely to be processed in batches, concerns about combining TL estimates from separately generated data are clearly warranted.
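To make the batch comparison concrete, the sketch below summarizes TL estimates per batch by median and IQR and reports the shift between batch medians. The input values are hypothetical toy numbers, not GeneSTAR data, and the helper names are illustrative only.

```python
import statistics
from typing import Dict, List


def summarize(tl_kb: List[float]) -> Dict[str, float]:
    """Median and interquartile range of TL estimates (kb) for one batch.
    Quartiles use the 'inclusive' method; other tools may use other rules."""
    q1, _, q3 = statistics.quantiles(tl_kb, n=4, method="inclusive")
    return {
        "median": statistics.median(tl_kb),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def median_shift(batch_a: List[float], batch_b: List[float]) -> float:
    """Difference in median TL (kb) between two batches."""
    return statistics.median(batch_a) - statistics.median(batch_b)


if __name__ == "__main__":
    # Hypothetical TL estimates in kb for two batches (toy values only).
    batch1 = [4.0, 4.5, 5.0, 5.5, 6.0]
    batch2 = [2.4, 2.6, 2.8, 3.0, 3.2]
    print(summarize(batch1))  # median 5.0, q1 4.5, q3 5.5, iqr 1.0
    print(summarize(batch2))
    print(round(median_shift(batch1, batch2), 2))  # 2.2
```

A large median shift with non-overlapping IQRs, as in the pilot, is a warning sign; on its own, though, it cannot separate a technical batch effect from a real difference between the sampled populations.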

Ongoing work involves TL estimation in an additional 2,500 samples from the Jackson Heart Study through the TOPMed program, for which Southern blot TL measurements made on blood from the same collection point are available for comparison. These data will help determine the optimal TL estimation method and how to detect and adjust for batch effects, enabling effective analyses of the influence of TL on health outcomes.
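One possible way to use the Southern blot measurements, sketched here under stated assumptions, is a batch-specific linear calibration: within each batch, regress Southern blot TL on the WGS estimate by ordinary least squares, then map every WGS estimate onto the Southern blot scale using that batch's line. This is not an approach stated by the authors; it assumes that every batch contains samples with Southern blot measurements and that the WGS-to-Southern relationship is linear within a batch. The data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def calibrate_by_batch(
    wgs_kb: Sequence[float],
    southern_kb: Sequence[float],
    batch: Sequence[str],
) -> Tuple[List[float], Dict[str, Tuple[float, float]]]:
    """Fit one Southern-vs-WGS line per batch and map each WGS estimate onto
    the Southern blot scale using its own batch's line."""
    members: Dict[str, List[int]] = defaultdict(list)
    for i, b in enumerate(batch):
        members[b].append(i)

    calibrated = [0.0] * len(wgs_kb)
    fits: Dict[str, Tuple[float, float]] = {}
    for b, idx in members.items():
        intercept, slope = fit_line(
            [wgs_kb[i] for i in idx], [southern_kb[i] for i in idx]
        )
        fits[b] = (intercept, slope)
        for i in idx:
            calibrated[i] = intercept + slope * wgs_kb[i]
    return calibrated, fits


if __name__ == "__main__":
    # Hypothetical toy data: same Southern blot values, shifted WGS estimates.
    wgs = [4.0, 5.0, 6.0, 2.0, 3.0, 4.0]
    southern = [6.5, 7.0, 7.5, 6.5, 7.0, 7.5]
    batches = ["A", "A", "A", "B", "B", "B"]
    calibrated, fits = calibrate_by_batch(wgs, southern, batches)
    print(fits)        # {'A': (4.5, 0.5), 'B': (5.5, 0.5)}
    print(calibrated)  # [6.5, 7.0, 7.5, 6.5, 7.0, 7.5]
```

Fitting each batch separately corrects differences in both location and scale, but the result depends entirely on the reference samples: a batch with few or narrowly spread Southern blot measurements will give an unstable line. The toy data are perfectly linear, so calibrated values match the reference exactly; real data will not.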