Inferring Latent Tumor Cell Subpopulations with Latent Feature Allocation Models

Tianjian Zhou (November 4, 2019)

A tumor cell population consists of genetically heterogeneous subpopulations (subclones), which are characterized by overlapping sets of single nucleotide variants (SNVs). Bulk sequencing data generated by high-throughput sequencing technology provide short reads mapped to many nucleotide loci, and these reads are a mixture of signals from different subclones. Based on such data, we infer tumor subclones using latent feature allocation models. Specifically, we estimate the number of subclones, their genotypes, their cellular proportions, and the phylogenetic tree spanned by the inferred subclones. Prior probabilities are assigned to these latent quantities, and posterior inference is implemented through Markov chain Monte Carlo simulations. A key innovation in our method, TreeClone, is to model short reads mapped to pairs of proximal SNVs, which we refer to as mutation pairs. The performance of our method is assessed using simulated and real datasets with single and multiple tumor samples.
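To make the phrase "mixture of signals from different subclones" concrete, the following is a minimal illustrative sketch and not the TreeClone model itself. It treats each SNV separately rather than in mutation pairs, and it assumes diploid loci, heterozygous mutations, no copy number changes, and binomial read sampling. The names `Z`, `w`, `expected_vaf`, and `simulate_reads` are hypothetical and introduced only for illustration.

```python
import random

def expected_vaf(Z, w):
    """Expected variant allele fraction at each locus.

    Z: binary matrix (loci x subclones); Z[k][c] = 1 if subclone c carries SNV k.
    w: cellular proportions of the subclones (assumed to sum to 1).
    Assumption: each mutated cell carries the variant on one of two alleles,
    hence the factor 0.5.
    """
    return [0.5 * sum(z * wc for z, wc in zip(row, w)) for row in Z]

def simulate_reads(Z, w, depth, seed=0):
    """Draw variant read counts at each locus as Binomial(depth, expected VAF)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(depth))
        for p in expected_vaf(Z, w)
    ]

# Toy example: subclone 0 is normal (no SNVs), subclones 1 and 2 overlap at locus 0.
Z = [
    [0, 1, 1],  # locus 0: shared by subclones 1 and 2
    [0, 0, 1],  # locus 1: private to subclone 2
    [0, 1, 0],  # locus 2: private to subclone 1
]
w = [0.40, 0.35, 0.25]

vaf = expected_vaf(Z, w)
counts = simulate_reads(Z, w, depth=200)
print(vaf, counts)
```

In this toy setting, inference runs in the opposite direction: given only the read counts, one would place priors on `Z`, `w`, and the number of columns of `Z`, and explore their posterior by MCMC.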