Workshop 2: Models for Oncogenesis, Clonality and Tumor Progression
Telomere crisis occurs during tumorigenesis when depletion of the telomere reserve leads to frequent telomere fusions. The resulting dicentric chromosomes have been proposed to drive genome instability. Here we examine the fate of dicentric human chromosomes during telomere crisis. We observed that dicentric chromosomes invariably persisted through mitosis and developed into 50-200 μm chromatin bridges connecting the daughter cells. Before their resolution at 3-20 h after anaphase, the chromatin bridges induced nuclear envelope rupture in interphase, accumulated the cytoplasmic 3' nuclease TREX1, and developed RPA-coated single-stranded (ss) DNA. CRISPR knockouts showed that TREX1 contributed to the generation of the ssDNA and the resolution of the chromatin bridges. Post-crisis clones showed chromothripsis and kataegis, presumably resulting from DNA repair and APOBEC editing of the fragmented chromatin bridge DNA. We propose that chromothripsis in human cancer may arise through TREX1-mediated fragmentation of dicentric chromosomes formed in telomere crisis.
A role for somatic mutations in carcinogenesis is well accepted, but the degree to which mutation rates influence cancer initiation and development is under continuous debate. Recently accumulated genomic data have revealed that thousands of tumor samples are riddled with hypermutation, strengthening support for the idea that cancers acquire a mutator phenotype. This major expansion of cancer mutation data sets has provided unprecedented statistical power for the analysis of mutation spectra, which has confirmed several classical sources of mutation in cancer, highlighted new prominent mutation sources and empowered the search for cancer drivers. The philosophy and statistical approaches for extracting useful information from catalogues of mutations in cancer genomes are overall analogous to the analysis of mutation spectra obtained in experiments with mutation reporters, the classical approach in molecular genetics. Apparent "irregularities" in the distribution of mutation types and positions, as compared to the null hypothesis of a random mutation spectrum, are matched against mechanistic knowledge about the chemistry of a mutagenic factor and the genetic systems expected to repair the resulting DNA lesions. The confluence of agnostic signature deconvolution and knowledge-based analysis capitalizing on mechanistic insight holds great promise for understanding the basic development of cancer through mutations. I will present an example of such a merger, developed in the course of analysis of APOBEC-signature mutagenesis in cancers, which aided in the identification of mutagenic enzymes and highlighted important correlations with clinical features as well as sample-specific mechanistic details of mutagenesis.
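The "agnostic signature deconvolution" mentioned above is most commonly implemented as non-negative matrix factorization (NMF) of a tumours-by-mutation-types count matrix. The following is a minimal sketch on synthetic data with six mutation classes rather than the usual 96 trinucleotide channels; it is a generic illustration of the technique, not the speaker's pipeline, and all data here are invented.

```python
# Sketch: deconvolving mutation-count catalogues into signatures via NMF.
# Synthetic data only; real analyses use 96 trinucleotide channels per tumour.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Two hypothetical "true" signatures over 6 mutation classes
# (C>A, C>G, C>T, T>A, T>C, T>G); each row sums to 1.
signatures = np.array([
    [0.05, 0.05, 0.60, 0.10, 0.10, 0.10],  # e.g. a C>T-dominated process
    [0.40, 0.30, 0.05, 0.05, 0.10, 0.10],  # e.g. a C>A/C>G-dominated process
])

# Simulate 50 tumours, each a random mixture of the two signatures,
# with Poisson-distributed mutation counts (~1000 mutations per tumour).
exposures = rng.dirichlet([1.0, 1.0], size=50)        # 50 x 2 mixing weights
counts = rng.poisson(1000 * exposures @ signatures)   # 50 x 6 count matrix

# Factorize counts ~= W @ H: W = per-tumour exposures, H = inferred signatures.
model = NMF(n_components=2, init="nndsvda", max_iter=2000, random_state=0)
W = model.fit_transform(counts)
H = model.components_

# Normalize inferred signatures to probability vectors for comparison.
H_norm = H / H.sum(axis=1, keepdims=True)
```

The knowledge-based step described in the abstract then enters when an inferred row of `H_norm` is matched against the chemistry of a known mutagen or repair defect.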
Tumours consist of a heterogeneous mixture of cells, which are subject to selective pressures leading to the evolution of cell populations over time. By applying a variety of methods to multiomic data, we model the effect of tumour evolutionary processes in terms of the multimodal molecular signature that they leave. This signature is composed of a combination of different types of aberration, including copy number, point mutation and methylation changes. Using examples from a range of tumour types, including prostate, oesophageal, breast and haematopoietic cancers, I will describe the findings that may be obtained from two different types of study: deep studies involving multiomic, multiple-sample data from a small number (tens) of tumours, and studies involving relatively shallow analysis of single samples from a large number (thousands) of tumours.
This presentation will discuss efforts to use mathematical modeling to understand the metabolism of tumors. First, I will discuss efforts using computational modeling and metabolomics to understand the structure and function of glucose metabolism in cells, focusing on a phenotype known as the Warburg Effect (WE), characterized by the increased metabolism of glucose to lactate. It is a common feature of cancers and proliferative diseases, but its functions and its differences from normal oxidative metabolism are not completely understood. Next, I will focus on a network known as one-carbon metabolism, which integrates nutritional status from multiple sources, including glucose, to generate multiple biological outputs, including anabolic and redox metabolism. This network provides the substrates for the methyl groups that mediate the epigenetic status of cells. I will provide evidence that variation in the basal activity of one-carbon metabolism is necessary and sufficient to determine the methylation status of key epigenetic marks on histones. This finding provides a link between nutrient status and chromatin biology.
In our quest to cure disease and improve the quality of human lives, we strive to understand what causes biological systems to break and how we can repair them. Large-scale biological networks are essential to this process, as they map functional connections between most genes in the genome and can potentially uncover high-level organizing principles governing cellular functions. To investigate network organization in normal and diseased conditions, I developed a systematic quantitative approach that measures the distribution of functional attributes across a network and maps their relative positioning. Using this approach on the global genetic and protein-protein interaction networks in yeast, I showed that genes acting in the same pathway, complex or biological process form coherent functional clusters, which localize close to one another according to their shared functionality and organize hierarchically. Furthermore, this approach provided a unique perspective on the reproducibility of large-scale chemical genomic screens in yeast and revealed a novel link between vesicle-mediated transport and resistance to the anti-cancer drug bortezomib.
It is now established that complex biological phenotypes are not governed by single genes but instead by networks of interacting genes and gene products. As a consequence, deciphering the structure of the gene regulatory network (GRN) is crucial to further our understanding of fundamental processes in human cells. However, the mapping of molecular interactions in the intracellular realm remains a major bottleneck in the pipeline to produce biological knowledge from high-throughput biological data.
Multiple methods exist to infer undirected large-scale regulatory networks from collections of transcriptomic data. However, very few network inference methods can infer the directionality of predicted gene interactions, despite this being key to better interpreting GRNs. Another challenge when inferring large-scale GRNs lies in quantitatively assessing their validity. Popular, though weak, validation procedures include (i) simulation; (ii) using incomplete 'gold standard' datasets, such as known transcription factors and their targets, which only partially recapitulate the interactions that can be inferred from transcriptomic data; and (iii) using low-throughput laboratory experiments to validate a few predicted interactions, which represent only a very small and potentially biased part of the inferred GRN.
To address these issues, we have developed mRMRe, an ensemble approach for network and causality inference and the integration of prior knowledge. We applied our new method to a large collection of nearly 500,000 shRNA experiments with gene expression profiles of cancer cell lines before and after knockdown of 3,500 genes. This unique dataset allowed us to infer regulatory networks for 978 landmark genes in multiple cell types and to quantitatively assess their quality. Our results suggest that the complexity of the underlying biology and the noise present in the shRNA experiments and gene expression profiling make it very challenging to infer meaningful gene-gene interactions. Our study not only highlights the need to quantitatively assess the predictive value of regulatory networks, but also provides evidence that a very large sample size does not necessarily yield high-quality networks. These results may open new avenues of research for integrative analysis of multiple data types.
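As a rough illustration of the selection criterion behind tools such as mRMRe, the sketch below implements greedy minimum-redundancy / maximum-relevance (mRMR) feature selection on toy data. This is an illustrative re-implementation, not the mRMRe package itself (which uses a different mutual-information estimator and bootstrapped ensembles); all variable names and data here are invented.

```python
# Sketch of the mRMR criterion: greedily pick features that share high
# mutual information (MI) with the target while sharing low MI with
# features already selected. Toy data; not the mRMRe estimator.
import numpy as np

def mutual_info(x, y, bins=8):
    """Plug-in MI estimate from a 2-D histogram of two continuous variables."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: maximize MI(feature, target) - mean MI(feature, selected)."""
    relevance = np.array([mutual_info(X[:, j], y) for j in range(X.shape[1])])
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = []
        for j in remaining:
            redundancy = (np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy "expression" data: feature 0 drives the target, feature 1 is a
# near-copy of feature 0 (redundant), feature 2 is pure noise.
rng = np.random.default_rng(1)
f0 = rng.normal(size=2000)
X = np.column_stack([f0,
                     f0 + 0.05 * rng.normal(size=2000),
                     rng.normal(size=2000)])
y = f0 + 0.1 * rng.normal(size=2000)

picked = mrmr_select(X, y, k=2)
```

The redundancy penalty is what keeps the near-duplicate feature out of the selection, which is the property that makes mRMR-style criteria attractive for dense, highly correlated expression data.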
Cancer is a disease characterized by the accumulation of somatic alterations to the genome that selectively make a cancer cell fitter to survive. Progression models for cancer, i.e., the sequences of mutations that lead to the emergence of the disease, are still poorly understood. The problem of reconstructing such progression models is not new; in fact, several methods to extract progression models from cross-sectional samples have been developed since the late 1990s.
Recently, we have proposed a number of algorithms to reconstruct cancer progression models both from aggregate, population-level data (i.e., collections of patients' data, such as many TCGA datasets) and from individual-level data (i.e., single-tumor or even single-cell data). We perform the reconstruction by exploiting the notion of probabilistic causation in the spirit of Suppes' causality theory. We note that in the context of biological systems and cancer progression, the notion of causality can be interpreted as the "selective advantage" conferred by the occurrence of a mutation.
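In cross-sectional data, Suppes' notion of prima facie causation reduces to two testable conditions on a binary samples-by-mutations matrix: temporal priority (the putative cause is more frequent, P(a) > P(b)) and probability raising (P(b | a) > P(b | not a)). The sketch below checks these two conditions on synthetic data; it is a minimal illustration of the spirit of the approach, while the actual reconstruction algorithms add bootstrap confidence estimation and likelihood-based model selection on top of such tests.

```python
# Sketch: Suppes-style prima facie causation on a binary mutation matrix.
# Synthetic cohort; not the actual TRONCO/PiCnIc implementation.
import numpy as np

def prima_facie(M, a, b):
    """M: samples x events binary matrix. Test Suppes' conditions for a -> b."""
    pa, pb = M[:, a].mean(), M[:, b].mean()
    p_b_given_a = M[M[:, a] == 1, b].mean()
    p_b_given_not_a = M[M[:, a] == 0, b].mean()
    temporal_priority = pa > pb                       # a occurs more often
    probability_raising = p_b_given_a > p_b_given_not_a
    return bool(temporal_priority and probability_raising)

# Synthetic cohort of 500 samples: mutation 0 is an early event;
# mutation 1 tends to occur only in samples already carrying mutation 0.
rng = np.random.default_rng(2)
m0 = rng.random(500) < 0.6
m1 = m0 & (rng.random(500) < 0.5)
M = np.column_stack([m0, m1]).astype(int)
```

Under the "selective advantage" reading above, `prima_facie(M, 0, 1)` holding while `prima_facie(M, 1, 0)` fails is what orients the edge 0 -> 1 in the inferred progression.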
In this setting, we have proven the correctness of our algorithms and characterized their performance. Our algorithms are collected in an R/Bioconductor package, TRONCO ("TRanslational ONCOlogy"), which we have successfully used as part of our "Pipeline for Cancer Inference" (PiCnIc) to analyse Colorectal Cancer (CRC) data from TCGA, highlighting possibly biologically significant patterns in the inferred progressions.
To conclude, we will present some novel applications of our algorithm suite and of our pipeline to other TCGA datasets, and list some new developments for individual-tumour analysis.
The dream of a powerful integrated computational framework, only hinted at in Ibn Sina's Canon, can now be fulfilled at a global scale as a result of many recent advances: foundational advances in statistical inference; hypothesis-driven experiment design and analysis, and the dissemination of peer-reviewed publications among communities of scientists; distributed large-scale databases of scientific and auxiliary experimental data; algorithmic approaches to model building and model checking; machine learning approaches to generate large numbers of hypotheses; and multiple-hypothesis testing to tame computational complexity and false-discovery rates. We will focus on an application centered on cancer, "the emperor of all maladies." The topics this talk will cover include: probabilistic causation; causal analysis of cancer genome data; kernel-based methods for survival analysis; improved single-cell/single-molecule data via SubOptical Mapping; CRISPR assays; CHA and therapy design; immuno-therapy; and liquid biopsies.
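Of the ingredients listed, taming false-discovery rates under multiple-hypothesis testing is most commonly done with the Benjamini-Hochberg step-up procedure; the sketch below is a generic illustration of that standard procedure, not necessarily the specific method used in the talk, and the p-values are invented.

```python
# Sketch: Benjamini-Hochberg (BH) step-up control of the false-discovery
# rate. Reject the k smallest p-values, where k is the largest rank with
# p_(k) <= (k/m) * alpha.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask (in input order) of hypotheses rejected at FDR alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))   # largest rank passing the test
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values, e.g. from per-gene association tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
mask = benjamini_hochberg(pvals, alpha=0.05)
```

With thousands of hypotheses per cancer-genome screen, this kind of FDR control is what keeps machine-generated hypothesis lists statistically interpretable.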
Forward thinking method discussion
All speakers of the day / panel discussion