MBI Videos

CTW: Statistics, Geometry, and Combinatorics on Stratified Spaces arising from Biological Problems

  • Peter Bubenik
    One of the principal uses of topology is to patch together local quantitative data to obtain global qualitative information not readily accessible to other methods. While the early development of topology was largely driven by applications, many later advances were motivated by strictly mathematical concerns. Now the field of applied topology is returning topology to its roots, adapting some of the later advances in topological methods to current questions in applications. I will survey some of the central constructions in topological data analysis, introducing homology and persistent homology.

    There is a clear need to combine these tools with statistical analysis. However, there are difficulties in doing so, as the space of the usual topological descriptor is not a manifold. I define a new topological descriptor, the persistence landscape, whose definition allows for the calculation of means and standard deviations, laws of large numbers, central limit theorems and hypothesis testing.
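
    As a rough illustration of why the persistence landscape admits means, the sketch below evaluates the k-th landscape function as the k-th largest "tent" value over the intervals of a persistence diagram, and averages landscapes pointwise. The function names `persistence_landscape` and `mean_landscape`, and the representation of a diagram as a list of (birth, death) pairs, are assumptions made for this example and are not taken from the talk.

```python
import numpy as np

def persistence_landscape(diagram, k, t):
    """Evaluate the k-th landscape function (k = 1, 2, ...) at the points t.

    diagram: sequence of (birth, death) pairs with birth <= death (assumed format).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    births = np.array([b for b, _ in diagram], dtype=float)[:, None]
    deaths = np.array([d for _, d in diagram], dtype=float)[:, None]
    # Tent function of each interval: rises from birth, falls to death, zero outside.
    tents = np.maximum(0.0, np.minimum(t - births, deaths - t))
    if k > tents.shape[0]:
        # Fewer than k intervals: the k-th landscape is identically zero.
        return np.zeros_like(t)
    # k-th largest tent value at every evaluation point.
    return -np.sort(-tents, axis=0)[k - 1]

def mean_landscape(diagrams, k, t):
    """Pointwise average of the k-th landscapes of several diagrams."""
    return np.mean([persistence_landscape(d, k, t) for d in diagrams], axis=0)
```

    Because each landscape is an ordinary real-valued function, the average is again a function of the same kind, which is what makes pointwise means and standard deviations available.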
  • Megan Owen
    The space of metric phylogenetic trees, as constructed by Billera, Holmes, and Vogtmann, is a polyhedral cone complex. This space is non-positively curved, which ensures there is a unique shortest path (geodesic) between any two trees, and that the mean and variance of a set or distribution of trees are well defined. Furthermore, there is a polynomial time algorithm to compute geodesics, which leads to a practical algorithm for computing mean trees. I will present some applications of this mean and variance to some biological problems, such as constructing species trees from gene trees and understanding the effect of sequence length on tree reconstruction. This is joint work with Ezra Miller and Scott Provan.
  • Robert MacPherson
    Stratified spaces arise in many contexts within mathematics. They are the natural class of topological spaces of "finite complexity". In many cases, they come endowed with canonical probability distributions on them.

    This will be a survey talk with examples, such as configuration spaces of points.
  • John Kent
    Projective geometry underlies the way in which information about a 3d scene can be deduced from (one or more) 2d camera views. A key concept in projective geometry is that of a projective invariant for a configuration of collinear or coplanar points. The collection of information in the projective invariants can be termed the "projective shape" of a configuration. In this talk we use a spherical camera and adapt ideas from the Procrustes approach to similarity shape analysis to give a standardized representation for projective shapes. The resulting geometry facilitates metric comparisons between different projective shapes. The resulting topology leads to a clear understanding of the singularities in projective shape space.

    Finally, the details behind the standardization lead to a distinction between four variants of projective shape space depending on the "type" of camera: oriented vs. non-oriented and directional vs. axial.
  • Satyan Devadoss
    Our story is motivated by the configuration space of particles on spheres. In the 1970s, Grothendieck, Deligne, and Mumford constructed a way to keep track of particle collisions in this space using Geometric Invariant Theory. In the 1990s, Gromov and Witten utilized them as invariants arising from string field theory and quantum cohomology. We consider the real points of these spaces, but now interpret them as spaces of rooted metric labeled trees. They have elegant geometric and combinatorial properties, being compact hyperbolic manifolds with a beautiful tessellation by convex polytopes. In recent years, they have gained importance in their own right, appearing in areas such as representation theory, geometric group theory, tropical geometry, and lately reinterpreted by Levy and Pachter as spaces of phylogenetic networks. In particular, these real moduli spaces resolve the singularities of the spaces of phylogenetic trees studied by Billera, Holmes, and Vogtmann.
  • Stephan Huckemann
    The classical central limit theorem states that suitably translated and root n rescaled independent sample means tend to a multivariate Gaussian. Under certain, still rather restrictive conditions, it has been shown by Bhattacharya and Patrangenaru (2005) that the analog holds true on manifolds. One condition, namely uniqueness, has been pushed to "data contained in a geodesic half ball" by Afsari (2011), which in particular encompasses "omitting a neighborhood of the cut locus" if non-void.

    Determining asymptotics when the cut locus is not omitted proves to be challenging. For circles we present an exhaustive treatment of uniqueness and, in view of asymptotics, of the role of mass around the antipodal point.

    Another issue turning up in shape spaces -- which may be manifolds with singularities -- is whether means omit these singularities and are stably assumed on the manifold part. We show that while intrinsic and Ziezold means are manifold stable, Procrustes means may hit singularities.

    In consequence, e.g. for 3D shape analysis, given uniqueness, discrimination and classification based on the two-sample test is possible for intrinsic or Ziezold means. Procrustes means, however, may disqualify.

    This talk is based on joint work with Thomas Hotz.
  • Armin Schwartzman
    Symmetric positive semi-definite (PSD) matrices appear as data objects in the statistical analysis of Diffusion Tensor Imaging data, where there is interest in making inferences about the eigenvalues and eigenvectors of these objects. In this talk, I present a stratification of the set of symmetric PSD matrices of arbitrary dimension according to their eigenvalues, as well as maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of the mean matrix in a symmetric-matrix Gaussian model. The parameter sets involved are subsets of Euclidean space that are either affine subspaces, polyhedral convex cones, or orthogonally invariant embedded submanifolds. The asymptotic behavior of the MLEs and LLRs depends on the stratum where the true mean matrix lies.
  • Rudolf Beran
    A multivariate k-way layout consists of observations with error on an array of vector-valued means, each of which is an unknown function of k real-valued covariates. Any decomposition of these vector means into a sum of orthogonal projections induces least squares submodel fits that serve as candidate estimators of the mean vectors. MANOVA submodel fits, nested polynomial regression fits, or mixed combinations of both strategies are classical illustrations. This talk describes penalized least squares estimators of the multivariate means in which the penalty terms are weighted through manifold-valued tuning parameters. Data-based selection of the tuning parameters yields estimators that dominate asymptotically those that arise from submodel fitting. In the special case of a complete balanced multivariate k-way layout, the proposed regularized estimators are linked to multiple Efron-Morris affine shrinkage. In unbalanced designs, the regularized estimators define a powerful generalization of affine shrinkage.
  • Aasa Feragen
    Anatomical tree structures, such as airway trees from lungs, blood vessels, or dendrite trees in neurons, carry information about the organ that they are part of. Anatomical trees can be modeled as geometric trees, which are combinatorial trees whose edges are endowed with edge attributes describing their geometry. We consider edge attributes which take continuous scalar or vector values, leading to a continuum of trees rather than a discrete set of trees.

    We shall discuss different ways of building spaces of such geometric trees, all with the goal of obtaining a geodesic space of trees where statistical parameters can be computed with the help of geodesics. For geometric trees of any size, we can define a geodesic space of trees, but geodesic computations are NP-complete and the space has nowhere bounded curvature, which means that many statistical tools are not readily available. By adding restrictions on size, admissible topologies, branch order and/or branch labeling, we can regularize the space in order to obtain spaces which have nicer properties in terms of computational complexity and statistical applications. We shall discuss the positive effect of these assumptions on the solvability of statistical problems along with their negative effect on the ability to model real anatomical trees. Finally, we shall present some recent results from experiments on airway trees from lung CT scans.
  • Harrie Hendriks
    The context will be the estimation of a parameter of a probability distribution, where the parameter lies in a differentiable manifold, more specifically in a submanifold of Euclidean space. The parameter could be a Fréchet mean of a probability distribution on the submanifold itself, namely the Fréchet mean with respect to the Euclidean distance. We will give an account of the two-sample problem.

    This talk is based on joint work with Zinoviy Landsman. Examples from the literature will be indicated. Downs considered the QRS loop in vectorcardiograms, characterized by a pair of orthogonal unit vectors in 3-space. The space of such pairs is the Stiefel manifold V_{3,2}, and can be considered as a submanifold of 6-dimensional Euclidean space. A more involved example, considered by Rivest et al., is the human ankle joint that exhibits two independent rotation axes of the foot. The directions of these axes are of importance.
  • Giseon Heo
    Persistent homology, a recent development in computational topology, has been shown to be useful for analyzing high-dimensional nonlinear data. In this talk, we connect computational topology with the traditional analysis of variance and demonstrate this synergy on a three-dimensional orthodontic landmark data set derived from the maxillary complex. (Joint work with Jennifer Gamble and Peter Kim)
  • Wilfrid Kendall
    The subject of Riemannian barycentres has a strikingly long history, stretching back to work of Fréchet and Cartan. The first part of this talk will be a review of the fundamental ideas and a discussion of the work of various probabilists and statisticians on applications of the concept to probabilistic approaches to harmonic map theory and statistical shape theory. I will then present some recent joint work with Huiling Le concerning central limit theory for empirical barycentres, which to our considerable surprise has led us to a new perspective on the classical Lindeberg-Feller central limit theorem.
  • Peter Kim
    C. difficile associated outbreaks have been reported worldwide, some with increased mortality and morbidity. Symptoms of this infectious disease range from mild diarrhea to severe colitis and even bowel perforation and death. The bacterium C. difficile is found with the normal bacteria comprising the intestinal flora. These can be killed by antibiotics but not the C. difficile spores, which are insensitive to the majority of antibiotics. The diagnosis of C. difficile infection is based on clinical signs and symptoms and a positive laboratory test for toxigenic C. difficile. Of particular concern is the NAP1/BI/027 strain which has affected North American hospitals of which Southern Ontario hospitals have been especially hard hit. In this talk we will discuss some recent experience with C. difficile along with the use of fecal biotherapy as an effective alternative to standard antibiotics. We will also go over some of the metagenomic sequencing results outlining the observed changes in intestinal flora pre and post fecal biotherapy.
  • J. S. Marron
    Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts data analysis, through providing a language for discussion of the many choices needed in many modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
  • Hongtu Zhu
    Not available
  • Ezra Miller
    Applications to areas such as biology, medicine, and image analysis require understanding the asymptotics of distributions on stratified spaces, such as tree spaces. In the surprisingly common circumstance when Fréchet (intrinsic) means of distributions on stratified spaces lie on strata of low dimension, central limit theorems can exhibit non-classical "sticky" behavior: positive mass can be supported on thin subsets of the ambient space. This talk reports on investigations initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on Analysis of Object Data, and continued jointly with Stephan Huckemann, Jonathan Mattingly, and Jim Nolen.
  • David Houle
    The genetics and evolution of biological systems are extremely complex because of the large number of traits and the complex relationships among those traits. We use the form of fruit fly wings as a model to study the variational properties of complex biological structures. Variation is important because it controls evolutionary potential. Questions about evolutionary potential of high-dimensional entities raise a series of difficult mathematical and statistical problems.

    Our data suggest that the dimensionality of the underlying system is very high. Could the data lie on a manifold embedded in the linear space of phenotypes? If so, phenomena that seem complex could have simple explanations. Manifold-finding based on genotypic data has not yet been attempted.
    The pattern of variation in two different populations can be quite different. Can we identify the common phenotypic subspace, and, even more interesting, the subspaces where one has variation, and the other does not? Statistical approaches to those questions are not known (at least in biology).
    How can we understand and predict the appearance of qualitatively novel phenotypes? Qualitative novelty is one of the largest unsolved problems in biology. Is it possible to construct metrics for the "novelty distance" between phenotypes that predict evolution? One possible kind of metric could combine geometry and topology as is done with persistent homology. Biology may offer different metrics based on the effects of mutation or common transitions during development.
    Biologists need the expertise of mathematicians and statisticians to help us answer these important questions.
  • Rabi Bhattacharya
    The general theory of nonparametric statistics on manifolds M presented here is of relatively recent origin. It builds much of its framework on the notion of the Fréchet mean of a probability measure Q, namely, the point on the manifold which minimizes the expected squared distance from a random variable with distribution Q. The nonparametric methods are intrinsic or extrinsic, depending on the distance used on M. The extrinsic distance is the distance induced from a good embedding of M in a Euclidean space, while the intrinsic distance is the geodesic distance on the manifold when endowed with a Riemannian structure. In examples, it is often the case that the nonparametric methods yield sharper inference than their parametric counterparts provide. Although we consider an application to paleomagnetism where M is the sphere S^2, our main emphasis is on landmark-based shape spaces. The latter include (i) spaces of 2D and 3D images invariant under an appropriate group of transformations, which are useful in morphometrics and medical diagnostics, (ii) affine shape spaces invariant under affine transformations, useful in scene recognition based on satellite images, and (iii) projective shape spaces used in machine vision and robotics. We also consider 2D continuous images, and nonparametric estimation of shape densities.

    This talk is based on joint work with Vic Patrangenaru and Abhishek Bhattacharya. It is supported in part by the NSF grant DMS 1107053.
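
    To make the extrinsic/intrinsic distinction in this abstract concrete, here is a small sketch for the unit sphere S^2 under the usual inclusion in 3-dimensional Euclidean space. It assumes that embedding, under which the extrinsic sample mean is the Euclidean average projected back onto the sphere. The function names are hypothetical and chosen only for this example; nothing here is taken from the talk itself.

```python
import numpy as np

def extrinsic_mean_sphere(points):
    """Extrinsic sample mean on the unit sphere (inclusion embedding assumed)."""
    points = np.asarray(points, dtype=float)
    euclidean_mean = points.mean(axis=0)
    norm = np.linalg.norm(euclidean_mean)
    if norm < 1e-12:
        # The projection onto the sphere is not unique at the origin.
        raise ValueError("Euclidean mean is at the origin; extrinsic mean is not unique")
    return euclidean_mean / norm

def chordal_distance(p, q):
    """Extrinsic distance: straight-line distance in the ambient space."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def geodesic_distance(p, q):
    """Intrinsic distance: great-circle arc length between unit vectors."""
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

def frechet_objective(candidate, points, distance):
    """Sample version of the expected squared distance that a Frechet mean minimizes."""
    points = np.asarray(points, dtype=float)
    return float(np.mean([distance(candidate, x) ** 2 for x in points]))
```

    Passing `chordal_distance` or `geodesic_distance` to `frechet_objective` gives the extrinsic or intrinsic criterion, respectively, for the same candidate point.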
  • Marc Arnaudon
    We give detailed results on the existence and uniqueness for medians, means and minimax centers of probability measures on Riemannian manifolds, including the case when the probability measure is supported in a regular geodesic ball and the case of generic data points in a complete manifold. Some properties of Fréchet medians are also given, such as statistical consistency and quantitative explanation of robustness. In order to compute the Riemannian medians and means, we develop deterministic and stochastic gradient descent algorithms. We show the convergence of these algorithms in regular geodesic balls. The rate of convergence and error estimates of these algorithms are also obtained. For probability measures with support in compact manifolds, partial simulated annealing is used to obtain processes which converge to the means. Simulation examples of our algorithms are also shown, in the case of Toeplitz Hermitian positive definite matrices coming from covariance matrices of autoregressive processes. Applications to signal detection are given.
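
    As an illustration of the kind of deterministic gradient descent mentioned above, the sketch below computes a Riemannian mean on the unit sphere, a much simpler manifold than the matrix spaces in the talk. It is not the authors' algorithm: the fixed step size, the stopping rule, the starting point, and all function names are assumptions made for this example, and the data are assumed to lie in a small geodesic ball so that the log map is defined and the mean is unique.

```python
import numpy as np

def sphere_log(p, q):
    """Tangent vector at p pointing toward q, with length equal to the geodesic distance."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(p)
    direction = q - cos_theta * p
    direction_norm = np.linalg.norm(direction)
    if direction_norm < 1e-12:
        raise ValueError("antipodal points: the log map is not uniquely defined")
    return theta * direction / direction_norm

def sphere_exp(p, v):
    """Follow the geodesic from p with initial velocity v for unit time."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm

def riemannian_mean_sphere(points, step=1.0, tol=1e-10, max_iter=1000):
    """Gradient descent on half the mean squared geodesic distance (illustrative defaults)."""
    points = np.asarray(points, dtype=float)
    estimate = points[0]
    for _ in range(max_iter):
        # The negative gradient of the objective is the average of the log maps.
        descent_direction = np.mean([sphere_log(estimate, x) for x in points], axis=0)
        if np.linalg.norm(descent_direction) < tol:
            break
        estimate = sphere_exp(estimate, step * descent_direction)
    return estimate
```

    For two points a quarter circle apart, this iteration moves from the first point to the midpoint of the connecting arc and then stops, since the two log maps at the midpoint cancel.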
  • Susan Holmes
    Many studies are underway to describe the human microbiome. I will describe some of the methods used, which combine phylogenetic trees and abundance data from high-throughput sequencing and new microarray techniques.

    In particular, various "particular metrics" have proven useful in reaching conclusions about explanatory clinical or contingent variables in such studies.

    This talk contains joint work with PJ McMurdie, as well as David Relman's lab and Alfred Sporman at Stanford.
