Workshop 4: Analysis and Visualization of Large Collections of Imaging Data
2D and 3D Imaging of Entire Cells and Tissues at Macromolecular Resolution by Advanced Electron Microscopic Approaches
Manfred Auer
All processes defining and sustaining life take place within cells or between cells, often mediated by multiprotein supramolecular complexes, also known as macromolecular machines. While Structural Genomics and, more recently, cryo-EM have yielded ever-increasing insight into the overall shape of such macromolecular machines and the detailed mechanisms of e.g. catalysis, we lack insight into how these machines are organized and function within cells. We must determine their 3D organization, their subcellular location (e.g. with respect to ultrastructural landmarks), their interactions with other proteins, the cytoskeleton and organelles, and any changes in these characteristics during embryonic development, as part of their physiological function or during pathogenesis.
Using various biological examples, including inner ear hair cells and related tissues central to hearing, mammary gland development and breast cancer, as well as microbial communities, I will illustrate the power of modern 2D and 3D electron microscopy imaging, including wide-field montaging TEM and TEM tomography, as well as Focused Ion Beam Scanning Electron Microscopy (FIB/SEM) and Serial Block Face SEM (SBF/SEM). The latter two techniques can yield macromolecular insight into the 3D organization of entire cells and tissues, and thus have the potential to truly revolutionize cell biology.
However, while it is now possible to obtain 10k × 10k × 10k voxel data sets (soon 32k × 32k × 32k), we do not yet possess the computational capabilities to deal with such terabytes of data, in terms of visualization, feature extraction/segmentation, annotation, and quantitative analysis. I will discuss the challenges these novel imaging approaches pose, describe how we currently deal with such data sets, and discuss some emerging solutions that my lab has helped develop together with computer scientists at LBL and elsewhere.
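The terabyte-scale claim above follows from simple voxel arithmetic. A minimal sketch, assuming 8-bit (1 byte) voxels; 16-bit acquisition would double these figures:

```python
# Back-of-the-envelope raw data sizes for the cubic volumes quoted above.
# Assumes 1 byte per voxel (8-bit); this is an illustrative assumption.

def volume_bytes(side_voxels, bytes_per_voxel=1):
    """Raw size of a cubic volume with the given edge length in voxels."""
    return side_voxels ** 3 * bytes_per_voxel

TB = 1000 ** 4  # decimal terabyte

current = volume_bytes(10_000) / TB   # 10k x 10k x 10k
upcoming = volume_bytes(32_000) / TB  # 32k x 32k x 32k

print(f"10k^3 volume: {current:.1f} TB")   # 1.0 TB
print(f"32k^3 volume: {upcoming:.1f} TB")  # 32.8 TB
```

Even at one byte per voxel, the upcoming 32k-cube volumes exceed 30 TB each, which is what makes visualization and segmentation the bottleneck rather than acquisition.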
We hope that this meeting can lead to further solutions to overcome this enormous bottleneck that is emerging in structural cell biology imaging.
Mapping neural networks in brain, retina and spinal cord requires (1) comprehensive parts lists (vertex types), (2) nanometer-scale connection detection (edge types), and (3) millimeter-scale network tracing. Together this requires high-resolution transmission electron microscope (TEM) imaging on a scale not routinely possible. By combining serial sectioning and TEM hardware control (SerialEM; Mastronarde, 2005, PMID 16182563), it is possible to create automated TEM (ATEM) image sets of mammalian retina composed of ≈0.4–1.4M high-resolution images and assemble them into coherent 3D volumes of 16–21 TB of raw image data (Anderson et al., 2009, PMID 19855814). How should we build even larger connectomes? At present, we estimate that 100–1000 TEM systems are underutilized globally, representing the most cost-effective scale-up path.
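The frame counts and raw-data totals above imply a per-frame size in the tens of megabytes, i.e. large-format TEM cameras. A quick sketch; the pairing of extremes below is illustrative, since the abstract does not state which image count corresponds to which data size:

```python
# Implied per-frame raw size from the figures above: 0.4-1.4M frames
# yielding 16-21 TB of raw data. Pairing the extremes brackets the
# plausible frame size; the true pairing is an assumption here.

TB = 1000 ** 4
MB = 1000 ** 2

smallest = 16 * TB / 1.4e6 / MB  # most frames, least data
largest = 21 * TB / 0.4e6 / MB   # fewest frames, most data

print(f"implied frame size: {smallest:.1f}-{largest:.1f} MB")  # 11.4-52.5 MB
```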
The next task is navigating, exploring, segmenting and annotating the data space to characterize network motifs. The key tool for this is the Viking system developed by Anderson et al. (2010, PMID 21118201; 2011, PMID 21311605). Viking allows web-compliant delivery of imagery and collection of markup. Our experience suggests that complex network analysis (rather than simple shape-building) is more effectively done by teams of trained annotators working under executive analysts than by crowd-sourcing. At present, retinal connectome RC1 contains >825,000 discrete annotations tracking ≈680 neural structures with 8,300 connections, and ≈300 glia. We estimate this is only ≈20% of the expected annotation set. However, efficiency accelerates 2–3 fold as the density of identified process collisions increases. Further, many connection sets are replicates.
Lacking robust tools for automated tracking, the best strategy has been to use multichannel molecular/activity markers and cell classification to segment and prioritize tracking targets, followed by intensive manual annotation (Lauritzen et al., 2012, PMID 23042441). A scientifically credible yet computationally manageable data volume in the vertebrate nervous system is currently in the 10–50 TB range. But a persistent challenge has been the vast scale differences among different neurons in a volume. There are an estimated 60+ classes of cells in the mammalian retina. For example, retinal connectome RC1 is a disk of neural retina 243 µm wide and 30 µm tall. It contains 104 copies of the single class of rod bipolar cells, with axonal fields spanning 15–25 µm; 39 copies of one of its target cells, the AII amacrine cell, whose dendrites span 50–60 µm; and a dozen copies of a larger target, the AI amacrine cell, with dendrites spanning up to 1000 µm. But one of the main targets is the ganglion cell superclass, which can be segmented into 15 classes. RC1 contains a few whole copies of a few of these, and fragments of many more, but a volume containing a complete set would need to be >1 mm in diameter and require over 250 TB. While this is computationally feasible, the acquisition time for one microscope is not practical, nor is routine small-team annotation. Clearly, scaling by (1) platform and (2) automated annotation is critical.
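The >250 TB projection follows from area scaling: at fixed section thickness and resolution, raw data grows with the square of the disk diameter. A sketch, taking the 16 TB lower bound of the raw-data range quoted for this style of volume as the RC1 baseline (an assumption for illustration):

```python
# Scaling argument: data volume grows with the square of the disk
# diameter (thickness and voxel size held fixed). The 16 TB baseline
# for RC1 is an assumed lower bound, not an exact figure.

rc1_diameter_um = 243
rc1_raw_tb = 16
target_diameter_um = 1000

scale = (target_diameter_um / rc1_diameter_um) ** 2  # ~16.9x more area
projected_tb = rc1_raw_tb * scale

print(f"area scale factor: {scale:.1f}x")       # 16.9x
print(f"projected raw data: {projected_tb:.0f} TB")  # 271 TB, i.e. >250 TB
```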
Finally, some key networks in the retina are massive structured hubs. The AII amacrine cell has partnerships with at least 30 cell classes, meaning that all retinal neurons are no more than 2 hops from an AII cell, and most are no more than 2 hops from any other cell. This means that network graphs for even small volumes are extremely dense, and differential visualization is a major goal.
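The 2-hop property of such a hub can be illustrated with a breadth-first search on a toy graph. The network below is invented for illustration only; it is not real RC1 connectivity:

```python
from collections import deque

def hops_from(graph, start):
    """Breadth-first hop distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Toy hub: an "AII" node wired to several partner classes, each of
# which has its own local neighbours. Entirely hypothetical wiring.
graph = {
    "AII":    ["RodBC", "ConeBC", "GC", "AI"],
    "RodBC":  ["AII", "Rod"],
    "ConeBC": ["AII", "Cone"],
    "GC":     ["AII"],
    "AI":     ["AII", "GC"],
    "Rod":    ["RodBC"],
    "Cone":   ["ConeBC"],
}

dist = hops_from(graph, "AII")
print(max(dist.values()))  # 2: every node lies within two hops of the hub
```

Because a single hub adjacent to many classes forces short paths everywhere, the resulting graphs are dense even for tiny volumes, which is what makes differential visualization necessary.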
A new model is developed for the joint analysis of ordered, categorical, real and count data, motivated by brain imaging and human behavior analysis. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text of the questions from the questionnaires, and the real data correspond to fMRI responses for each subject. We also combine the analysis of these data with single-nucleotide polymorphism (SNP) data from each individual. The questionnaires considered here correspond to standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model may take fMRI and SNP data from a subject as inputs and predict (impute) how the individual would answer a psychological questionnaire; going in the other direction, we also use an individual's SNP data and answers from questionnaires to impute unobserved fMRI data. Each of these two imputation settings has practical and theoretical applications for understanding human behavior and mental health, which are discussed.
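The factorization-with-binary-features idea above can be sketched in a few lines. This is a minimal stand-in, not the paper's model: the dimensions are invented, and ordinary least squares substitutes for the full Bayesian inference:

```python
import numpy as np

# Sketch of a latent-binary-feature factorization: each subject carries
# a binary feature vector z, each question a real-valued loading column
# of W, and answers are modeled as z @ W plus noise. All sizes are
# illustrative; least squares stands in for posterior inference.

rng = np.random.default_rng(0)
n_subjects, n_features, n_questions = 50, 5, 20

Z = rng.integers(0, 2, size=(n_subjects, n_features))  # binary features
W = rng.normal(size=(n_features, n_questions))         # question loadings
X = Z @ W + 0.1 * rng.normal(size=(n_subjects, n_questions))

# Imputation direction 1: given a new subject's binary features (in the
# real model, inferred from fMRI and SNP data), predict their answers.
z_new = rng.integers(0, 2, size=(1, n_features))
predicted_answers = z_new @ W

# Imputation direction 2: recover the loadings from observed answers by
# least squares, a crude stand-in for inference over W.
W_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
print(np.abs(W - W_hat).max())  # small: loadings recovered to noise level
```

The same latent binary vector links every modality, which is what lets observations in one modality (SNPs, fMRI) impute the others (questionnaire answers), and vice versa.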