Comparative analysis of a few haplotype assembly algorithms

Shuying Sun (May 5, 2020)

Abstract

Authors:
Shuying Sun, Sherwin Massoudian, and Allison Bertie Johnson

Haplotype information is important for understanding the genetic processes underlying diseases, so obtaining haplotypes is crucial for disease studies. With the development of next-generation sequencing (NGS) technologies, it is now possible to obtain haplotypes from sequencing reads. The process of determining haplotypes from sequencing reads is called haplotype assembly. Haplotype assembly is challenging because NGS datasets are very large and have complex genetic and technological features. Even though a large number of approaches and software packages have been developed, it is unclear how well these programs perform. Most have not been well evaluated, as they may be compared with only a small number (e.g., 1 or 2) of other methods and are validated on different datasets. In this project, we conduct a comprehensive analysis to compare a few currently available haplotype assembly software packages. We will assess them based on their statistical or computational methods, algorithmic components, and evaluation features. We will show our comparison results based on a publicly available dataset. With these results, we will provide users with both a detailed assessment of the performance of current methods and new perspectives on haplotype assembly, which will be helpful for developing more accurate and efficient algorithms.