IARJHSS

2708-6267

10.47310/iarjhss.2024.v05.i02.006

Comparative Analysis of Overlap Graph and Hamiltonian Path Approaches in Genome Assembly

Chayamiti

Tungamirai

Junjuan

School of Science, Zhejiang University of Science and Technology, Hangzhou, China

Genome assembly is a central task in bioinformatics: short DNA sequences (reads) are pieced together to reconstruct an organism's genome. As sequencing technologies generate ever larger volumes of data, the need for effective computational methods to assemble these reads has grown. Two prominent graph-theoretic approaches, the overlap graph and the Hamiltonian path, offer different strategies for this task. This study compares their effectiveness in genome assembly, focusing on how the graphs are constructed and decoded and on how each approach handles repetitive sequences, sequencing errors, and varying read lengths. The construction phase is analysed in terms of computational efficiency, covering the algorithms used to build the graphs and the preprocessing required to manage large datasets. The decoding phase is evaluated by the accuracy of the assembled genome, considering contiguity (N50), error rates, and the ability to resolve complex genomic regions. A key part of the comparison concerns the decoding strategies: for the overlap graph approach, greedy algorithms that iteratively join the reads with the best overlaps; for the Hamiltonian path approach, heuristic and approximation algorithms designed to cope with its inherent computational complexity (finding a Hamiltonian path is NP-hard in general). The analysis is supported by practical experiments on real-world sequencing data, allowing a detailed evaluation of how each method performs under different conditions. The study also considers the scalability of both approaches, particularly in the context of emerging sequencing technologies that produce longer and more accurate reads.
Ultimately, this study aims to clarify the trade-offs involved in constructing and decoding overlap graphs and Hamiltonian paths for genome assembly. By concentrating on the graph-theoretic aspects of these methods, it offers insight into their strengths and limitations and guides the choice of approach for different genomic challenges. The findings contribute to the ongoing development of more efficient and accurate genome assembly techniques, with potential applications across biological and medical research.
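To make the greedy overlap strategy and the N50 contiguity measure mentioned above concrete, the following Python fragment gives a minimal sketch. It is an illustration under stated assumptions, not the implementation evaluated in this study: the function names (overlap, greedy_assemble, n50) and the min_len threshold are hypothetical, reads are assumed to be error-free, on a single strand, and not contained within one another, and the all-pairs comparison is quadratic in the number of reads, so it is suitable only for toy inputs.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Greedy decoding: repeatedly merge the pair with the longest overlap.

    Assumption: exact overlaps only (no sequencing errors, no reverse
    complements). Stops when no remaining pair overlaps by `min_len`.
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(seqs) > 1:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(seqs, 2):  # every ordered pair (edge a -> b)
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no usable overlaps left; return several contigs
        seqs.remove(best_a)
        seqs.remove(best_b)
        seqs.append(best_a + best_b[best_k:])  # merge, keeping the overlap once
    return seqs


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembled bases. Returns 0 for an empty input."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    contigs = greedy_assemble(["ATGGCG", "GGCGTG", "CGTGCA"])
    print(contigs, n50([len(c) for c in contigs]))
```

Because the greedy rule always takes the locally best overlap, it can merge reads incorrectly across repeats; an exact Hamiltonian path search would instead have to consider orderings of all reads, which is why heuristics are used in practice.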