Algorithms and data structures for exploring DNA or RNA collections
Séminaire Données et Aléatoire Théorie & Applications
13/03/2025 - 14:00 - Camille Marchet - Seminar room 2, ground floor
DNA sequences define the rules governing the autonomy of life. DNA acts as a finite-length program that encodes, among other things, the rules for its own interpretation. Studying DNA from a computer science perspective is therefore a natural approach, and computer science has indeed contributed significantly to our understanding of genetic material. By enabling in silico analysis, bioinformatics has provided a perspective on genomes distinct from that of traditional wet-lab experiments. It has introduced essential data structures without which certain genomic problems could not be studied, either because of the sheer scale of the experiments or because the problems are complex enough to require algorithms and computational power. It has also proposed models, such as graphs, that organize and structure information in a way that highlights key properties of genomes. In this presentation, I will discuss the current challenges of managing and exploiting genomic data, now that global data volumes have surpassed several petabytes. I will also explore different approaches that facilitate the analysis of these vast collections of datasets, leveraging text- or graph-based algorithms specifically adapted to genomics.