Email: camille.marchet at univ-lille.fr
CRIStAL, CNRS, Université de Lille
I am a CNRS research associate (chargée de recherche) in the BONSAI team (Lille, France). My work focuses on methods and data structures in sequence bioinformatics, often with applications to RNA.
Interested in bioinformatics training for sequencing data? The CNRS offers a training course in Lille (for labs and companies): link.
After an engineering degree in Bioinformatics from INSA de Lyon and an MSc in Ecology and Evolution from Université Claude Bernard Lyon 1, I worked for two years as an engineer in the ERABLE team (LBBE, Lyon) with Vincent Lacroix. I then obtained PhD funding in the GenScale team (Rennes, France), where I was supervised by Pierre Peterlongo. I defended my PhD in 2018 and then joined BONSAI in the CRIStAL lab as a postdoc. I was later recruited by the CNRS as a researcher in the same lab (CV).
My postdoc was part of the Transipedia ANR project, with Rayan Chikhi and Mikaël Salson. Transipedia aims to be a transcriptome encyclopedia, i.e., to facilitate the indexing, querying, and exploitation of the many publicly available RNA-seq datasets. I mostly work on new data structures to index large collections of NGS datasets. Before and during my PhD, I worked on methods for transcriptomics, in particular for de novo variant discovery and long-read RNA analysis.
Launch of the thematic year “L’ADN dans tous ses états” (2026–2028), dedicated to DNA and computer science
https://ddal.inria.fr/adn_etat/
My research lies at the intersection of algorithms, data structures, and sequence bioinformatics, with a particular focus on large-scale sequencing data and RNA-related applications. It combines methodological developments, software design, and applications to real datasets, often in interdisciplinary and international collaborations.
A central part of my work focuses on the design of compact and scalable data structures for indexing and querying large collections of sequencing datasets, in particular through k-mer representations.
This work is developed within projects such as Find-RNA (ANR JCJC, PI) and Full-RNA, and builds on earlier work in Transipedia. It has led to several tools, including REINDEER, REINDEER2, CBL, and PAC, which enable scalable exploration of thousands of sequencing datasets.
Another major direction of my research concerns graph-based representations of sequences, in particular de Bruijn graphs, which are central to pangenomics and reference-free approaches.
This work has been supported by projects such as ALPACA ITN, and is connected to national and international initiatives structuring the field. In particular, I contribute to MIGGS (Methods for Interfacing with Graphs of Genomic Sequences), a community focused on graph-based genomic data, and to networks such as GET-a-Pan and RECENT, which foster interactions between bioinformatics and theoretical computer science.
It involves collaborations with groups in Cambridge, Helsinki, and Stockholm, and includes algorithmic contributions to dynamic and updatable graphs (e.g. Cdbgtricks) as well as ongoing work on visualization and interaction, including Vizitig:
https://www.biorxiv.org/content/10.1101/2025.04.19.649656v2
My work is strongly motivated by applications to RNA sequencing and transcriptomics, both in methodological developments and in collaboration with domain scientists.
I have worked on problems such as de novo transcript discovery, long-read RNA analysis, and large-scale exploration of RNA-seq datasets, including cancer-related data. These applications are central to projects such as Full-RNA and ESCALATE, and involve close interactions with biological and medical partners.
This work includes the development of tools such as REINDEER and REINDEER2, as well as collaborative studies on large transcriptomic datasets.
Beyond specific data structures, I am interested in the design of efficient algorithms for sequence analysis, including sketching techniques, hashing-based methods, and sampling strategies.
This work connects to projects such as INSSANE (ANR) on RNA structure analysis, and includes contributions on minimizers, sketching, and sequence indexing.
I am a member of the BONSAI team. Here is a list of the people I personally supervise:
My research is developed within a structured network of collaborations at local, national, and international levels, often at the interface between algorithms, bioinformatics, and genomics. These collaborations are largely organized through joint projects, and support both methodological developments and applications.
National collaborations (ANR projects and consortia)
Transipedia (ANR)
A consortium dedicated to indexing and exploring large collections of RNA-seq datasets, in collaboration with teams in Montpellier, Toulouse, and Paris. This project led to several tools and publications, and to long-term collaborations with both algorithmic and biomedical groups.
INSSANE (ANR, site coordinator)
A project on computational methods for RNA structure analysis, involving collaborations with LIX, Université Paris Cité, and other partners. This project connects algorithmic questions with RNA biology and structural analysis.
Full-RNA (ANR)
A project focused on indexing large-scale RNA datasets, involving collaborations with several French bioinformatics groups and supporting developments such as REINDEER2.
Find-RNA (ANR JCJC, PI)
My current project, which includes collaborations with local and national partners, in particular on graph querying and formal methods (e.g. with the D-DAL team) and on applications to transcriptomics.
In addition, I contribute to national initiatives such as:
International collaborations
I collaborate with several groups abroad on topics related to sequence indexing, pangenomics, and RNA analysis: