Camille Marchet \kamij maʁʃɛ\

Email: camille.marchet at univ-lille.fr

CRIStAL, CNRS, Université de Lille

Bio / topics

I am a CNRS researcher (chargée de recherche) in the BONSAI team (Lille, France). My work focuses on methods and data structures for sequence bioinformatics, often with applications to RNA.

Interested in bioinformatics training for sequencing data? The CNRS offers a training course in Lille (for labs and companies): link.

After an engineering degree in Bioinformatics from INSA de Lyon and an MSc in Ecology and Evolution from Université Claude Bernard Lyon 1, I worked for two years as an engineer in the ERABLE team (LBBE, Lyon) with Vincent Lacroix. I then obtained PhD funding in the GenScale team (Rennes, France), where I was supervised by Pierre Peterlongo. I defended my PhD in 2018 and then joined BONSAI in the CRIStAL lab as a postdoc. I was later recruited by the CNRS as a researcher in the same lab (CV).

My postdoc was part of the Transipedia ANR project, with Rayan Chikhi and Mikaël Salson. Transipedia aims to be a transcriptome encyclopedia, i.e., to facilitate the indexing, querying, and exploitation of the many publicly available RNA-seq datasets. I mostly work on new data structures to index large collections of NGS datasets. Before and during my PhD I worked on methods for transcriptomics, in particular for de novo variant discovery and RNA long-read analysis.

Job offers

Contents

News


Career


Research themes

My research lies at the intersection of algorithms, data structures, and sequence bioinformatics, with a particular focus on large-scale sequencing data and RNA-related applications. It combines methodological developments, software design, and applications to real datasets, often in interdisciplinary and international collaborations.

Data structures for large-scale sequencing data

A central part of my work focuses on the design of compact and scalable data structures for indexing and querying large collections of sequencing datasets, in particular through k-mer representations.

This work is developed within projects such as Find-RNA (ANR JCJC, PI) and Full-RNA, and builds on earlier work in Transipedia. It has led to several tools, including REINDEER, REINDEER2, CBL, and PAC, which enable scalable exploration of thousands of sequencing datasets.

Graph-based models, pan-transcriptomics and pangenomics

Another major direction of my research concerns graph-based representations of sequences, in particular de Bruijn graphs, which are central to pangenomics and reference-free approaches.

This work has been supported by projects such as ALPACA ITN, and is connected to national and international initiatives structuring the field. In particular, I contribute to MIGGS (Methods for Interfacing with Graphs of Genomic Sequences), a community focused on graph-based genomic data, and to networks such as GET-a-Pan and RECENT, which foster interactions between bioinformatics and theoretical computer science.

It involves collaborations with groups in Cambridge, Helsinki, and Stockholm, and includes algorithmic contributions to dynamic and updatable graphs (e.g. Cdbgtricks) as well as ongoing work on visualization and interaction, including Vizitig:
https://www.biorxiv.org/content/10.1101/2025.04.19.649656v2

RNA and transcriptomics applications

My work is strongly motivated by applications to RNA sequencing and transcriptomics, both in methodological developments and in collaboration with domain scientists.

I have worked on problems such as de novo transcript discovery, long-read RNA analysis, and large-scale exploration of RNA-seq datasets, including cancer-related data. These applications are central to projects such as Full-RNA and ESCALATE, and involve close interactions with biological and medical partners.

This work includes the development of tools such as REINDEER and REINDEER2, as well as collaborative studies on large transcriptomic datasets.

Algorithms for sequence analysis

Beyond specific data structures, I am interested in the design of efficient algorithms for sequence analysis, including sketching techniques, hashing-based methods, and sampling strategies.

This work connects to projects such as INSSANE (ANR) on RNA structure analysis, and includes contributions on minimizers, sketching, and sequence indexing.


Team and supervision

I am a member of the BONSAI team. Here is a list of the people I personally supervise:

Collaborations and networks

My research is developed within a structured network of collaborations at local, national, and international levels, often at the interface between algorithms, bioinformatics, and genomics. These collaborations are largely organized through joint projects, and support both methodological developments and applications.


Teaching and training


Outreach and visibility