👤 Profile
Second-year PhD student in Interpretability
for Natural Language Processing models.
IRT Saint Exupéry & IRIT, Toulouse, France.
Supervised by Professors Nicholas Asher and Philippe Muller, and Doctor Fanny Jourdan.
Core maintainer of the Interpreto (NLP) and Xplique (Vision) explainability open-source libraries.
My goal is to provide easy access to useful explanations.
Concept-based Explanations for Language Models
If these subjects are of interest to you, feel free to contact me, I would be happy to collaborate.
Interpreto is a Python library for post-hoc explainability of HuggingFace text models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aiming to make explanations accessible to end users, and ships with documentation, examples, and tutorials. Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries.
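To illustrate what a concept-based explanation computes (this is a generic, framework-free sketch, not Interpreto's actual API — the activations, the difference-of-means estimator, and `concept_score` are all hypothetical stand-ins for a trained linear concept classifier, as in CAV-style methods):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations: 100 examples in a 16-d hidden space.
# Examples with the concept are shifted along a hidden direction.
true_direction = rng.normal(size=16)
true_direction /= np.linalg.norm(true_direction)
acts_with = rng.normal(size=(50, 16)) + 2.0 * true_direction
acts_without = rng.normal(size=(50, 16))

# Estimate the concept direction as the difference of class means
# (a simplified stand-in for fitting a linear concept classifier).
cav = acts_with.mean(axis=0) - acts_without.mean(axis=0)
cav /= np.linalg.norm(cav)

def concept_score(activation):
    # Presence of the concept = projection onto the concept direction.
    return float(activation @ cav)

print(concept_score(acts_with.mean(axis=0)))
print(concept_score(acts_without.mean(axis=0)))
```

On this toy data the recovered direction closely matches the planted one, and examples containing the concept score higher on average — the basic mechanism behind testing a model for human-interpretable concepts.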
Xplique (pronounced \ɛks.plik\) is a Python toolkit dedicated to explainability. Its goal is to gather the state of the art in Explainable AI to help you understand your complex neural network models. Originally built for TensorFlow models, it also offers partial support for PyTorch models. The library is composed of several modules. The Attributions Methods module implements various methods (e.g. Saliency, Grad-CAM, Integrated Gradients), with explanations, examples, and links to the official papers. The Feature Visualization module lets you see how neural networks build their understanding of images by finding inputs that maximize neurons, channels, layers, or compositions of these elements. The Concepts module allows you to extract human concepts from a model and to test their usefulness with respect to a class. Finally, the Metrics module covers the current metrics used in explainability; used in conjunction with the Attributions Methods module, it allows you to compare methods or evaluate a model's explanations.
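As an example of what an attribution method computes, here is a minimal, framework-free sketch of Integrated Gradients (one of the methods listed above) on a toy linear model — the model `f`, its finite-difference gradient, and the baseline are all illustrative assumptions, not Xplique code:

```python
import numpy as np

# Toy differentiable "model": a fixed linear scorer f(x) = w @ x.
w = np.array([0.5, -1.0, 2.0])
def f(x):
    return float(w @ x)

def grad_f(x, eps=1e-5):
    # Finite-difference gradient, so the sketch stays framework-free.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=50):
    # Average the gradient along the straight path from baseline to x,
    # then scale by the input difference (Sundararajan et al., 2017).
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0, 3.0])
attributions = integrated_gradients(x, baseline=np.zeros(3))
print(attributions)  # for a linear model, IG reduces to w * x
```

The attributions satisfy the completeness property: they sum to f(x) − f(baseline), which is one reason Integrated Gradients is a popular reference method.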
The aviation industry operates within a highly critical and regulated context, where safety and reliability are paramount. As Natural Language Processing (NLP) systems become increasingly integrated into such domains, ensuring their trustworthiness and transparency is essential. This paper addresses the importance of explainability (XAI) in critical sectors like aviation by studying NOTAMs (Notices to Airmen), a core component of aviation communication. We provide a comprehensive overview of XAI methods applied to NLP classification tasks, proposing a categorization framework tailored to practical needs in critical applications. We also propose a new method to create aggregated explanations from local attributions. Using real-world examples, we demonstrate how XAI can uncover biases in models and datasets, leading to actionable insights for improving both. This work highlights the role of XAI in building safer and more robust NLP systems for critical sectors, and also shows that academic efforts must be pursued to achieve trust in models and in XAI itself.
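To give an intuition for the idea of aggregating local attributions into a global view (this sketch is a generic illustration, not the method proposed in the paper — the example NOTAM tokens and scores are made up, and the mean-absolute-attribution aggregation is one simple choice among many):

```python
from collections import defaultdict

# Hypothetical local explanations: per-example (token, attribution) pairs,
# as produced by any token-level attribution method.
local_explanations = [
    [("runway", 0.8), ("closed", 0.6), ("until", 0.1)],
    [("runway", 0.7), ("lights", 0.4), ("unserviceable", 0.9)],
    [("taxiway", 0.3), ("closed", 0.5)],
]

# Average the absolute attribution of each token
# over every example in which it appears.
totals, counts = defaultdict(float), defaultdict(int)
for explanation in local_explanations:
    for token, score in explanation:
        totals[token] += abs(score)
        counts[token] += 1

global_importance = {t: totals[t] / counts[t] for t in totals}
ranking = sorted(global_importance, key=global_importance.get, reverse=True)
print(ranking)
```

A dataset-level ranking like this is what lets XAI surface systematic patterns — and potential biases — that no single local explanation reveals.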
When automating plan generation for a real-world sequential decision problem, the goal is often not to replace the human planner, but to facilitate an iterative reasoning and elicitation process, where the human's role is to guide the AI planner according to their preferences and expertise. In this context, explanations that respond to users' questions are crucial to improve their understanding of potential solutions and increase their trust in the system. To enable natural interaction with such a system, we present a multi-agent Large Language Model (LLM) architecture that is agnostic to the explanation framework and enables user- and context-dependent interactive explanations. We also describe an instantiation of this framework for goal-conflict explanations, which we use to conduct a user study comparing the LLM-powered interaction with a baseline template-based explanation interface.
ConSim is a metric for concept-based explanations based on simulatability with user LLMs. It yields consistent method rankings across datasets, models, and user LLMs, and it correlates with faithfulness and complexity.
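The simulatability idea behind the metric can be sketched in a few lines (this is an illustrative toy, not ConSim's actual protocol — the labels and simulator predictions are invented, and in ConSim the simulator is a user LLM rather than hard-coded lists):

```python
# A "simulator" tries to predict the explained model's outputs,
# once without explanations and once with them; the metric is the
# accuracy gain that the explanations provide.
model_outputs    = ["pos", "neg", "pos", "pos", "neg", "neg"]
sim_without_expl = ["pos", "pos", "neg", "pos", "neg", "pos"]
sim_with_expl    = ["pos", "neg", "pos", "pos", "pos", "neg"]

def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

gain = accuracy(sim_with_expl, model_outputs) - accuracy(sim_without_expl, model_outputs)
print(round(gain, 3))  # higher gain = more useful explanations
```

Here the simulator goes from 3/6 to 5/6 correct, so the explanations yield a gain of 1/3.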
My teaching contributions focus on explainability, with hands-on sessions on machine learning and deep learning.