Manuel Faysse (@ManuelFaysse) / X

850 posts

NLP Research, PhD candidate @CentraleSupelec. Prev: @AIatMeta, @imperialcollege, @epfl

Paris, France

Joined November 2013

Pinned
Manuel Faysse
@ManuelFaysse
Jul 2, 2024
🚨 Introducing "ColPali: Efficient Document Retrieval with Vision Language Models" ! We use Vision LLMs + late interaction to improve document retrieval (RAG, search engines, etc.), solely using the image representation of document pages ! arxiv.org/abs/2407.01449 🧵(1/N)
76K
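For readers new to the term, "late interaction" scoring can be sketched roughly as below. This is a toy illustration of the ColBERT-style MaxSim operator as it is commonly described, not the colpali_engine API; the function name `maxsim_score` and the 2-dimensional vectors are made up for the example.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction (MaxSim) score between one query and one document page.

    Each query token vector is matched to its best-scoring document vector
    (by dot product), and the per-token maxima are summed.
    """
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


# Hypothetical toy embeddings: 2 query tokens, and two pages with a few patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
best_page = "page_a" if score_a > score_b else "page_b"
print(best_page, score_a, score_b)
```

In a visual retriever of this kind the document vectors would presumably come from image patches of a page rather than text tokens, but the scoring step keeps the same shape.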
Manuel Faysse
@ManuelFaysse
Sep 27, 2024
🚨 New model alert: ColQwen2 ! It's ColPali, but with a Qwen2-VL backbone, making it the best visual retriever to date, topping the Vidore Leaderboard with a significant +5.1 nDCG@5 w.r.t. colpali-v1.1 trained on the same data ! 🚀 (1/N)
vidore/colqwen2-v0.1 · Hugging Face
144K
Manuel Faysse
@ManuelFaysse
Jul 17, 2025
Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late interaction to audio chunks and short videos, with no performance degradation on visual document retrieval wrt our best models! (1/N)
57K
Manuel Faysse
@ManuelFaysse
Sep 4, 2024
Enough people asked - we obliged - here's the entire ColPali training set: huggingface.co/datasets/vidor… ! We hope this can help bootstrap some ColPali finetuning efforts and we're eager to see cool work from the community !
vidore/colpali_train_set · Datasets at Hugging Face
41K
Manuel Faysse
@ManuelFaysse
Jul 2, 2025
🚨 Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders take the lead in embedding benchmarks, surpassing encoders w/ bidirectional attention. We revisit whether BERT-style encoders are a thing of the past. (1/N)
37K
Manuel Faysse
@ManuelFaysse
Sep 4, 2024
Super happy that people agree with the main takeaway from our ColPali paper: the future of Document AI is doing everything in vision space - not over-engineering brittle text extraction pipelines !
merve
@mervenoyann
Sep 4, 2024
I was giving consultancy about multimodal RAG to some external people, wanted to post it here: stop using OCR + LLMs. If you want to retrieve, use ColPali; if you want RAG from docs, use vision language models.
32K
Manuel Faysse
@ManuelFaysse
Oct 21, 2024
I prepared a little notebook for a class on Vision RAG: ColPali + GPT4o/Qwen2VL to generate cross-lingual answers + some interpretability maps. Very simple but a good quickstart with lots of cool concepts ! github.com/ManuelFay/Tuto…
12K
Manuel Faysse
@ManuelFaysse
Mar 29, 2025
> be me
> Submit a paper about document retrieval
> Tag the paper as Information Retrieval
> Learn through reviewer 2 that nobody cares about retrieval and that the whole field has been doing it wrong for years
15K
Manuel Faysse
@ManuelFaysse
Jun 2, 2025
🚨 Context matters for effective retrieval—but most embedding models cannot leverage crucial information outside of the passage they embed. Our new paper "Context Is Gold to Find the Gold Passage" explores how context-aware embeddings can be trained to boost performance! 🧵(1/N)
21K
Manuel Faysse
@ManuelFaysse
Nov 7, 2024
In the past month, tons of new activity in the Visual Retrieval space ! SOTA ColQwen2 checkpoints, but also smaller & quicker models, efficient multilingual models, and even image embedding endpoints from big tech ! Here's a little recap thread🧵(1/N)
15K
Manuel Faysse
@ManuelFaysse
Apr 17, 2025
ElasticSearch has started supporting late interaction models (ColBERT, ColPali) in their most recent releases ! Amazing news, as many companies build with Elastic and changing DBs was just too much friction - our technology is easy as ever to use in production now !
9.6K
Manuel Faysse
@ManuelFaysse
Nov 1, 2024
Having the best VLM makes this very easy - treating PDFs as a series of images enables this sort of thing out of the box ! I am guessing they either use a multi-page sliding window with a memory buffer, or a vision-space retriever like ColPali to do search on large documents !
Anthropic
@AnthropicAI
Nov 1, 2024
Claude can now view images within a PDF, in addition to text. This helps Claude 3.5 Sonnet more accurately understand complex documents, such as those laden with charts or graphics. Enable the feature preview: claude.ai/new?fp=1.
7.9K
Manuel Faysse
@ManuelFaysse
Sep 27, 2023
Replying to @MistralAI
An inspection of the strings within tokenizer.model reveals a path: @/mnt/test/datasets/tokenizer_training/8T_train_data/shuffled.txttok_v0 A 7B model trained on 8T tokens is a 1142 token/param ratio, about 57 times the Chinchilla optimal ratio ! Great model in perspective !
24K
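The arithmetic in the post above can be checked with a few lines. This is only a sketch: it takes the post's reading of the path at face value (8T tokens of training data, a 7B-parameter model) and assumes the commonly quoted Chinchilla rule of thumb of about 20 tokens per parameter, which is not stated in the post itself.

```python
# Figures taken from the post: "8T_train_data" read as 8 trillion tokens, 7B parameters.
train_tokens = 8e12
model_params = 7e9

# Assumption (not from the post): Chinchilla-optimal is roughly 20 tokens per parameter.
chinchilla_tokens_per_param = 20

tokens_per_param = train_tokens / model_params
multiple_of_chinchilla = tokens_per_param / chinchilla_tokens_per_param

print(f"{tokens_per_param:.0f} tokens/param, "
      f"about {multiple_of_chinchilla:.0f}x the assumed Chinchilla ratio")
```

Under these assumptions the ratio comes out just under 1143 tokens per parameter, which the post truncates to 1142, and about 57 times the assumed Chinchilla ratio.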
Manuel Faysse
@ManuelFaysse
Nov 27, 2024
🚨 Yesterday, @huggingface dropped SmolVLM, today, we're happy to report Day 0 support in colpali_engine with the new ColSmolVLM ! Performance is on par with our flagship ColQwen2-v1.0 model on English tasks, but logically a bit behind on multilingual tasks ! (1/N)
21K