Axel Darmouni (@ADarmouni) / X

Axel Darmouni

@ADarmouni

AI Engineer @CentraleSupelec P22 | Data Scientist. Full AI content, mostly LLM-related

Paris, France

Joined November 2019

Axel Darmouni
@ADarmouni
Dec 23, 2024
🧵 This weekend, I did a fun little side project, inspired by @GoogleDeepMind’s Gemini 2.0 Flash Thinking release. Basically the idea was: what if we could distill its thinking capacities into smaller models, enhancing their reasoning performance? More info below ⬇️
Axel Darmouni
@ADarmouni
Mar 23, 2025
A very small model for powerful document analysis 📖 Read of the day, season 3, day 26: « SmolDocling: An ultra-compact vision-language model for end-to-end multimodal document conversion » by @AsNassar, @andimarafioti et al. from @IBMResearch and @huggingface. The core idea of
Axel Darmouni
@ADarmouni
Oct 26, 2025
Very cool work by @a1zhang and @lateinteraction. Instead of calling an LM to solve a problem, let it agentically call an LM that works over an environment, storing prompt and context (that evolve over time). The root LM then answers using all info aggregated by the
Axel Darmouni
@ADarmouni
Nov 16, 2025
Impressive that @inria_paris managed to fully pre-train and SFT an LLM at 3 sizes (1.5B, 8B, 24B)! They named it Gaperon, and their report covers the whole pipeline, from data gathering to the training itself. Pre-trained base version matches leaderboard albeit with slight
Axel Darmouni
@ADarmouni
Jan 2, 2025
Creating an LLM as the backbone of a chess strategy 🧵 📖 Read of the day, season 3, day 1: « Mastering Board Games by External and Internal Planning with Large Language Models », by Schultz, Adamek et al. from @GoogleDeepMind. The authors of that paper make 3 contributions: 1-
Axel Darmouni
@ADarmouni
Sep 6, 2025
The UI-TARS-2 report blew away my expectations. Massively thought-out data collection and a pipeline of CT + SFT + RL, an impressive display of setup, multi-turn online RL, concluded by model merging… to beat the OpenAI and Anthropic SoTA on CUA. Is this the ByteDance moment?
Axel Darmouni
@ADarmouni
Jul 23, 2024
I don’t know what’s most amazing about the Llama release:
- 8B is one of the best models in its category
- 70B is gpt-4o-mini level
- 405B is gpt-4o level
- 405B is meant to be available on all cloud providers for use
- The whole multimodality section
Multimodality section is insane
Axel Darmouni
@ADarmouni
Sep 29, 2025
Was looking for something like this. Super cool!
Ivan Velichko
@iximiuz
Sep 22, 2025
LeetCode, but for Linux, Docker, and Kubernetes? 🧐 Check out my collection of carefully crafted practical problems - with automated checks, helpful hints, and step-by-step solutions: labs.iximiuz.com/challenges A hands-on challenge a day keeps skill rot away.
Axel Darmouni
@ADarmouni
Aug 5, 2025
Aaand @huggingface x @openai already have a finetuning guide at the ready to visualise how to finetune it. Can’t love them enough. @QGallouedec you guys are heroes
Axel Darmouni
@ADarmouni
Aug 5, 2025
Those models are hilariously good. Curious about how finetuning would go with the reasoning modes; suppose you’d always pick low?
Axel Darmouni
@ADarmouni
Jul 13, 2024
Repurposing PaliGemma as multimodal multi-vector encoder 🧵📖 Read of the day, day 104: ColPali: Efficient Document Retrieval with Vision Language Models, by @ManuelFaysse, @sibille_hugues, @tonywu_71 et al from Illuin Technology arxiv.org/pdf/2407.01449 The authors of this
Axel Darmouni
@ADarmouni
Aug 7, 2025
The hard/fun part of this competition is that you don’t just work on the train/test samples of the ARC programs. You work on ARC-GEN generated samples as well. While a program may solve the train/test samples, it can fail to capture the ARC task’s complete logic. Which means here you need
François Chollet
@fchollet
Aug 7, 2025
Kaggle just launched the NeurIPS 2025 Code Golf competition -- the goal is for you to write Python solution programs to ARC-AGI-1 tasks, while keeping the programs as small as possible. Are you better at writing code than frontier models? kaggle.com/competitions/g…
Axel Darmouni
@ADarmouni
Jul 13, 2024
Google presents: PaliGemma, a SoTA 3B VLM 🧵📖 Read of the day, day 103: PaliGemma: A versatile 3B VLM for transfer, by @giffmana, @ASusanoPinto, @AndreasPSteiner, @__kolesnikov__, @brainshawn et al from @GoogleDeepmind Zurich arxiv.org/pdf/2407.07726 The authors of this paper
Axel Darmouni
@ADarmouni
Mar 11, 2025
Small weekend project I’ve made: turn any textual dataset into an OCR benchmark. Sample images here are taken from the Gsm8k test set, rest below ⬇️
Axel Darmouni
@ADarmouni
Apr 21, 2024
Fact-checking generated outputs against a corpus of documents is possible at a GPT-4 level with only 770M parameters. 🧵📖 Read of the day, day 31: MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents, by @LiyanTang4 et al. from the University of Texas