🧵 This weekend, I did a fun little side project, inspired by @GoogleDeepMind’s Gemini 2.0 Flash Thinking release.
Basically the idea was: what if we could distill its thinking capabilities into a smaller model, enhancing that model’s reasoning performance?
More info below ⬇️
Axel Darmouni
- A very small model for powerful document analysis 📖 Read of the day, season 3, day 26: « SmolDocling: An ultra-compact vision-language model for end-to-end multimodal document conversion » by @AsNassar, @andimarafioti et al from @IBMResearch and @huggingface The core idea of
- Very cool work by @a1zhang and @lateinteraction. Instead of calling an LM to solve a problem, let it be able to agentically call an LM that works over an environment, storing prompt and context (that evolve over time). The root LM then answers using all info aggregated by the
- Impressive that @inria_paris managed to pre-train and SFT an LLM completely for 3 sizes (1.5, 8, 24B)! Named it Gaperon, and their report covers the whole pipeline from data gathering to the training itself. Pre-trained base version matches leaderboard albeit with slight
- Creating an LLM as the backbone of a chess strategy 🧵 📖 Read of the day, season 3, day 1: « Mastering Board Games by External and Internal Planning with Large Language Models », by Schultz, Adamek et al from @GoogleDeepMind. The authors of that paper make 3 contributions: 1-
- The UI Tars 2 report blew past my expectations. Massively thought-out data collection and a pipeline of CT + SFT + RL, impressive display of setup, multi-turn online RL, concluded by model merging… to beat the SoTA of OpenAI and Anthropic on CUA. Is this the ByteDance moment?
- I don’t know what’s most amazing about the Llama release:
  - 8B is one of the best models in its category
  - 70B is gpt-4o-mini level
  - 405B is gpt-4o level
  - 405B is meant to be available on all cloud providers for use
  - The whole multimodality section

  Multimodality section is insane
- Was looking for something like this. Super cool!
  Quoted post: LeetCode, but for Linux, Docker, and Kubernetes? 🧐 Check out my collection of carefully crafted practical problems - with automated checks, helpful hints, and step-by-step solutions: labs.iximiuz.com/challenges A hands-on challenge a day keeps skill rot away.
- Aaand @huggingface x @openai already has a finetuning guide at the ready to visualise how to finetune it. Can’t love them enough, @QGallouedec you guys are heroes.
  Quoted post: Those models are hilariously good. Curious on how finetuning would go with the reasoning modes, suppose you’d always pick low?
- Repurposing PaliGemma as a multimodal multi-vector encoder 🧵📖 Read of the day, day 104: ColPali: Efficient Document Retrieval with Vision Language Models, by @ManuelFaysse, @sibille_hugues, @tonywu_71 et al from Illuin Technology arxiv.org/pdf/2407.01449 The authors of this
- The hard/fun part of this competition is that you don’t just work on train/test of the ARC programs. You work on ARC-GEN generated samples as well. While a program may solve the train/test samples, it can fail to capture the ARC task’s complete logic. Which means here you need
  Quoted post: Kaggle just launched the NeurIPS 2025 Code Golf competition -- the goal is for you to write Python solution programs to ARC-AGI-1 tasks, while keeping the programs as small as possible. Are you better at writing code than frontier models? kaggle.com/competitions/g…
- Google presents: PaliGemma, a SoTA 3B VLM 🧵📖 Read of the day, day 103: PaliGemma: A versatile 3B VLM for transfer, by @giffmana, @ASusanoPinto, @AndreasPSteiner, @__kolesnikov__, @brainshawn et al from @GoogleDeepmind Zurich arxiv.org/pdf/2407.07726 The authors of this paper
- Small weekend project I’ve made: turn any textual dataset into an OCR benchmark. Sample images here taken from Gsm8k test, rest below ⬇️
- Fact-Checking Generated Outputs using a corpus of documents is possible at a GPT-4 level, for only 770M parameters. 🧵📖 Read of the day, day 31: MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents, by @LiyanTang4 et al from the University of Texas