You can beat OpenAI embeddings with just 200 examples.
We proved this with bge-base-en earlier this year, but bge-M3 takes it even further. It reaches the same peak performance as BGE-base with ~70,000 fewer examples (70% less data) plus better performance out of the box.
With
Developer Experience @GoogleDeepMind prev @manusai. I write at ivanleo.com.
Tweets are my own views!
- Taught gemini how to highlight PDF text in a document :) Dropping in a bit in our docs
- Ever struggled to understand how users use your product? I just built an open-source implementation of Anthropic's internal clustering algorithm - CLIO. With Gemini Flash, you can generate human-readable labels which are clustered and grouped together to spot usage patterns.
- When I started doing LLM evaluations, I kept running into the same question - how do I know that my 2% improvement was meaningful? Anthropic's recent paper is spot on - when working with large language models, we need to factor uncertainty into our benchmarks. This isn't
- I've had some people ask me for advice on ML so I compiled some of my thoughts into 3 tips that I wish I had thought of/done more of when I first started. I'm honestly still a beginner but I figured I'd offer some thoughts as to how someone can juggle this with a full-time job
- We improved our LLM's recall from 0.86 to 1.0 with a single sentence added to the prompt. By looking at our failure cases, we found that our model was being overly specific with the categories we were applying. For instance, if users asked for denim bottoms, it would
- Structured Outputs with audio files has never been easier. Just define a pydantic model, use the Audio object to read it in and you're good to go
- Literally got into machine learning because of @jeremyphoward's FastAI course a few months ago. This is surreal. This just made my entire week.
- Using whisper is so 2023. Just use gemini, pass in the raw audio directly and prompt the model directly with all the questions you have. With instructor, we can get the exact mispronounced word, the timestamp when we did it, and advice on how to do better. Flash truly is the
- For those interested in llama 4 being multimodal, I would like to point out @rasbt's great walkthrough on multimodal LLMs again
- I got @willccbb's script to work on @modal_labs for GRPO! This is a minimal version that will run on a single GPU, for some reason the trainer doesn't seem to work for me nicely when I have multiple GPUs. But a good first step :)
- One of the most interesting discoveries in the past few days has been musicgen-small (large if u have gpu). I've been using it to generate short snippets that are honestly... not too bad?
- Save 50% on your OpenAI bill (even on fine-tuned models) with batch jobs using Instructor. Just use our new BatchJobs object :)
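
On the question above of whether a 2% eval improvement is meaningful: one common way to put an uncertainty range on it is a paired bootstrap over per-example scores. The sketch below is only an illustration under assumptions, not necessarily the method from the paper mentioned in that post. The function name `paired_bootstrap_ci` and the toy score lists are hypothetical, and it assumes both models were scored on the same examples in the same order.

```python
import random


def paired_bootstrap_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean_diff, lower, upper) for the candidate-minus-baseline score.

    `baseline` and `candidate` are per-example scores (e.g. 0/1 correctness)
    for the same examples in the same order. This is a percentile bootstrap.
    """
    if len(baseline) != len(candidate):
        raise ValueError("score lists must be the same length")
    if not baseline:
        raise ValueError("score lists must not be empty")

    rng = random.Random(seed)
    # Work on per-example differences so the pairing is preserved.
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Resample examples with replacement and record the mean difference.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()

    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return observed, lower, upper


if __name__ == "__main__":
    # Hypothetical toy data: 1 = correct, 0 = incorrect.
    baseline_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    candidate_scores = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1]
    diff, lo, hi = paired_bootstrap_ci(baseline_scores, candidate_scores)
    print(f"mean improvement: {diff:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

If the interval comfortably excludes zero, the improvement is less likely to be noise; if it straddles zero, more eval examples are probably needed before drawing a conclusion.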





