Nils Reimers (@Nils_Reimers) / X

2,689 posts

VP AI Search @Cohere | ex-huggingface | Creator of SBERT (sbert.net)

Joined August 2016

Pinned
Nils Reimers
@Nils_Reimers
Apr 24, 2025
𝐕𝐢𝐬𝐢𝐨𝐧 𝐑𝐀𝐆 𝐨𝐧 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐆𝐫𝐚𝐩𝐡𝐢𝐜𝐬🖼️ RAG is mostly text-only, even though we have so much data available as charts/figures. Combine @cohere's latest Embed v4 embedding model with a vision-LLM like @GoogleDeepMind Gemini to get 𝐕𝐢𝐬𝐢𝐨𝐧 𝐑𝐀𝐆
Nils Reimers
@Nils_Reimers
Jan 28, 2022
GPT-3 Embeddings by @OpenAI were announced this week. 📈 I was excited and tested them on 20 datasets 😢 Sadly they are worse than open models that are 1000x smaller 💰 Running @OpenAI models can be 1 million times more expensive tinyurl.com/gpt3-emb
Nils Reimers
@Nils_Reimers
Jul 3, 2024
𝐒𝐞𝐦𝐚𝐧𝐭𝐢𝐜 𝐒𝐞𝐚𝐫𝐜𝐡 𝐨𝐧 𝟏𝟎𝟎𝐌 𝐝𝐨𝐜𝐬 - 𝐖𝐢𝐭𝐡 𝟏𝟎𝟎𝐌𝐁 𝐨𝐟 𝐌𝐞𝐦𝐨𝐫𝐲 GPU-poor and memory-poor, without 500GB of memory to embed & index 100M docs? Still want to participate in TREC-RAG 2024? Introducing 𝐃𝐢𝐬𝐤𝐕𝐞𝐜𝐭𝐨𝐫𝐈𝐧𝐝𝐞𝐱
Nils Reimers
@Nils_Reimers
Mar 13, 2024
🇺🇳𝟐𝟓𝟎𝐌 𝐖𝐢𝐤𝐢𝐩𝐞𝐝𝐢𝐚 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬 𝐢𝐧 𝟑𝟎𝟎+ 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞𝐬 🇺🇳 What could you build if your RAG has access to Wikipedia in all 300+ languages? Available for anyone to use, using our state-of-the-art multilingual embedding model: huggingface.co/datasets/Coher…
Nils Reimers
@Nils_Reimers
Mar 18, 2024
🚀 𝐂𝐨𝐡𝐞𝐫𝐞 𝐄𝐦𝐛𝐞𝐝 𝐕𝟑 - 𝐢𝐧𝐭𝟖 & 𝐛𝐢𝐧𝐚𝐫𝐲 𝐒𝐮𝐩𝐩𝐨𝐫𝐭🚀 I'm excited to launch our native support for int8 & binary embeddings for Cohere Embed V3. They slash your vector DB cost 4x - 32x while keeping 95% - 100% of the search quality. txt.cohere.com/int8-binary-em…
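The 4x and 32x figures follow from storage arithmetic: a float32 dimension takes 4 bytes, an int8 dimension 1 byte, and a binary dimension 1 bit. A minimal sketch of that arithmetic, assuming a hypothetical 1024-dimensional vector with values in [-1, 1], simple linear int8 scaling, and sign-based binarization (the post does not say how Embed V3 produces its quantized embeddings, so these schemes are illustrative only):

```python
import random
import struct

def quantize_int8(vec, lo=-1.0, hi=1.0):
    """Linearly map floats in [lo, hi] to signed 8-bit integers (assumed scheme)."""
    scale = 255.0 / (hi - lo)
    out = []
    for x in vec:
        clipped = min(max(x, lo), hi)
        out.append(int(round((clipped - lo) * scale)) - 128)
    return out

def quantize_binary(vec):
    """Keep only the sign of each dimension and pack 8 dimensions per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)

dim = 1024  # hypothetical dimensionality, chosen for illustration
rng = random.Random(0)
vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

float32_bytes = len(struct.pack(f"{dim}f", *vec))              # 4 bytes per dimension
int8_bytes = len(struct.pack(f"{dim}b", *quantize_int8(vec)))  # 1 byte per dimension
binary_bytes = len(quantize_binary(vec))                       # 1 bit per dimension

print(float32_bytes // int8_bytes)    # 4
print(float32_bytes // binary_bytes)  # 32
```

The sketch covers only the storage side; the 95% - 100% search quality figure is an empirical claim from the post that it does not reproduce.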
Nils Reimers
@Nils_Reimers
May 3, 2021
Happy to announce that today is my first day at @huggingface. Looking forward to meeting the new team. First project will be on better integration of the huggingface hub into SentenceTransformers - Sharing your own SBERT.net models will become super easy!
Nils Reimers
@Nils_Reimers
Sep 8, 2021
🚨Model Alert🚨 🏋️‍♂️ State-of-the-art sentence & paragraph embedding models 🍻State-of-the-art semantic search models 🔢State-of-the-art on MS MARCO for dense retrieval 📂1.2B training pairs corpus 👩‍🎓215M Q&A-training pairs 🌐Everything Available: SBERT.net 🧵
Nils Reimers
@Nils_Reimers
Mar 26, 2024
0⃣ 𝐖𝐨𝐫𝐥𝐝 𝐅𝐢𝐫𝐬𝐭 𝐁𝐢𝐧𝐚𝐫𝐲 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞 1⃣ Happy to announce the world's first 𝐁𝐢𝐧𝐚𝐫𝐲 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞 (for educational purposes). 💰32x less memory 💰 🚀 40x faster search 🚀 Github: github.com/cohere-ai/Bina…
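Search over binary vectors is typically done with Hamming distance, which reduces to an XOR followed by a bit count. A minimal sketch of a brute-force Hamming search over bit-packed vectors, assuming tiny hypothetical 8-bit vectors; the function names `hamming_distance` and `search` are made up for illustration, and this is not the implementation in the linked repository:

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length packed binary vectors."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

def search(query: bytes, index: list[bytes], top_k: int = 3):
    """Brute-force scan: return (doc_id, distance) pairs, closest first."""
    scored = sorted((hamming_distance(query, doc), i) for i, doc in enumerate(index))
    return [(i, d) for d, i in scored[:top_k]]

# Three hypothetical 8-bit document vectors and one query vector.
index = [bytes([0b11110000]), bytes([0b11111111]), bytes([0b00001111])]
query = bytes([0b11110001])

print(search(query, index, top_k=2))  # [(0, 1), (1, 3)]
```

Because each comparison is a handful of integer operations on compact data, a scan like this tends to be much cheaper than float32 dot products, though the 40x speedup quoted in the post is the author's measurement and is not derived here.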
Nils Reimers
@Nils_Reimers
Jul 5, 2024
𝐁𝐌𝟒𝟐 - 𝐓𝐡𝐞 𝐌𝐢𝐬𝐬𝐢𝐧𝐠 𝐁𝐞𝐧𝐜𝐡𝐦𝐚𝐫𝐤 This week Qdrant released an interesting new approach that claims to replace BM25/lexical search. Sadly, they forgot to do proper benchmarking. As it turns out: BM42 is way worse than BM25.
Qdrant
@qdrant_engine
Jul 5, 2024
Hey all! We actually did find a discrepancy with our previous benchmarks of bm42. Please don't trust us and always check performance on your own data. Our best effort to correct it is here: github.com/qdrant/bm42_ev…
Nils Reimers
@Nils_Reimers
Nov 2, 2023
𝗖𝗼𝗵𝗲𝗿𝗲 𝗘𝗺𝗯𝗲𝗱 𝗩𝟯 - 𝗢𝘂𝗿 𝗡𝗲𝘄 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹 Our team has been hard at work to ship the best embedding model for noisy & complex data. After great feedback, it is finally publicly available.
Introducing Embed v3
From cohere.com
Nils Reimers
@Nils_Reimers
Oct 17, 2022
MTEB - Massive Text Embedding Benchmark 🧨 Text embeddings are useful for many applications 💻, but their evaluation is still often done rather poorly on trivial datasets 🙁. MTEB is here to change that. We collected 58 datasets across 8 tasks and evaluated many public models.
Nils Reimers
@Nils_Reimers
Dec 12, 2022
🇺🇳Semantic Search finally works across languages! 🇺🇳 Semantic Search gives great search results, but so far it worked only for English😰 Glad to share our new Cohere multilingual embedding model for 100+ languages. And the results are amazing 📈 Details: txt.cohere.ai/multilingual/
Nils Reimers
@Nils_Reimers
Jun 3, 2021
🚨Sentence-Embeddings Model Alert🚨 Significantly better sentence embedding models are now available in Sentence-Transformers: sbert.net/docs/pretraine… Models have been evaluated on 14 challenging datasets including data from Twitter, Reddit, the biomedical domain, e-mails and more
Nils Reimers
@Nils_Reimers
Jun 22, 2021
📺How to train state-of-the-art sentence embeddings? 📺 Just uploaded my 3-part video series on the theory of how to train state-of-the-art sentence embedding models: