LLM Reference
LLM Reference spotlights the best AI models and providers, letting tech leaders compare and ship with the perfect pick for any project.
About LLM Reference
LLM Reference is a decision-support directory for the fast-moving world of large language models. Built for engineers, technical leaders, and AI builders who need to ship with confidence, it tracks over 1,800 models from more than 140 providers and 247 research labs and is updated daily. The value proposition is simple: stop hunting through scattered sources, blog posts, and social media threads, and start building with the right model and provider. Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference gives you one place to compare models side by side, see which provider offers the cheapest pricing for frontier output, and browse curated editors' picks for coding, agents, writing, research, image generation, and video creation. The site is designed for rapid triage: identify the right model for the job, find the most cost-effective provider, and get back to building. A Pulse feed highlights what changed this week, including new models, price cuts, and benchmark refreshes, so you stay informed without the noise. LLM Reference is built by the Data Advantage project.
Features of LLM Reference
Comprehensive Model Directory with Live Search
The heart of LLM Reference is its searchable directory of over 1,800 language models from more than 140 providers and 247 labs. You can search by task, provider, or model name, and filter results by capabilities such as coding, RAG, agents, long context, vision, classification, and JSON or tool use. This turns a sprawling, fragmented landscape into an organized database you can query in seconds to find the exact model you need.
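The search-plus-filter behavior described above can be pictured with a small sketch. This is only an illustration: the `ModelEntry` record, the `search` function, and the catalog entries are hypothetical, and nothing here reflects how LLM Reference actually stores or queries its data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelEntry:
    # Hypothetical record shape, assumed for illustration only.
    name: str
    provider: str
    capabilities: frozenset = field(default_factory=frozenset)


def search(entries, text="", required=()):
    """Return entries whose name or provider contains `text`
    and that have every capability listed in `required`."""
    needle = text.lower()
    wanted = set(required)
    return [
        e for e in entries
        if (needle in e.name.lower() or needle in e.provider.lower())
        and wanted <= e.capabilities
    ]


# Placeholder catalog; these are not real models or providers.
CATALOG = [
    ModelEntry("example-coder", "Provider A", frozenset({"coding", "tool use"})),
    ModelEntry("example-vision", "Provider B", frozenset({"vision", "long context"})),
    ModelEntry("example-agent", "Provider A", frozenset({"agents", "tool use", "coding"})),
]

coding_tools = search(CATALOG, required=["coding", "tool use"])
print([e.name for e in coding_tools])
```

Combining a free-text match with a required-capability set mirrors the two steps in the description: search first, then narrow by capability.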
Curated Editors' Picks for Task-Specific Excellence
Forget generic leaderboards. LLM Reference features Editors' Picks that are hand-curated for specific, high-stakes tasks. The picks cover six categories: Coding, Agents, Writing, Research, Image, and Video. Each pick comes with a detailed rationale that cites specific benchmark scores and real-world performance data, giving you a trusted starting point for your project without analysis paralysis.
Live Pulse Feed for Market Changes
The Pulse feed is your weekly radar for the LLM market. It tracks three signals: new models, verified price cuts, and benchmark refreshes. In a given week, you might see 177 new models, 53 price cuts, and 368 benchmark refreshes. The feed cuts through the noise and highlights exactly what changed, so you do not miss an update that could save you money or give you a performance edge.
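One way to think about two of the Pulse signals is as a diff between two weekly snapshots. The sketch below assumes each snapshot is a plain mapping from model name to output price; the `pulse` function and the sample data are hypothetical, and benchmark refreshes are left out for brevity.

```python
def pulse(previous, current):
    """Compare two {model_name: output_price} snapshots and
    report newly listed models and models whose price dropped."""
    new_models = sorted(set(current) - set(previous))
    price_cuts = {
        name: (previous[name], current[name])
        for name in current
        if name in previous and current[name] < previous[name]
    }
    return {"new_models": new_models, "price_cuts": price_cuts}


# Invented snapshots; prices are in dollars per million output tokens.
last_week = {"model-a": 2.00, "model-b": 0.50}
this_week = {"model-a": 1.50, "model-b": 0.50, "model-c": 0.30}

changes = pulse(last_week, this_week)
print(changes["new_models"])
print(changes["price_cuts"])
```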
Side-by-Side Model and Provider Comparison
The dedicated Compare tool lets you pit two models or providers against each other in a head-to-head showdown. You can view key metrics like pricing per million tokens, benchmark scores, and context window sizes all on one screen. This feature is designed for fast, decisive triage, allowing you to quickly determine which model is the best and most cost-effective option for your specific use case, turning a complex decision into a simple comparison.
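A head-to-head view like this amounts to lining up the same fields from two records. The following is a minimal sketch under that assumption; the field names and all values are invented placeholders, not data from LLM Reference.

```python
def compare(left, right, fields):
    """Build (field, left_value, right_value) rows for a side-by-side view."""
    return [(name, left.get(name), right.get(name)) for name in fields]


# Placeholder records; every value below is invented for illustration.
model_x = {"name": "model-x", "output_price_per_mtok": 3.00,
           "context_window": 200_000, "benchmark_score": 71.5}
model_y = {"name": "model-y", "output_price_per_mtok": 1.20,
           "context_window": 128_000, "benchmark_score": 68.0}

FIELDS = ["output_price_per_mtok", "context_window", "benchmark_score"]
rows = compare(model_x, model_y, FIELDS)

print(f"{'metric':<24}{model_x['name']:>12}{model_y['name']:>12}")
for name, a, b in rows:
    print(f"{name:<24}{a!s:>12}{b!s:>12}")
```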
Use Cases of LLM Reference
Selecting the Best Coding Assistant Model
A developer building a production coding assistant needs a model that excels at non-trivial engineering tasks. Using LLM Reference, they can navigate to the Coding editors' picks and immediately see that Claude Fable 5 is the top recommendation, with an 80.3% SWE-bench Pro score and 96% on SWE-bench Verified. They can then compare its pricing against other top coding models like Claude Opus 4.8 or GPT-5.5 to find the most cost-effective provider for their API calls.
Optimizing Cost for High-Volume Agentic Workflows
An engineering team deploying an agentic workflow that makes thousands of API calls per day needs to balance performance with cost. They can use the Best section to find the cheapest frontier output, listed at the time of writing as Hunyuan HY3 Preview via Tencent Cloud TI Platform at $0.260 per million output tokens. They can then cross-reference this model against the Agents editors' picks to confirm it holds up on tool-use benchmarks like tau-bench.
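To see what a per-million-token price means at this volume, a rough estimate helps. The sketch below uses the $0.260 output price quoted above; the call volume and the average output length are assumptions chosen for illustration, and input-token charges are ignored.

```python
def daily_output_cost(calls_per_day, output_tokens_per_call, price_per_million):
    """Estimate daily spend on output tokens only, in dollars."""
    total_tokens = calls_per_day * output_tokens_per_call
    return total_tokens * price_per_million / 1_000_000


# Assumed workload: 5,000 calls per day, 800 output tokens per call.
cost = daily_output_cost(5_000, 800, 0.26)
print(f"${cost:.2f} per day")  # prints "$1.04 per day"
```

With these assumed numbers the workflow produces 4 million output tokens per day, so the output cost comes to roughly one dollar. A real estimate would also need input-token pricing and the actual token counts per call.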
Comparing Video Generation Models for a Creative Project
A creative director looking for the best video generation model for a brand campaign can use the Creatives section. They will see that Veo 3.1 is the top pick, offering 30-second clips, native audio, and up to 4K resolution through Vertex AI. They can then compare it directly against other top contenders like Runway Gen-4.5 and Wan 2.7 using the Compare tool, evaluating factors like output quality, pricing, and generation speed.
Researching the Latest Open-Weight Model for Fine-Tuning
A machine learning researcher needs to find the best open-weight model for a fine-tuning project. They can browse the Developers leaderboard and see that DeepSeek V4 Pro is the top pick for open weights. They can then click through to the model's detail page to view its benchmark scores, context window, and provider pricing, ensuring they select a model that is both performant and fits their budget for training and inference.
Frequently Asked Questions
How often is the data on LLM Reference updated?
The directory is updated daily, and the Pulse feed summarizes the most recent week's changes, including new models, verified price cuts, and benchmark refreshes. This keeps the information you are working with current in a fast-moving AI landscape.
What is the difference between Editors' Picks and the general leaderboards?
Editors' Picks are hand-curated, task-specific recommendations from the LLM Reference team, providing a starting point for common use cases like coding, agents, and writing. The general leaderboards, on the other hand, are data-driven rankings based on aggregated benchmark scores and performance metrics, allowing you to sort models by specific criteria like coding ability or cost.
Can I compare models from different providers directly?
Yes, the Compare tool is designed specifically for this purpose. You can select any two models from the directory and view a side-by-side comparison of their key attributes, including pricing per million tokens, benchmark scores, context window size, and provider information. This feature is essential for making informed, cost-effective decisions.
How does LLM Reference verify price changes and new models?
Price changes are verified by the LLM Reference team through direct monitoring of provider pricing pages and official announcements. New models are added as they are released by providers and research labs, with the directory currently tracking over 1,800 models from more than 140 providers and 247 labs. The Pulse feed provides a transparent log of all verified updates.
Similar to LLM Reference
Distro
Distro is an AI Distribution Operator that helps B2B teams publish content, find buyer conversations, engage prospects, and turn social intent into p
SEETO AI
Seeto tracks competitor surfaces — pricing, hiring, docs, integrations, trust pages — and surfaces every change as a discrete alert.
Hintder AI
Screenshot a dating profile, get 5 personalized openers that actually get replies — no generic AI lines.
Easymotion - AI Motion Graphic Generator
AI motion graphics and map animation generator. Create videos, charts, UI explainers, and map animations with AI.
Oravaa
Deploy human-like Voice AI to automate high-volume customer service, instantly qualify web leads, and manage operational appointment bookings.
PrompTessor
Unlock the power of AI with PrompTessor, your all-in-one workspace to create, optimize, and manage high-quality prompts effortlessly.