LLM Reference

LLM Reference helps tech leaders discover and compare the best AI models and providers to ship the right solution for any project.

About LLM Reference

LLM Reference is a decision-support directory for developers, technical architects, and AI leaders who need to navigate the rapidly expanding landscape of large language models. It addresses a common problem: teams lose hours hunting through scattered blog posts, Twitter threads, vendor documentation, and benchmark leaderboards to answer a single question such as "which model is best for my coding agent?" LLM Reference consolidates that information into one continuously updated hub. It currently tracks over 1,800 language models from 140 providers and 247 research labs, with data refreshed weekly to capture new releases, verified price changes, and benchmark updates.

The product is built for fast triage: identify the right model for a specific task, find the most cost-effective provider, and get back to building. Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference lets you compare models side by side in one place, see who offers the cheapest frontier output pricing, and browse curated editors' picks for coding, agents, writing, research, image generation, and video creation. A live Pulse feed highlights what changed this week, including new models, price cuts, and benchmark refreshes. LLM Reference is a Data Advantage project and is updated daily.

Features of LLM Reference

Curated Editors' Picks for Specific Tasks

Instead of forcing you to wade through raw benchmark tables, LLM Reference provides curated, task-specific recommendations from expert editors. These picks cover six categories: Coding, Agents, Writing, Research, Image generation, and Video creation. Each pick includes a detailed rationale, relevant benchmark scores, and a qualitative rating such as "excellent". For example, Claude Fable 5 is the top pick for coding with an 80.3% SWE-bench Pro score, while Veo 3.1 is the top video pick for its 30-second clips, native audio, and output up to 4K. This feature lets you start with a trusted recommendation and then drill deeper into the data.

Comprehensive Model and Provider Directory

The core of LLM Reference is a fully searchable, filterable directory of over 1,800 models from 140 providers and 247 labs. You can search by model name, provider, or task type, and filter by capabilities like coding, RAG, agents, long context, vision, classification, and JSON or tool use. Each model entry includes key performance data, pricing per million tokens, and links to the provider. The directory is designed to be the single source of truth for model discovery, eliminating the need to visit multiple vendor pages.

Live Pulse Feed with Weekly Changes

The Pulse feed is a changelog that highlights what changed in the model market over the past week. It tracks three signals: new models (e.g., 177 added this week), verified price cuts from providers (e.g., 53 this week), and benchmark refreshes (e.g., 368 this week). This helps you stay current without manual research. At a glance you can see, for example, that DiffusionGemma 26B A4B IT was released, that the cheapest frontier output pricing dropped to $0.260 per 1M output tokens with Hunyuan HY3 Preview, and which benchmarks were updated.

Side-by-Side Model Comparison

LLM Reference includes a dedicated comparison tool that lets you place two models directly against each other. This is ideal for making final decisions between two strong candidates, such as Claude Fable 5 versus GPT-5.5 or Claude Opus 4.8 versus Claude Opus 4.7. The comparison surfaces key differences in benchmark scores, pricing, and provider details. This feature transforms complex trade-offs into a clear, data-driven decision.

Interactive Cheat Sheet for Common Questions

For users who want instant answers, the Cheat Sheet section lists the most-asked model comparisons, such as Claude Fable 5 vs. Claude Opus 4.8 or GPT-5.5 vs. Gemini 3.1 Pro Preview. Each comparison is a clickable link that takes you directly to the comparison tool with those two models pre-loaded. This feature reduces decision time for recurring questions and helps new users quickly understand the competitive landscape.

Use Cases of LLM Reference

Selecting the Best Model for a Production Coding Assistant

An engineering team building an AI-powered coding assistant needs a model that excels at code generation, debugging, and tool use. Using LLM Reference, they can navigate to the "Coding" board under Editors' Picks, where Claude Fable 5 is recommended with supporting evidence from SWE-bench Pro and OSWorld-Verified. They can then compare it against alternatives like Claude Opus 4.8 or GPT-5.5 using the side-by-side tool, check the pricing for their expected token volume, and verify the provider's reliability. This process, which previously took days of research, now takes minutes.

Optimizing Cost for a High-Volume Agentic Workflow

A startup building a customer support agent with high request volume needs to minimize cost without sacrificing performance. They can use LLM Reference's frontier pricing tracker to find the cheapest frontier output pricing, currently Hunyuan HY3 Preview at $0.260 per 1M output tokens via Tencent Cloud. They can then check the "Agents" editors' pick to see whether this model is recommended for agentic tasks, or compare it against the top-rated Claude Sonnet 4.6 for agent performance. The tool helps them balance cost and quality for their specific use case.
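To make the cost side of this trade-off concrete, the per-million-token price can be turned into a rough monthly estimate. The sketch below is illustrative only: the function name, the request volume, and the tokens-per-request figure are assumptions made up for the example, and only the $0.260 per 1M output tokens price comes from the listing above. It also ignores input-token charges, which a real estimate would need to include.

```python
def monthly_output_cost(
    requests_per_day: int,
    output_tokens_per_request: int,
    price_per_million_usd: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend on output tokens only (hypothetical helper)."""
    # Total output tokens generated over the billing period.
    total_tokens = requests_per_day * output_tokens_per_request * days
    # Providers quote prices per 1M tokens, so scale accordingly.
    return total_tokens / 1_000_000 * price_per_million_usd


# Assumed workload: 100,000 requests per day, 500 output tokens each.
estimate = monthly_output_cost(
    requests_per_day=100_000,
    output_tokens_per_request=500,
    price_per_million_usd=0.260,  # price quoted in the listing
)
print(f"Estimated monthly output cost: ${estimate:,.2f}")
```

Running the same calculation with a second model's price gives a like-for-like figure to weigh against the benchmark differences shown in the comparison tool.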

Researching the Latest Models for a New Project

A research scientist starting a new project on long-context document understanding needs to know the state of the art. They can browse the Pulse feed to see the 177 new models added this week, filter the model directory by the "Long context" capability, and review the "Research" board, where Claude Fable 5 is the top pick based on GDPval-AA Elo scores. They can also check the "Best of" section to see the overall leader for their task. LLM Reference provides a comprehensive snapshot of the current landscape in one session.

Comparing Video Generation Models for a Creative Campaign

A creative director evaluating video generation models for a brand campaign needs to compare quality, length, and resolution. They can visit the "Video" editors' pick, which recommends Veo 3.1 for its 30-second clips, native audio, and 4K output. They can then compare it against alternatives like Runway Gen-4.5 or Wan 2.7 using the comparison tool, check provider pricing, and review any benchmark data available. This structured comparison replaces hours of watching sample videos and reading scattered reviews.

Frequently Asked Questions

How often is LLM Reference updated?

LLM Reference is updated daily by the Data Advantage project, while the model directory and provider information are refreshed weekly to capture new releases, verified price changes, and benchmark updates. The Pulse feed highlights what changed in the past week, including new models, price cuts, and benchmark refreshes, so the data you base decisions on stays current.

How are the Editors' Picks determined?

Editors' Picks are curated by the LLM Reference team based on a combination of quantitative benchmark scores, real-world performance data, and expert analysis. Each pick includes a detailed rationale and specific benchmark evidence, such as SWE-bench Pro scores for coding or Chatbot Arena Elo for writing. The picks are reviewed and updated as new models and data become available. They are designed as a starting point for decision-making, not a definitive ranking.

Can I compare models from different providers directly?

Yes, LLM Reference includes a dedicated side-by-side comparison tool that allows you to compare any two models from different providers. You can compare performance benchmarks, pricing per million tokens, and provider details all in one view. The tool is designed to help you make trade-offs between cost, quality, and specific capabilities. You can also access common comparisons directly from the Cheat Sheet on the homepage.

Is LLM Reference free to use?

Yes, LLM Reference is a free resource provided by the Data Advantage project. There are no paywalls or subscription fees for accessing the model directory, editors' picks, Pulse feed, comparison tools, or any other feature. The project is maintained as a public service to help engineers and technology leaders navigate the LLM ecosystem. You can browse all 1,843 language models, 140 providers, and 247 labs without any cost.
