LLM Reference

LLM Reference helps tech leaders instantly search, compare, and pick the right AI model and provider for their project.

Category: AI Assistants. Pricing: Free.

Published May 29, 2026

About LLM Reference

LLM Reference is a decision-support directory for engineers and technology leaders who need to select a large language model (LLM) and a provider. It tracks over 1,800 language models from more than 140 providers and 247 research labs. The value proposition is simple: stop hunting through scattered sources and start shipping with confidence.

Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference gives you a single place to compare models side-by-side, see which provider offers the cheapest pricing for frontier output, and browse curated editors' picks for tasks such as coding, agents, writing, research, image generation, and video creation.

The site is designed for fast triage: identify the right model for the job, determine the most cost-effective provider, and get back to building. A Pulse feed highlights what changed this week, including new models, price cuts, and benchmark refreshes.

LLM Reference is built by the Data Advantage project. The site is updated daily, while the tracked data (new releases, verified price changes, and benchmark updates) is refreshed weekly. Alongside the model directory, the platform offers a comparison tool, a cheat sheet for common comparisons, and a changelog that tracks every update.

Features

Model Directory and Search

The model directory lets you search over 1,800 language models from more than 140 providers and 247 labs. You can filter models by task type, such as coding, RAG, agents, long context, vision, classification, and JSON or tool use, to narrow the options to your use case and technical requirements.
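The task filter described above can be pictured as a subset check over tagged records. The following is a minimal sketch under stated assumptions: the `ModelEntry` fields, the `filter_by_tasks` helper, and the placeholder rows are all hypothetical and are not LLM Reference's schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One directory row. Field names are hypothetical, not the site's schema."""
    name: str
    provider: str
    tasks: frozenset


def filter_by_tasks(entries, required_tasks):
    """Return the entries tagged with every task in required_tasks."""
    required = frozenset(required_tasks)
    # An entry matches when the required tags are a subset of its tags.
    return [e for e in entries if required <= e.tasks]


# Placeholder rows for illustration only; these are not real models.
catalog = [
    ModelEntry("model-a", "provider-x", frozenset({"coding", "agents"})),
    ModelEntry("model-b", "provider-y", frozenset({"vision", "rag"})),
    ModelEntry("model-c", "provider-x", frozenset({"coding", "long context"})),
]

matches = filter_by_tasks(catalog, ["coding"])
print([m.name for m in matches])  # ['model-a', 'model-c']
```

Requiring every tag to match (rather than any tag) mirrors narrowing a search as more filters are selected; whether the site combines filters this way is an assumption.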

Editors' Picks and Curated Boards

LLM Reference features curated editors' picks for specific tasks, organized into boards for developers, knowledge workers, and creatives. Each pick includes an excellence rating and a detailed rationale explaining why that model is recommended. Boards cover coding, agents, writing, research, image generation, video, and more, helping you start with proven choices.

Pulse Feed and Weekly Updates

The Pulse feed shows exactly what changed in the model market this week. It tracks new models, verified price cuts from providers, and benchmark refreshes across major evaluation suites. This keeps you informed about the latest developments without having to monitor multiple news sources or social media channels.

Model Comparison Tool

The built-in comparison tool lets you evaluate two models side-by-side. You can compare performance metrics, pricing, and capabilities directly. The cheat sheet section also provides quick answers to the most-asked comparisons, such as Claude Fable 5 versus GPT-5.5, saving you time on research.

Use Cases

Selecting a Coding Assistant Model

When building a coding assistant, you need a model with strong performance on software engineering benchmarks. LLM Reference helps you find models like Claude Fable 5, which achieves 80.3% on SWE-bench Pro and 96% on SWE-bench Verified. You can compare coding-specific models and see which ones are best for non-trivial engineering tasks.

Choosing a Cost-Effective Provider

For production deployments, cost is critical. LLM Reference tracks the cheapest frontier output pricing across all providers. You can see that Hunyuan HY3 Preview via Tencent Cloud TI Platform costs only $0.260 per 1 million output tokens. This feature helps you balance performance with budget constraints.
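To see what a per-million-token price means for a budget, the arithmetic can be sketched as below. The $0.260 figure is the price quoted above; the monthly token volume is an invented workload, and the `output_cost_usd` helper is hypothetical. Input-token charges are ignored here.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generating output_tokens at a price quoted per 1M output tokens."""
    return output_tokens / 1_000_000 * price_per_million_usd


# Price per 1M output tokens quoted in the text above.
HY3_PREVIEW_PRICE_USD = 0.260

# Assumed workload for illustration: 50 million output tokens per month.
monthly_output_tokens = 50_000_000

monthly_cost = output_cost_usd(monthly_output_tokens, HY3_PREVIEW_PRICE_USD)
print(f"${monthly_cost:.2f}")  # $13.00
```

The same function can be applied to any other provider's output price to compare monthly costs for a fixed workload.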

Building an Agentic Workflow

Agent-based applications require models that stay on task across long tool loops and self-correct without prompting. LLM Reference identifies top agent models such as Claude Sonnet 4.6, which achieves 87.5 on tau-bench, so you can quickly find models optimized for tool use and agentic behavior.

Research and Knowledge Work

For research pipelines and knowledge work, you need models that excel at analysis in areas such as finance, trading, and analytics. LLM Reference highlights models like Claude Fable 5, with a GDPval-AA Elo of 1932. The research board provides a clear starting point for teams building document analysis or data extraction tools.

Frequently Asked Questions

How often is the data on LLM Reference updated?

Model data is refreshed weekly to include new model releases, verified price changes, and benchmark updates. The Pulse feed highlights exactly what changed each week, including new models, price cuts, and benchmark refreshes. The site itself is updated daily.

What types of tasks can I find models for on LLM Reference?

You can find models for a wide range of tasks including coding, RAG, agents, long context, vision, classification, JSON or tool use, writing, research, summarization, docs Q&A, translation, data and SQL, image generation, video creation, voice TTS, transcription, music, and image editing. The platform organizes models by audience type: developers, knowledge workers, and creatives.

How are the Editors' Picks determined?

Editors' Picks are curated based on a combination of benchmark performance, real-world testing, and expert analysis. Each pick includes an excellence rating and a detailed rationale explaining why that model is recommended for the specific task. The picks are regularly updated as new models and benchmarks become available.

Can I compare two models directly on the platform?

Yes, LLM Reference includes a dedicated model comparison tool that allows you to compare two models side-by-side. You can evaluate performance metrics, pricing, and capabilities. The cheat sheet section also provides quick answers to the most-asked comparisons, such as Claude Fable 5 versus Claude Opus 4.8 or GPT-5.5 versus Gemini 3.1 Pro Preview.