LLM Reference
LLM Reference helps tech leaders quickly find and compare the best AI models and providers for their specific project needs.
About LLM Reference
LLM Reference is a decision-support directory for engineers and technology leaders who need to choose a large language model (LLM) and a provider. The platform tracks over 1,800 language models from more than 140 providers and 247 research labs. It is built by the Data Advantage project and updated daily, with new releases, verified price changes, and benchmark updates refreshed weekly.
The core value proposition is simple: stop hunting through scattered sources and start shipping with confidence. Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference gives you a single place to compare models side-by-side, see which provider offers the cheapest pricing for frontier output, and browse curated editors' picks for coding, agents, writing, research, image generation, and video creation. The site is designed for fast triage: identify the right model for the job, find the most cost-effective provider, and get back to building.
A Pulse feed highlights weekly changes, including new models, price cuts, and benchmark refreshes, so you can stay informed without the noise. The platform also offers comparison tools, a model directory, provider listings, benchmark boards, and a changelog that tracks historical updates.
Features of LLM Reference
Comprehensive Model Directory
LLM Reference maintains an extensive directory of over 1,800 language models from more than 140 providers and 247 research labs. Users can search the directory by task type, provider, or model name, and filter results based on specific criteria such as coding ability, RAG performance, agent capabilities, long context handling, vision tasks, classification, and JSON or tool use. This feature ensures you can quickly narrow down the field to models that match your exact requirements.
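As a rough illustration of the kind of filtering described above, the following Python sketch narrows a small in-memory catalog by task, provider, and name. It is not LLM Reference's implementation or API; the `ModelEntry` type, the `filter_models` function, and the catalog entries are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    """Hypothetical directory record; field names are assumptions."""
    name: str
    provider: str
    tasks: set = field(default_factory=set)


def filter_models(entries, required_tasks=(), provider=None, name_contains=None):
    """Return entries that support every required task and match the optional filters."""
    required = set(required_tasks)
    results = []
    for entry in entries:
        if not required <= entry.tasks:
            continue  # missing at least one required task
        if provider is not None and entry.provider != provider:
            continue
        if name_contains is not None and name_contains.lower() not in entry.name.lower():
            continue
        results.append(entry)
    return results


# Placeholder data, not real directory entries.
catalog = [
    ModelEntry("example-model-a", "provider-x", {"coding", "tool-use"}),
    ModelEntry("example-model-b", "provider-y", {"rag", "long-context"}),
    ModelEntry("example-model-c", "provider-x", {"coding", "vision"}),
]

matches = filter_models(catalog, required_tasks=["coding"], provider="provider-x")
print([m.name for m in matches])
```

Requiring every listed task (a subset check) rather than any one of them keeps the result set small, which suits the "narrow down the field" goal.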
Curated Editors' Picks
The platform features editors' picks for six key task categories: coding, agents, writing, research, image generation, and video creation. Each pick includes a detailed rationale explaining why a particular model excels, along with benchmark scores and provider information. For example, Claude Fable 5 is recommended for coding with an 80.3% SWE-bench Pro score, while Veo 3.1 is the top video pick for its 30-second clips, native audio, and 4K output. These curated selections are intended to spare teams hours of research.
Live Pricing and Benchmark Data
LLM Reference tracks model pricing and benchmark performance as they change. The platform highlights the cheapest frontier output, listed at $0.260 per 1M tokens for Hunyuan HY3 Preview via Tencent Cloud TI Platform at the time of writing. It also tracks weekly changes, including new models, verified price cuts, and benchmark refreshes. Users can compare models side-by-side on cost and performance metrics to balance quality against budget constraints.
Pulse Feed and Weekly Updates
The Pulse feed delivers a weekly summary of what changed in the model market, including new model releases, price reductions, and benchmark score updates. Each summary reports counts for the week, for example 177 new models, 53 price cuts, and 368 benchmark refreshes. This keeps users informed about the latest developments without requiring them to track multiple news sources or provider announcements manually.
Use Cases of LLM Reference
Selecting the Best Model for a Coding Assistant
An engineering team building an AI-powered coding assistant needs a model that excels at code generation, debugging, and understanding complex programming tasks. Using LLM Reference, they can navigate to the Coding board under Developers, review editors' picks such as Claude Fable 5, which scores 80.3% on SWE-bench Pro, and compare it against alternatives such as Claude Opus 4.8 or GPT-5.5. The platform provides benchmark scores and provider pricing so the team can choose the most capable and cost-effective model for their specific workflow.
Optimizing Costs for High-Volume API Calls
A startup running thousands of API calls per day for a summarization service needs to minimize costs without sacrificing output quality. LLM Reference allows them to check the cheapest frontier output pricing, currently $0.260 per 1M tokens for Hunyuan HY3 Preview. They can also compare pricing across multiple providers for similar models, review the Cheat Sheet for common comparisons like Claude Opus 4.8 vs GPT-5.5, and use the Provider directory to find the most economical option for their volume.
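To make the cost reasoning above concrete, here is a minimal Python sketch that turns a per-1M-token output price into an estimated monthly bill. Only the $0.260 price comes from the listing; the call volume, the tokens per call, the 30-day month, and the `monthly_output_cost` function are illustrative assumptions, and input-token charges are ignored.

```python
from decimal import Decimal


def monthly_output_cost(calls_per_day, output_tokens_per_call, price_per_million, days=30):
    """Estimate monthly spend on output tokens, in USD.

    price_per_million is passed as a string so Decimal keeps it exact.
    """
    total_tokens = calls_per_day * output_tokens_per_call * days
    return Decimal(total_tokens) / Decimal(1_000_000) * Decimal(price_per_million)


# Assumed workload: 5,000 calls per day, 300 output tokens per call.
cost = monthly_output_cost(5_000, 300, "0.260")
print(f"Estimated monthly output cost: ${cost.quantize(Decimal('0.01'))}")
```

Under these assumptions the workload produces 45 million output tokens a month, which works out to about $11.70. Swapping in another provider's price shows how quickly the difference compounds at volume.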
Choosing a Model for Agentic Workflows
A developer building an autonomous agent that performs multi-step tasks needs a model with strong tool use and self-correction abilities. LLM Reference's Agents board shows Claude Sonnet 4.6 as the best generally-available pick with a tau-bench score of 87.5, noted for staying on-task across long tool loops and self-correcting without prompting. The developer can also explore other agent picks like Claude Fable 5 or GLM-5, and compare their benchmark scores and pricing side-by-side.
Staying Updated on Model Releases for Research
A research team tracking the latest advancements in open-weight models needs to know about new releases and benchmark updates weekly. LLM Reference's Pulse feed provides a concise summary of all changes, including new models like DiffusionGemma 26B A4B IT and North Mini Code 1.0, along with 368 benchmark refreshes. The team can also browse the Changelog for historical data and use the Search index to find specific models or providers relevant to their research focus.
Frequently Asked Questions
How often is LLM Reference updated?
The site is updated daily, and its tracked data (new model releases, verified price changes, and benchmark updates) is refreshed weekly. The Pulse feed provides a weekly summary of all changes, including the number of new models, price cuts, and benchmark refreshes. This helps users keep up with current information in the fast-moving AI landscape.
What types of models are tracked on the platform?
The platform tracks over 1,800 language models from more than 140 providers and 247 research labs. This includes both proprietary and open-weight models across categories such as coding, RAG, agents, long context, vision, classification, JSON or tool use, image generation, video creation, voice, transcription, and music. The directory covers frontier models from major labs as well as specialized models from smaller providers.
How are editors' picks determined?
Editors' picks are curated by the LLM Reference team based on a combination of benchmark performance, real-world task suitability, pricing, and availability. Each pick includes a detailed rationale citing specific benchmark scores, such as SWE-bench Pro for coding or Chatbot Arena Elo for writing. The picks are reviewed regularly and updated as new models and benchmark data become available, with a freshness indicator showing when each pick was last researched.
Can I compare two specific models directly?
Yes, LLM Reference includes a dedicated Compare feature that allows users to select any two models and view a side-by-side comparison of their specifications, benchmark scores, pricing, and provider information. This is particularly useful for evaluating trade-offs between models like Claude Fable 5 vs Claude Opus 4.8 or GPT-5.5 vs Gemini 3.1 Pro Preview. The Cheat Sheet section also highlights the most frequently requested comparisons for quick reference.
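A side-by-side view of this kind can be pictured as lining up the same fields from two model records. The Python sketch below is only an illustration, not the site's Compare feature; the `compare_side_by_side` function, the field names, and every value in the two records are hypothetical placeholders.

```python
def compare_side_by_side(model_a, model_b, fields):
    """Build (field, value_a, value_b) rows, using "n/a" where a value is missing."""
    return [(f, model_a.get(f, "n/a"), model_b.get(f, "n/a")) for f in fields]


# Placeholder records; the numbers are invented for the example.
model_a = {"name": "model-a", "swe_bench_pro": 70.0, "output_price_per_1m": 2.00}
model_b = {
    "name": "model-b",
    "swe_bench_pro": 65.0,
    "output_price_per_1m": 0.50,
    "context_window": 128000,
}

rows = compare_side_by_side(
    model_a,
    model_b,
    ["name", "swe_bench_pro", "output_price_per_1m", "context_window"],
)
for field_name, left, right in rows:
    print(f"{field_name:<22}{str(left):<12}{str(right):<12}")
```

Filling gaps with "n/a" rather than dropping the row keeps the two columns aligned, so a missing specification stays visible during the comparison.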
Similar to LLM Reference
Oravaa
Oravaa is an enterprise-grade Voice AI platform that actively handles your most repetitive phone operations. Instead of frustrating callers with rigid
Receptri
Receptri is your AI receptionist that answers calls and chats 24/7, learns your business, and manages bookings effortlessly.
Avatai
Avatai enables the creation of AI avatars that inform, engage, and respond to users, enhancing interactions across various applications.
Personal Agent
Ego is your personal AI agent that learns from you to execute complex tasks across your browser and devices.
Prompt Builder
Prompt Builder simplifies AI prompt creation, allowing you to generate, refine, and manage optimized prompts for any model in seconds.