LLM Reference

LLM Reference helps tech leaders quickly find and compare the best AI models and providers for their specific project needs.

AI Assistants Free

Published May 29, 2026

About LLM Reference

LLM Reference is a decision-support directory for technical professionals and organizational leaders who need to navigate the rapidly expanding landscape of large language models. The platform tracks over 1,800 language models from more than 140 providers and 247 research labs, and refreshes its data weekly to incorporate new releases, verified price changes, and benchmark updates. Its core value proposition is straightforward: replace the work of gathering scattered information from multiple sources with a single place to look, so that teams can ship production systems with confidence.

Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, LLM Reference lets you compare models side-by-side, find the cheapest pricing for frontier output, and browse curated editors' picks for specific tasks including coding, agents, writing, research, image generation, and video creation. The site is designed for rapid triage: identify the appropriate model for the job, determine the most cost-effective provider, and return to building. A Pulse feed highlights each week's changes, including new models, price cuts, and benchmark refreshes. LLM Reference is built by the Data Advantage project; the listing describes the site as updated daily, while the directory data itself follows the weekly refresh cycle described above.

Features

Comprehensive Model Directory

LLM Reference maintains an exhaustive directory of over 1,800 language models from more than 140 providers and 247 research labs. This directory is updated weekly to include new releases, verified price changes, and benchmark updates. Users can search the entire catalog using keywords or filter by task categories such as coding, RAG, agents, long context, vision, classification, and JSON or tool use. The directory provides detailed model specifications, performance metrics, and pricing information, enabling engineers and technology leaders to make informed decisions without visiting multiple websites or tracking down fragmented data sources.

Curated Editors' Picks and Leaderboards

The platform features curated editors' picks organized by audience (developers, knowledge workers, and creatives) and by task category. Each pick is rated on a quality-to-price scale and includes detailed reasoning for the recommendation. For example, the coding category highlights Claude Fable 5 with specific benchmark scores: 80.3 percent on SWE-bench Pro and 96 percent on SWE-bench Verified. LLM Reference also maintains 18 leaderboards, six for each of the three audience segments, covering tasks such as coding, agents, writing, research, image generation, and video creation, which makes it easy to identify the top performer for a specific use case.

Pulse Feed and Weekly Change Tracking

The Pulse feed summarizes what changed in the model market each week: new model releases, verified provider price reductions, and benchmark refreshes. At the time of this listing, Pulse shows 177 new models, 53 price cuts, and 368 benchmark refreshes, and highlights $0.260 per 1 million tokens as the cheapest frontier output price. The feed spares users from manually monitoring dozens of sources and is designed for fast consumption, so they can quickly assess whether any recent change affects their current model choices or provider relationships.

Side-by-Side Model Comparison

LLM Reference includes a dedicated comparison tool that allows users to evaluate two models against each other directly. The platform provides cheat sheet comparisons for the most-asked questions, such as Claude Fable 5 versus Claude Opus 4.8, or GPT-5.5 versus Gemini 3.1 Pro Preview. Users can compare models across multiple dimensions including benchmark performance, pricing, context window size, and supported modalities. This feature is critical for engineers who need to make trade-offs between performance and cost, or between different providers for the same task category. The comparison interface is designed for fast, data-driven decision making.
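As a rough illustration of the kind of trade-off such a comparison surfaces, the sketch below puts two model records side by side and reports which one leads on each dimension. It does not use LLM Reference's interface or data: the record fields, the `compare` helper, and all values are hypothetical placeholders chosen for the example.

```python
from typing import Dict, Union

Number = Union[int, float]

# Hypothetical records; names and values are placeholders, not real data.
model_a: Dict[str, Union[str, Number]] = {
    "name": "model-a",
    "benchmark_pct": 90.0,           # higher is better
    "output_price_per_mtok": 15.0,   # USD per 1M output tokens, lower is better
    "context_window": 200_000,       # tokens, higher is better
}
model_b: Dict[str, Union[str, Number]] = {
    "name": "model-b",
    "benchmark_pct": 82.5,
    "output_price_per_mtok": 4.0,
    "context_window": 200_000,
}

# For each compared dimension, record whether a larger value is preferable.
HIGHER_IS_BETTER = {
    "benchmark_pct": True,
    "output_price_per_mtok": False,
    "context_window": True,
}

def compare(a: dict, b: dict, higher_is_better: Dict[str, bool]) -> Dict[str, str]:
    """Return, per dimension, the name of the leading model or 'tie'."""
    result = {}
    for field, higher in higher_is_better.items():
        if a[field] == b[field]:
            result[field] = "tie"
        else:
            a_leads = (a[field] > b[field]) == higher
            result[field] = a["name"] if a_leads else b["name"]
    return result

summary = compare(model_a, model_b, HIGHER_IS_BETTER)
for dimension, leader in summary.items():
    print(f"{dimension}: {leader}")
```

With these placeholder values the first model leads on the benchmark and the second on price, which is the performance-versus-cost trade-off the comparison tool is meant to expose.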

Use Cases

Selecting the Optimal Model for a Coding Assistant

Engineering teams building coding assistants can use LLM Reference to identify the best-performing model for their requirements. The platform tracks SWE-bench scores across multiple benchmark versions and providers, allowing teams to compare models such as Claude Fable 5, Claude Opus 4.8, and GPT-5.5 on verified benchmarks. The editors' picks section gives detailed reasoning for each recommendation, including specific benchmark percentages and practical considerations. Teams can also filter by pricing to find the most cost-effective option that meets their performance threshold, so that they ship production systems with the right balance of capability and expense.
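The selection logic described here, keeping only models that clear a performance threshold and then picking the cheapest, can be sketched as follows. This is a minimal illustration and not LLM Reference's own filter: the `ModelEntry` type, the `cheapest_meeting_threshold` helper, and every score and price in `catalog` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class ModelEntry:
    name: str
    swe_bench_verified: float      # percent; placeholder values, not real scores
    output_price_per_mtok: float   # USD per 1M output tokens; placeholder values

def cheapest_meeting_threshold(
    entries: Iterable[ModelEntry], min_score: float
) -> Optional[ModelEntry]:
    """Return the lowest-priced entry whose score is at least min_score."""
    eligible = [e for e in entries if e.swe_bench_verified >= min_score]
    # default=None covers the case where no model clears the threshold.
    return min(eligible, key=lambda e: e.output_price_per_mtok, default=None)

# Hypothetical catalog used only to exercise the function.
catalog = [
    ModelEntry("model-a", 90.0, 15.0),
    ModelEntry("model-b", 82.5, 4.0),
    ModelEntry("model-c", 70.0, 0.5),
]

pick = cheapest_meeting_threshold(catalog, min_score=80.0)
print(pick.name if pick else "no model meets the threshold")
```

Filtering before minimizing matters: taking the cheapest model first and checking its score afterwards would reject the whole catalog whenever the cheapest entry falls below the threshold.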

Determining the Cheapest Frontier Provider for Production Deployments

Organizations deploying LLMs at scale need to minimize operational costs without sacrificing output quality. LLM Reference tracks verified price changes across all major providers and highlights the cheapest frontier output pricing, which at the time of this listing is $0.260 per 1 million tokens for Hunyuan HY3 Preview via Tencent Cloud TI Platform. Pricing information is refreshed weekly, so the listed rates stay close to what providers currently charge. Using the directory, procurement teams can compare prices across providers, identify recent price cuts, and negotiate better terms based on market intelligence.
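To put a per-million-token price in budget terms, the arithmetic can be sketched as below. The $0.260 figure comes from the text above; the monthly volume of 500 million output tokens and the `output_cost_usd` helper are assumptions made for the example. The sketch covers output tokens only and ignores input-token pricing, which providers typically bill separately.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

def output_cost_usd(output_tokens: int, price_per_million_usd: Decimal) -> Decimal:
    """Return the USD cost of generating `output_tokens` output tokens."""
    return Decimal(output_tokens) / TOKENS_PER_MILLION * price_per_million_usd

# Price quoted in the text: USD per 1 million output tokens.
FRONTIER_OUTPUT_PRICE = Decimal("0.260")

# Hypothetical workload, chosen only for illustration.
MONTHLY_OUTPUT_TOKENS = 500_000_000

monthly_cost = output_cost_usd(MONTHLY_OUTPUT_TOKENS, FRONTIER_OUTPUT_PRICE)
print(f"Estimated monthly output cost: ${monthly_cost:.2f}")
```

`Decimal` is used instead of floating point so that prices quoted to three decimal places multiply out without rounding artifacts.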

Benchmarking Models for Research and Knowledge Work

Researchers and knowledge workers can use LLM Reference to identify models that excel at tasks such as summarization, document question answering, translation, and data analysis. The platform provides leaderboards for each of these categories, with top picks like Claude Fable 5 for research (GDPval-AA Elo 1932) and Gemini 3 Flash for summarization. Users can drill down into specific benchmark scores and compare how different models perform across multiple evaluation suites. This helps academic researchers, analysts, and content teams select the most capable model for their knowledge work pipeline.

Evaluating Models for Creative Tasks Including Image and Video Generation

Creative professionals can use LLM Reference to find the best models for image generation, video creation, voice synthesis, transcription, music composition, and image editing. The platform highlights top performers in each category, such as FLUX.2 Dev for photorealistic image generation and Veo 3.1 for video quality, with 30-second clips, native audio, and up to 4K resolution through Vertex AI. Each editors' pick includes detailed reasoning about why the model excels, covering specific capabilities like brand consistency, text rendering, and hand generation. This allows creative teams to identify the right tool for a project without testing dozens of models.

Frequently Asked Questions

How often is the model directory updated?

The LLM Reference model directory is refreshed on a weekly cycle to include new model releases, verified price changes, and benchmark updates. The platform is built by the Data Advantage project, and the listing also describes the site as updated daily. The Pulse feed highlights what changed in the current week, including the number of new models added, price cuts verified, and benchmark refreshes completed. This weekly cadence keeps the information current without overwhelming users with constant notifications.

What types of models and providers are tracked?

LLM Reference currently tracks 1,843 language models from 140 providers and 247 research labs. The catalog includes models from major commercial providers like Anthropic, OpenAI, Google, and Tencent, as well as open-weight models from research labs and community contributors. Models are categorized by their primary use cases, including coding, RAG, agents, long context, vision, classification, and JSON or tool use. The platform also covers models for creative tasks like image generation, video creation, voice synthesis, transcription, music composition, and image editing. This comprehensive coverage ensures that users can find information about virtually any significant model in the ecosystem.

How are editors' picks determined?

Editors' picks are curated by the LLM Reference team based on a combination of benchmark performance, practical usability, pricing, and real-world testing. Each pick includes a quality rating and detailed reasoning, such as specific benchmark scores like SWE-bench percentages or Chatbot Arena Elo ratings. The picks are organized by audience segment (developers, knowledge workers, creatives) and task category, with multiple options provided for each use case. The platform also indicates how many total picks and eligible models exist for each category, giving users context about the breadth of options available. Picks are reviewed and updated regularly as new models and benchmarks become available.

Can I compare models side-by-side on the platform?

Yes, LLM Reference includes a dedicated comparison tool that allows users to evaluate two models directly against each other. The platform also provides cheat sheet comparisons for the most frequently asked questions, such as Claude Fable 5 versus Claude Opus 4.8 or GPT-5.5 versus Gemini 3.1 Pro Preview. Users can compare models across multiple dimensions including benchmark performance, pricing per million tokens, context window size, and supported features. The comparison interface is designed for fast, data-driven decision making, enabling engineers and technology leaders to quickly identify the best model for their specific requirements without manually compiling data from multiple sources.