Access Global AI Models - Power Next-Gen Apps

From General to Specialized AI - All Models in One Platform

326 models match the criteria

Gemini 2.0 Flash Lite

Text generation | Multilingual | Tool Call

Gemini 2.0 Flash Lite is the fastest model in the Gemini 2.0 series, optimized for cost-effectiveness and low latency. It is designed for high-throughput, lightweight tasks and supports multimodal inputs (such as images, documents, and audio), with a large input token limit.

GPT-4.1 mini

Text generation | Multilingual | Tool Call

GPT-4.1 mini is a small-to-mid-sized multimodal model launched by OpenAI. It supports a context of one million tokens and can handle text, images, and videos. Its performance is comparable to that of GPT-4o, and it scored 73% on the MMMU test, surpassing the previous generation. Latency is reduced by half and cost by 83%. It is suitable for developers who call it through the API to handle long content and visual tasks.

Grok 4 Fast

Text generation | Multilingual

Grok 4 Fast is a lightweight large language model launched by xAI in 2025, focused on high-speed inference and cost optimization. Its core features include a generation speed of 75 tokens per second (10 times faster than the standard version) and a 2-million-token context window that can process entire books or codebases in one pass. Inference cost is reduced by 98%, and an optimized architecture cuts reasoning-token consumption by 40%. As the base version of the Grok 4 series, it integrates text and image input, real-time web access (the DeepSearch tool), and function calling. It targets lightweight scenarios such as daily Q&A and document processing, and is planned to gradually replace Grok 3 as the base service for free users. It keeps multimodal capabilities while prioritizing efficiency for everyday users.
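As a rough illustration of the throughput figure quoted above, the following sketch estimates how long a response of a given length would take at 75 tokens per second, and at the one-tenth speed implied by "10 times faster than the standard version". The function name and the 1,500-token response length are hypothetical, and the estimate ignores network latency and prompt processing time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for a response, ignoring latency and prompt processing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


FAST_TPS = 75.0               # figure quoted in the listing
STANDARD_TPS = FAST_TPS / 10  # inferred from "10 times faster"; not a quoted figure

response_tokens = 1500        # hypothetical response length
fast = generation_seconds(response_tokens, FAST_TPS)
standard = generation_seconds(response_tokens, STANDARD_TPS)
print(f"fast: {fast:.0f} s, standard: {standard:.0f} s")
```

Real-world response times depend on load, prompt length, and network conditions, so treat these numbers as an order-of-magnitude comparison only.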

OpenAI o3-mini

Text Generation | Multilingual | Tool Call

OpenAI o3-mini is a small reasoning model launched in January 2025, targeting specialized solutions in STEM fields. It supports advanced developer features such as function calling and structured output, and offers low, medium, and high reasoning levels to balance accuracy and efficiency. With low cost and low latency, it is suitable for scenarios such as scientific computing and software development.

GPT-5 Codex

Text generation | Multilingual

GPT-5 Codex is a multi-model hybrid code generation system launched by OpenAI. It integrates high-efficiency base models with deep reasoning modules and dynamically schedules resources through intelligent routing. Its code generation ability is significantly improved, enabling rapid construction of complex front-end applications and debugging of large-scale codebases. It supports generating complete websites and games from a single prompt and handles design aesthetics better. It is suitable for programming, application building, and code debugging. Free users can use basic functions, while the paid version offers higher limits and extended reasoning capabilities.

Claude 3 Opus

Text generation | Multilingual | Tool Call

Claude 3 Opus is a top-tier large model launched by Anthropic. It is the high-end model of the Claude 3 series, with multimodal capabilities and a 200,000-token context window. It leads its peers on benchmarks such as MMLU and GPQA, can deeply understand complex tasks, and achieves human-like interaction. It is suitable for task automation (API and database operations), R&D (drug development, research review), and strategic analysis (financial trend prediction, chart interpretation).

Gemini 2.0 Flash

Text Generation | Multilingual | Tool Call

Gemini 2.0 Flash is a multimodal AI model launched by Google and is a sub-model of the Gemini 2.0 suite. It offers text understanding and image generation and editing, supports a context window of 1 million tokens, and responds twice as fast as 1.5 Pro. It is suitable for scenarios such as advertising design, social media content creation, and educational illustration generation. Developers can access it through Google AI Studio and the Gemini API.

Claude Haiku 4.5

Text generation | Multilingual | Tool Call

Claude Haiku 4.5 is a small hybrid reasoning language model launched by Anthropic. Its performance is close to that of the mid-sized Sonnet 4 at one-third of the cost, with more than double the speed. It handles a context of 200,000 tokens, supports multimodal prompts, and is classified at AI Safety Level 2 (ASL-2). It is suitable for real-time scenarios such as intelligent customer service, programming assistance, and conversational assistants, and can be integrated through the Claude application, the API, and major cloud platforms.

Gemini 2.5 Flash

Text Generation | Multilingual | Tool Call

Gemini 2.5 Flash is a lightweight multimodal AI model launched by Google. It supports text, image, audio, and video inputs, has adaptive reasoning capabilities, and improves token usage efficiency by 20-30%. It is suitable for high-throughput, low-latency tasks such as translation, classification, and multimodal interaction, and is open to developers and enterprise users.

Claude Sonnet 4.5

Text Generation

Claude Sonnet 4.5 is a mid-range, balanced AI model released by Anthropic in September 2025. It is the "medium-sized" product in the Claude series, positioned as a compromise between performance and cost. It has outstanding programming capabilities, scoring 77.2% on the SWE-bench Verified test, supports continuous programming for over 30 hours, and can build production-level applications. It also combines efficient reasoning with visual processing, offering fast responses at moderate cost. It is suitable for software development, complex agent construction, and enterprise-level tasks.

Claude 3 Sonnet

Text generation | Multilingual

Claude 3 Sonnet is a large language model launched by Anthropic. It is the mid-range model in the Claude 3 series, balancing capability and speed, and is suitable for enterprise-level applications. It is twice as fast as its predecessor, is highly controllable, supports content generation, classification, data extraction, knowledge retrieval, and similar tasks, and is available through the API and Amazon Bedrock.

Gemini 2.5 Flash Lite

Text generation | Multilingual | Tool Call

Gemini 2.5 Flash-Lite is a lightweight AI reasoning model (preview version) launched by Google, featuring ultra-fast response and cost optimization. It is currently the fastest Gemini model. It supports multimodal input, a 1-million-token context, and Google's native tools (such as search and code execution). It is suitable for high-throughput, low-latency scenarios (such as translation and classification) and provides API services for developers.

Qwen3 Vl 235b A22b Thinking

Visual Understanding | Tool Call

Qwen3-VL-235B-A22B-Thinking is the flagship vision-language model of Alibaba's Tongyi Qianwen Qwen3 series. It adopts the MoE architecture and has 235 billion parameters. It has GUI-level visual agent capabilities, supports OCR in 32 languages, has a 256K context (extendable to 1M), and excels in video understanding and multimodal reasoning. It is suitable for complex multimodal workflows, long-document retrieval, and intelligent interaction scenarios.

Qwen3 Coder Plus

Text generation | Tool Call

Qwen3-Coder-Plus is an enhanced code generation model in Alibaba's Tongyi Qianwen series. It uses a 480B-parameter Mixture of Experts (MoE) architecture with 35 billion active parameters and a 1M context window. It features strong code understanding and generation, supports multiple languages and complex logical reasoning, and performs comparably to Claude Sonnet. It is suitable for agentic programming tasks such as large project analysis and codebase operations.

Qianfan Lightning

Text Generation | Chinese, English | Tool Call

Qianfan-Lightning is a high-performance, ultra-low-latency model series or service mode offered on Baidu Smart Cloud's Qianfan Large Model Platform.

Wan2.5 I2i Preview

Image generation

Wan2.5-i2i-preview is an image generation model that supports image editing functions. It is part of Alibaba Cloud's image generation service and is suitable for image creation and editing scenarios.

Qwen3-Max

Text generation | Tool Call

Qwen3-Max is the most advanced large model in Alibaba's Qwen3 series. It has trillions of parameters, is pre-trained on 36T tokens, supports a context of over 260,000 tokens, covers multiple languages, and has an explicit reasoning mode. It is suitable for complex tasks such as enterprise-level policy Q&A, code review, and data analysis.

Qwen Image Plus

Image generation

Qwen-image-plus is an image generation model in the Tongyi Qianwen series of Alibaba Cloud. It is a professional version of Qwen-Image, excelling in complex text rendering and supporting both Chinese and English, as well as multi-line layouts. It is suitable for scenarios requiring precise text generation, such as posters and couplets. It has a lower cost compared to the basic version and can be called through an API, balancing quality and efficiency.

Qwen3-VL-plus

Visual understanding | Tool Call

Qwen3-VL-plus is an enhanced vision-language model launched by Alibaba's Tongyi Qianwen. It belongs to the Qwen3-VL series and is offered in Instruct and Thinking versions. It delivers high performance with a small number of parameters: the 8B-parameter version approaches the performance of the previous generation's 72B flagship model. It supports images with a resolution of over one million pixels and improves detail recognition, text understanding, and complex visual reasoning. It is suitable for scenarios such as intelligent customer service, image recognition, content creation, and decision-making assistance.

Qwen Image Edit

Image generation | Multilingual

Qwen-Image-Edit is an open-source Omni product-level diffusion model by Alibaba, built on the 20 billion parameter Qwen-Image. It supports both semantic and appearance editing. Features include precise Chinese and English text editing (while retaining font styles) and SOTA benchmark performance. It can be used for image content generation, combined text and image output, and multimodal assistant applications.

Doubao Seed Translation

Text generation

Doubao-Seed-Translation is a large multilingual translation model launched by ByteDance's Volcengine. Based on the Transformer architecture, it supports mutual translation among 28 languages. It has high accuracy (BLEU score of 42.5) and fluency, and is suitable for general text translation scenarios such as cross-border e-commerce, international cooperation, and education and learning.

Qwen3 Livetranslate Flash Realtime 2025 09 22

Speech Recognition | Multilingual

Qwen3-LiveTranslate-Flash is a multilingual real-time audio and video simultaneous interpretation model launched by Tongyi Qianwen of Alibaba. It is based on the Qwen3-Omni foundation and trained by fusing multimodal data. It supports offline/real-time translation of 18 languages and dialects with a low latency of 3 seconds. The visual enhancement technology improves the accuracy in complex scenarios and outperforms mainstream models. It is suitable for scenarios such as international conferences, remote teaching, and cross-border collaboration.

Qwen3 Next 80B A3B Instruct

Text generation | Multilingual

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned large model launched by the Tongyi team at Alibaba Cloud in September 2025. It is based on a highly sparse MoE architecture, with a total of 80 billion parameters but only 3 billion activated. It uses a hybrid attention mechanism and multi-token prediction. The training cost is 1/10 of that of Qwen3-32B, and the inference throughput for 32k context is increased by 10 times. It natively supports a context of 262K tokens and can be extrapolated to process text in the millions. It is suitable for long-context scenarios such as long document understanding and legal analysis. It has been open-sourced and supports deployment on mainstream frameworks.
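To make the sparsity figure concrete, here is a minimal sketch that computes the share of parameters activated per token from the totals quoted in this listing (80 billion total and 3 billion active for Qwen3-Next-80B-A3B, 480 billion total and 35 billion active for Qwen3-Coder-Plus). The helper name is hypothetical, and the calculation is plain arithmetic on the published counts, not a measurement of compute or memory use.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of parameters activated per token in an MoE model."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Parameter counts in billions, as quoted in the listing.
models = {
    "Qwen3-Next-80B-A3B": (80, 3),
    "Qwen3-Coder-Plus": (480, 35),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.2%} of parameters active per token")
```

A lower active share generally means less compute per generated token, although all parameters still have to be stored and loaded for serving.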

Wan2.5 T2i Preview

Image generation

wan2.5-t2i-preview is a text-to-image model in Alibaba Tongyi Wanxiang series, belonging to the multi-modal generation model. It features support for realistic scenes and photographic styles, and emphasizes the balance between image quality and response speed. It is suitable for general realistic scenes and photographic style image generation, and can be applied in fields such as advertising and e-commerce.

Wan2.5 I2v Preview

Video generation

wan2.5-i2v-preview is an image-to-video model in Alibaba's Tongyi Wanxiang 2.5 series of multimodal generation models. It uses a unified framework to integrate text, image, video, and audio generation. It supports 1080P high-definition video output with audio-visual synchronization, understands camera movement language, maintains the consistency of element IDs, and supports audio-driven video generation. It is suitable for content creation in fields such as advertising, e-commerce, film and television, and education.

Wan2.5 T2v Preview

Video generation

Wan2.5-t2v-preview is a multimodal generation model launched by Alibaba's Tongyi. It integrates text-to-video, image-to-video, text-to-image, and image editing functions, supports 1080P/24fps output, achieves audio-visual synchronization, and can generate matching voices, sound effects, and background music. It offers features such as camera movement control and element consistency optimization, and is applied in fields such as advertising, film and television, and education.

Qwen3 Omni 30b A3b Captioner

Speech Recognition

Qwen3-Omni-30B-A3B-Captioner is an open-source, fine-grained audio captioning model from Alibaba, fine-tuned from the Instruct variant. It takes audio as input and outputs text. It produces detailed, low-hallucination audio descriptions and is suitable for scenarios such as audio-visual content analysis, accessibility services, and intelligent editing.

Qwen3 Omni Flash Realtime

Full modality | Multilingual

Qwen3-omni-flash-realtime is a real-time full-modal AI model launched by Alibaba's Tongyi Qianwen. It supports multimodal processing of text, images, audio, and video, and offers real-time interaction features such as streaming conversations and mid-response interruption. It can be applied to scenarios such as voice assistants, multimedia analysis, and intelligent editing, and supports 119 languages for text and 20 for voice interaction.

Qwen3 Tts Flash

Text-to-Speech Synthesis | Multilingual

Qwen3-TTS-Flash is a text-to-speech model launched by Tongyi of Alibaba. It supports 10 languages, 17 voice timbres, and 9 Chinese dialects. It can intelligently adjust the tone, with a first-packet delay of 97ms. It is suitable for scenarios such as intelligent customer service, audio creation, and voice assistants.

Qwen3 Tts Flash Realtime

Speech synthesis | Multilingual

Qwen3-TTS-Flash-Realtime is a real-time text-to-speech model launched by Tongyi of Alibaba. The first packet delay is 97ms. It supports 17 timbres, 10 languages, and 17 dialects. The speech is natural and fluent. It is suitable for scenarios such as intelligent customer service, audiobooks, AI teachers, and film and television dubbing.

AIBase

Empowering the Future, Your AI Solution Knowledge Base

© 2026 AIBase