
AI Model Catalogue

Browse 141 models across providers, modalities, and use cases.

🌐All Models 💬Text Generation 💻Code & Reasoning 👁️Vision & Multimodal 🎨Image Generation 🎙️Audio & Speech 🤖Agents & Tools 📄Long Context 🆓Free & Open 🧠Reasoning 🌍Multilingual


141 models · Page 4 of 4

Anthropic: Claude Opus 4.6

anthropic

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

text · vision · multimodal

1,000,000 ctx · $5.00/1M in

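Each card lists only a per-1M-token input price. As a rough sketch, assuming billing scales linearly with input tokens and ignoring output tokens, caching, and any tiered pricing (none of which this page shows), the input-side cost of a prompt can be estimated as follows. The `PRICES` table is copied from cards on this page; the helper name is hypothetical.

```python
from decimal import Decimal

# Input prices (USD per 1M input tokens) copied from cards on this page.
PRICES = {
    "Claude Opus 4.6": Decimal("5.00"),
    "Claude Sonnet 4.6": Decimal("3.00"),
    "Qwen3.5-Flash": Decimal("0.07"),
}


def input_cost_usd(input_tokens: int, price_per_million: Decimal) -> Decimal:
    """Estimate input-side cost, assuming linear per-token billing (hypothetical helper)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * input_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    prompt_tokens = 200_000
    for model, price in PRICES.items():
        print(f"{model}: ${input_cost_usd(prompt_tokens, price):.4f}")
```

Using `Decimal` rather than `float` keeps cent-level figures exact when comparing many small prices.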

Qwen: Qwen3.5 397B A17B

qwen

Qwen3.5-397B-A17B is a native vision-language model in the Qwen3.5 series. It is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

text · vision · multimodal

262,144 ctx · $0.39/1M in

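Several Qwen3.5 names, and Gemma 4 26B A4B further down, encode two parameter counts. Assuming the common convention that the first figure is total parameters and the `A` figure is parameters active per token (the Gemma card supports this reading: 25.2B total, 3.8B active), a small hypothetical helper can report what fraction of the weights each token touches. Names round the figures, so the results are approximate.

```python
import re

# Assumed naming convention: "<total>B-A<active>B" or "<total>B A<active>B",
# where the A-prefixed figure is the parameter count active per token.
_MOE_NAME = re.compile(r"(\d+(?:\.\d+)?)B[-\s]A(\d+(?:\.\d+)?)B")


def active_fraction(model_name: str) -> float:
    """Return active/total parameters parsed from a MoE-style model name."""
    match = _MOE_NAME.search(model_name)
    if match is None:
        raise ValueError(f"no '<total>B-A<active>B' pattern in {model_name!r}")
    total, active = (float(x) for x in match.groups())
    return active / total


if __name__ == "__main__":
    for name in ["Qwen3.5 397B A17B", "Qwen3.5-122B-A10B",
                 "Qwen3.5-35B-A3B", "Gemma 4 26B A4B"]:
        print(f"{name}: {active_fraction(name):.1%} of weights active per token")
```

Names without the pattern, such as Qwen3.5-Flash or the dense Qwen3.5-27B, raise `ValueError` instead of guessing.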

Qwen: Qwen3.5 Plus 2026-02-15

qwen

The Qwen3.5 Plus models are native vision-language models built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

text · vision · multimodal

1,000,000 ctx · $0.26/1M in


Anthropic: Claude Sonnet 4.6

anthropic

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

text · vision · multimodal

1,000,000 ctx · $3.00/1M in


Google: Gemini 3.1 Pro Preview

google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

text · vision · multimodal

1,048,576 ctx · $2.00/1M in


OpenAI: GPT-5.3-Codex

openai

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

text · vision · multimodal

400,000 ctx · $1.75/1M in


Google: Gemini 3.1 Pro Preview Custom Tools

google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

text · vision · multimodal

1,048,576 ctx · $2.00/1M in


Qwen: Qwen3.5-Flash

qwen

The Qwen3.5 Flash models are native vision-language models built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

text · vision · multimodal

1,000,000 ctx · $0.07/1M in


Qwen: Qwen3.5-122B-A10B

qwen

Qwen3.5-122B-A10B is a native vision-language model built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

text · vision · multimodal

262,144 ctx · $0.26/1M in


Qwen: Qwen3.5-27B

qwen

Qwen3.5-27B is a dense native vision-language model that incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

text · vision · multimodal

262,144 ctx · $0.20/1M in


Qwen: Qwen3.5-35B-A3B

qwen

Qwen3.5-35B-A3B is a native vision-language model in the Qwen3.5 series, designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model to achieve higher inference efficiency. Its overall...

text · vision · multimodal

262,144 ctx · $0.16/1M in


Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

google

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

text · vision · image

65,536 ctx · $0.50/1M in


ByteDance Seed: Seed-2.0-Mini

bytedance-seed

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

text · vision · multimodal

262,144 ctx · $0.10/1M in


Google: Gemini 3.1 Flash Lite Preview

google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

text · vision · multimodal

1,048,576 ctx · $0.25/1M in


OpenAI: GPT-5.3 Chat

openai

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

text · vision · multimodal

128,000 ctx · $1.75/1M in


OpenAI: GPT-5.4

openai

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

text · vision · multimodal

1,050,000 ctx · $2.50/1M in

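The GPT-5.4 figures are internally consistent: 922K input plus 128K output gives the 1,050,000-token context shown on the card. A minimal pre-flight check could look like the sketch below, assuming "922K" means 922,000 tokens and that the input and output caps are enforced separately (the card does not say exactly how the API enforces them). The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLimits:
    """Hypothetical container for separate input/output token caps."""
    max_input_tokens: int
    max_output_tokens: int

    @property
    def total(self) -> int:
        # Combined window: input cap plus output cap.
        return self.max_input_tokens + self.max_output_tokens

    def fits(self, prompt_tokens: int, requested_output_tokens: int) -> bool:
        """True if the prompt and the requested output are each within their caps."""
        return (0 <= prompt_tokens <= self.max_input_tokens
                and 0 <= requested_output_tokens <= self.max_output_tokens)


# Figures from the GPT-5.4 card; "922K" is read here as 922,000 tokens.
GPT_5_4 = ContextLimits(max_input_tokens=922_000, max_output_tokens=128_000)

if __name__ == "__main__":
    print(GPT_5_4.total)
    print(GPT_5_4.fits(900_000, 100_000))
    print(GPT_5_4.fits(950_000, 1_000))
```

Checking each cap separately means a prompt just under the combined window can still be rejected if it exceeds the input cap on its own.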

OpenAI: GPT-5.4 Pro

openai

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

text · vision · multimodal

1,050,000 ctx · $30.00/1M in


Qwen: Qwen3.5-9B

qwen

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

text · vision · multimodal

262,144 ctx · $0.05/1M in


ByteDance Seed: Seed-2.0-Lite

bytedance-seed

Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

text · vision · multimodal

262,144 ctx · $0.25/1M in


Mistral: Mistral Small 4

mistralai

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

text · vision · multimodal

262,144 ctx · $0.15/1M in


OpenAI: GPT-5.4 Mini

openai

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...

text · vision · multimodal

400,000 ctx · $0.75/1M in


OpenAI: GPT-5.4 Nano

openai

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...

text · vision · multimodal

400,000 ctx · $0.20/1M in


Xiaomi: MiMo-V2-Omni

xiaomi

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities: visual grounding, multi-step...

text · vision · multimodal

262,144 ctx · $0.40/1M in


Reka Edge

rekaai

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image or video inputs alongside text and generates text outputs. It is optimized specifically to deliver industry-leading performance in image understanding,...

text · vision · image

16,384 ctx · $0.10/1M in


Google: Lyria 3 Clip Preview

google

Clips are 30 seconds long and priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

text · vision · image

1,048,576 ctx · $0.00/1M in


Google: Lyria 3 Pro Preview

google

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48 kHz...

text · vision · image

1,048,576 ctx · $0.00/1M in

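Both Lyria 3 cards quote a per-unit price rather than a per-token one, which is presumably why their input price reads $0.00/1M. A small budgeting sketch under that assumption follows; the dictionary keys are hypothetical labels, not official API model identifiers.

```python
from decimal import Decimal

# Per-unit prices quoted in the Lyria 3 card descriptions (USD).
# Keys are illustrative labels only, not official model identifiers.
LYRIA_UNIT_PRICES = {
    "lyria-3-clip": Decimal("0.04"),  # per 30-second clip
    "lyria-3-pro": Decimal("0.08"),   # per full-length song
}


def generation_cost_usd(model_key: str, units: int) -> Decimal:
    """Cost of generating `units` clips or songs at the quoted per-unit price."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return LYRIA_UNIT_PRICES[model_key] * units


if __name__ == "__main__":
    print(generation_cost_usd("lyria-3-clip", 25))  # 25 thirty-second clips
    print(generation_cost_usd("lyria-3-pro", 10))   # 10 full-length songs
```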

xAI: Grok 4.20

x-ai

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

text · vision · multimodal

2,000,000 ctx · $2.00/1M in


xAI: Grok 4.20 Multi-Agent

x-ai

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...

text · vision · multimodal

2,000,000 ctx · $2.00/1M in


Z.ai: GLM 5V Turbo

z-ai

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

text · vision · multimodal

202,752 ctx · $1.20/1M in


Qwen: Qwen3.6 Plus

qwen

Qwen3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

text · vision · multimodal

1,000,000 ctx · $0.33/1M in


Google: Gemma 4 31B (free)

google

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. It features a 256K-token context window, a configurable thinking/reasoning mode, native function...

text · vision · multimodal

262,144 ctx · $0.14/1M in


Google: Gemma 4 26B A4B (free)

google

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Although it has 25.2B total parameters, only 3.8B are active per token during inference, delivering near-31B quality at...

text · vision · multimodal

262,144 ctx · $0.12/1M in


Anthropic: Claude Opus 4.6 (Fast)

anthropic

Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6): identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

text · vision · multimodal

1,000,000 ctx · $30.00/1M in

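The listed input price matches the stated multiplier: the $5.00/1M standard Opus 4.6 price times 6 is the $30.00/1M shown on this card. A hedged sketch comparing the two for the same prompt, assuming the 6x premium applies as a flat multiplier to the input price (output pricing is not shown on this page); the constant and function names are hypothetical.

```python
from decimal import Decimal

STANDARD_INPUT_PRICE = Decimal("5.00")  # Claude Opus 4.6, USD per 1M input tokens
FAST_MODE_MULTIPLIER = 6                # "premium 6x pricing" per the card


def fast_mode_price(standard_price: Decimal, multiplier: int = FAST_MODE_MULTIPLIER) -> Decimal:
    """Per-1M-token price in fast mode, assuming a flat multiplier."""
    return standard_price * multiplier


if __name__ == "__main__":
    fast = fast_mode_price(STANDARD_INPUT_PRICE)
    tokens = 50_000
    for label, price in [("standard", STANDARD_INPUT_PRICE), ("fast", fast)]:
        # Input-side cost only, assuming linear per-token billing.
        cost = price * tokens / Decimal(1_000_000)
        print(f"{label}: ${price}/1M in, ${cost:.2f} for {tokens:,} input tokens")
```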