modelstop.top

AI Model Catalogue

Browse 141 models across providers, modalities, and use cases.

All Models

141 models · Page 4 of 4

Anthropic: Claude Opus 4.6

anthropic

Opus 4.6 is Anthropic's strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

text · vision · multimodal
1,000,000 ctx · $5.00/1M in

Qwen: Qwen3.5 397B A17B

qwen

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

text · vision · multimodal
262,144 ctx · $0.39/1M in

Qwen: Qwen3.5 Plus 2026-02-15

qwen

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

text · vision · multimodal
1,000,000 ctx · $0.26/1M in

Anthropic: Claude Sonnet 4.6

anthropic

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

text · vision · multimodal
1,000,000 ctx · $3.00/1M in

Google: Gemini 3.1 Pro Preview

google

Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

OpenAI: GPT-5.3-Codex

openai

GPT-5.3-Codex is OpenAI's most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

text · vision · multimodal
400,000 ctx · $1.75/1M in

Google: Gemini 3.1 Pro Preview Custom Tools

google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Qwen: Qwen3.5-Flash

qwen

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

text · vision · multimodal
1,000,000 ctx · $0.07/1M in

Qwen: Qwen3.5-122B-A10B

qwen

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

text · vision · multimodal
262,144 ctx · $0.26/1M in

Qwen: Qwen3.5-27B

qwen

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

text · vision · multimodal
262,144 ctx · $0.20/1M in

Qwen: Qwen3.5-35B-A3B

qwen

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

text · vision · multimodal
262,144 ctx · $0.16/1M in

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

google

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google's latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

text · vision · image
65,536 ctx · $0.50/1M in

ByteDance Seed: Seed-2.0-Mini

bytedance-seed

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

text · vision · multimodal
262,144 ctx · $0.10/1M in

Google: Gemini 3.1 Flash Lite Preview

google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

text · vision · multimodal
1,048,576 ctx · $0.25/1M in

OpenAI: GPT-5.3 Chat

openai

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

text · vision · multimodal
128,000 ctx · $1.75/1M in

OpenAI: GPT-5.4

openai

GPT-5.4 is OpenAI's latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

text · vision · multimodal
1,050,000 ctx · $2.50/1M in

OpenAI: GPT-5.4 Pro

openai

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

text · vision · multimodal
1,050,000 ctx · $30.00/1M in

Qwen: Qwen3.5-9B

qwen

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

text · vision · multimodal
262,144 ctx · $0.05/1M in

ByteDance Seed: Seed-2.0-Lite

bytedance-seed

Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

text · vision · multimodal
262,144 ctx · $0.25/1M in

Mistral: Mistral Small 4

mistralai

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

text · vision · multimodal
262,144 ctx · $0.15/1M in

OpenAI: GPT-5.4 Mini

openai

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...

text · vision · multimodal
400,000 ctx · $0.75/1M in

OpenAI: GPT-5.4 Nano

openai

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...

text · vision · multimodal
400,000 ctx · $0.20/1M in

Xiaomi: MiMo-V2-Omni

xiaomi

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

text · vision · multimodal
262,144 ctx · $0.40/1M in

Reka Edge

rekaai

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...

text · vision · image
16,384 ctx · $0.10/1M in

Google: Lyria 3 Clip Preview

google

30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

text · vision · image
1,048,576 ctx · $0.00/1M in

Google: Lyria 3 Pro Preview

google

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

text · vision · image
1,048,576 ctx · $0.00/1M in

xAI: Grok 4.20

x-ai

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

text · vision · multimodal
2,000,000 ctx · $2.00/1M in

xAI: Grok 4.20 Multi-Agent

x-ai

Grok 4.20 Multi-Agent is a variant of xAI's Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...

text · vision · multimodal
2,000,000 ctx · $2.00/1M in

Z.ai: GLM 5V Turbo

z-ai

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

text · vision · multimodal
202,752 ctx · $1.20/1M in

Qwen: Qwen3.6 Plus

qwen

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

text · vision · multimodal
1,000,000 ctx · $0.33/1M in

Google: Gemma 4 31B (free)

google

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

text · vision · multimodal
262,144 ctx · $0.14/1M in

Google: Gemma 4 26B A4B (free)

google

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference, delivering near-31B quality at...

text · vision · multimodal
262,144 ctx · $0.12/1M in

Anthropic: Claude Opus 4.6 (Fast)

anthropic

Fast-mode variant of Opus 4.6: identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

text · vision · multimodal
1,000,000 ctx · $30.00/1M in