modelstop.top

AI Model Catalogue

Browse 267 models across providers, modalities, and use cases.

🌐 All Models

267 models · Page 6 of 8

ByteDance: UI-TARS 7B

bytedance

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds upon the UI-TARS framework with reinforcement...

text · vision · multimodal
128,000 ctx · $0.10/1M in

Anthropic: Claude Opus 4.1

anthropic

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains...

text · vision · multimodal
200,000 ctx · $15.00/1M in

OpenAI: GPT-5 Nano

openai

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

text · vision · multimodal
400,000 ctx · $0.05/1M in

OpenAI: GPT-5 Mini

openai

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost....

text · vision · multimodal
400,000 ctx · $0.25/1M in

OpenAI: GPT-5

openai

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy...

text · vision · multimodal
400,000 ctx · $1.25/1M in

OpenAI: GPT-5 Chat

openai

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

text · vision · multimodal
128,000 ctx · $1.25/1M in

Z.ai: GLM 4.5V

z-ai

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

text · vision · multimodal
65,536 ctx · $0.60/1M in

Baidu: ERNIE 4.5 VL 28B A3B

baidu

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing....

text · vision · multimodal
30,000 ctx · $0.14/1M in

Baidu: ERNIE 4.5 21B A3B

baidu

A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptional multimodal understanding and generation through heterogeneous MoE structures and modality-isolated routing. Supporting an...

text · vision · cheap
120,000 ctx · $0.07/1M in

Mistral: Mistral Medium 3.1

mistralai

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...

text · vision · multimodal
131,072 ctx · $0.40/1M in

xAI: Grok 4 Fast

x-ai

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model...

text · vision · multimodal
2,000,000 ctx · $0.20/1M in

OpenAI: GPT-5 Codex

openai

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

text · vision · multimodal
400,000 ctx · $1.25/1M in

Qwen: Qwen3 VL 235B A22B Instruct

qwen

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...

text · vision · image
262,144 ctx · $0.20/1M in

Qwen: Qwen3 VL 235B A22B Thinking

qwen

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....

text · vision · image
131,072 ctx · $0.26/1M in

Google: Gemini 2.5 Flash Lite Preview 09-2025

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

text · vision · image
1,048,576 ctx · $0.10/1M in

Anthropic: Claude Sonnet 4.5

anthropic

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

text · vision · multimodal
1,000,000 ctx · $3.00/1M in

OpenAI: GPT-5 Pro

openai

GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and...

text · vision · multimodal
400,000 ctx · $15.00/1M in

Qwen: Qwen3 VL 30B A3B Instruct

qwen

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

text · vision · image
131,072 ctx · $0.13/1M in

Qwen: Qwen3 VL 30B A3B Thinking

qwen

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...

text · vision · image
131,072 ctx · $0.13/1M in

Google: Nano Banana (Gemini 2.5 Flash Image)

google

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding. It is capable of image generation,...

text · vision · image
32,768 ctx · $0.30/1M in

OpenAI: o4 Mini Deep Research

openai

o4-mini-deep-research is OpenAI's faster, more affordable deep research model, ideal for tackling complex, multi-step research tasks. Note: This model always uses the 'web_search' tool, which incurs additional cost.

text · vision · multimodal
200,000 ctx · $2.00/1M in

OpenAI: o3 Deep Research

openai

o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: This model always uses the 'web_search' tool, which incurs additional cost.

text · vision · multimodal
200,000 ctx · $10.00/1M in

OpenAI: GPT-5 Image

openai

[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following,...

text · vision · image
400,000 ctx · $10.00/1M in

Qwen: Qwen3 VL 8B Instruct

qwen

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

text · vision · multimodal
131,072 ctx · $0.08/1M in

Qwen: Qwen3 VL 8B Thinking

qwen

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

text · vision · multimodal
131,072 ctx · $0.12/1M in

Anthropic: Claude Haiku 4.5

anthropic

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...

text · vision · multimodal
200,000 ctx · $1.00/1M in

OpenAI: GPT-5 Image Mini

openai

GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text...

text · vision · image
400,000 ctx · $2.50/1M in

Qwen: Qwen3 VL 32B Instruct

qwen

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

text · vision · multimodal
131,072 ctx · $0.10/1M in

NVIDIA: Nemotron Nano 12B 2 VL (free)

nvidia

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

text · vision · multimodal
128,000 ctx · $0.20/1M in

Perplexity: Sonar Pro Search

perplexity

Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based...

text · vision · multimodal
200,000 ctx · $3.00/1M in

Amazon: Nova Premier 1.0

amazon

Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.

text · vision · multimodal
1,000,000 ctx · $2.50/1M in

OpenAI: GPT-5.1-Codex-Mini

openai

GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex.

text · vision · multimodal
400,000 ctx · $0.25/1M in

OpenAI: GPT-5.1-Codex

openai

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

text · vision · multimodal
400,000 ctx · $1.25/1M in

OpenAI: GPT-5.1 Chat

openai

GPT-5.1 Chat (aka Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on...

text · vision · multimodal
128,000 ctx · $1.25/1M in

OpenAI: GPT-5.1

openai

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...

text · vision · multimodal
400,000 ctx · $1.25/1M in

xAI: Grok 4.1 Fast

x-ai

Grok 4.1 Fast is xAI's best agentic tool-calling model, and it shines in real-world use cases such as customer support and deep research. It has a 2M-token context window. Reasoning can be enabled/disabled using...

text · vision · multimodal
2,000,000 ctx · $0.20/1M in