modelstop.top

AI Model Catalogue

Browse 267 models across providers, modalities, and use cases.

👁️ Vision & Multimodal

267 models · Page 7 of 8

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

google

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...

text · vision · image
65,536 ctx · $2.00/1M in

Anthropic: Claude Opus 4.5

anthropic

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and...

text · vision · multimodal
200,000 ctx · $5.00/1M in

Mistral: Mistral Large 3 2512

mistralai

Mistral Large 3 2512 is Mistral’s most capable model to date. It features a sparse mixture-of-experts architecture with 41B active parameters (675B total) and is released under the Apache 2.0 license.

text · vision · multimodal
262,144 ctx · $0.50/1M in

Mistral: Ministral 3 3B 2512

mistralai

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

text · vision · multimodal
131,072 ctx · $0.10/1M in

Mistral: Ministral 3 8B 2512

mistralai

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

text · vision · multimodal
262,144 ctx · $0.15/1M in

Mistral: Ministral 3 14B 2512

mistralai

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language...

text · vision · multimodal
262,144 ctx · $0.20/1M in

Amazon: Nova 2 Lite

amazon

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing...

text · vision · image
1,000,000 ctx · $0.30/1M in

OpenAI: GPT-5.1-Codex-Max

openai

GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic...

text · vision · multimodal
400,000 ctx · $1.25/1M in

Z.ai: GLM 4.6V

z-ai

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts...

text · vision · multimodal
131,072 ctx · $0.30/1M in

OpenAI: GPT-5.2

openai

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...

text · vision · multimodal
400,000 ctx · $1.75/1M in

OpenAI: GPT-5.2 Pro

openai

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long-context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning,...

text · vision · multimodal
400,000 ctx · $21.00/1M in

OpenAI: GPT-5.2 Chat

openai

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on...

text · vision · multimodal
128,000 ctx · $1.75/1M in

Google: Gemini 3 Flash Preview

google

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

text · vision · multimodal
1,048,576 ctx · $0.50/1M in

ByteDance Seed: Seed 1.6

bytedance-seed

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

text · vision · multimodal
262,144 ctx · $0.25/1M in

ByteDance Seed: Seed 1.6 Flash

bytedance-seed

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...

text · vision · image
262,144 ctx · $0.07/1M in

OpenAI: GPT-5.2-Codex

openai

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

text · vision · multimodal
400,000 ctx · $1.75/1M in

MoonshotAI: Kimi K2.5

moonshotai

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

text · vision · multimodal
256,000 ctx · $0.38/1M in

Free Models Router

openrouter

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...

text · vision · multimodal
200,000 ctx · $0.00/1M in

Anthropic: Claude Opus 4.6

anthropic

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

text · vision · multimodal
1,000,000 ctx · $5.00/1M in

Qwen: Qwen3.5 397B A17B

qwen

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

text · vision · multimodal
262,144 ctx · $0.39/1M in

Qwen: Qwen3.5 Plus 2026-02-15

qwen

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

text · vision · multimodal
1,000,000 ctx · $0.26/1M in

Anthropic: Claude Sonnet 4.6

anthropic

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

text · vision · multimodal
1,000,000 ctx · $3.00/1M in

Google: Gemini 3.1 Pro Preview

google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

OpenAI: GPT-5.3-Codex

openai

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

text · vision · multimodal
400,000 ctx · $1.75/1M in

Google: Gemini 3.1 Pro Preview Custom Tools

google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Qwen: Qwen3.5-Flash

qwen

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

text · vision · multimodal
1,000,000 ctx · $0.07/1M in

Qwen: Qwen3.5-122B-A10B

qwen

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

text · vision · multimodal
262,144 ctx · $0.26/1M in

Qwen: Qwen3.5-27B

qwen

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

text · vision · multimodal
262,144 ctx · $0.20/1M in

Qwen: Qwen3.5-35B-A3B

qwen

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

text · vision · multimodal
262,144 ctx · $0.16/1M in

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

google

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

text · vision · image
65,536 ctx · $0.50/1M in

ByteDance Seed: Seed-2.0-Mini

bytedance-seed

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

text · vision · multimodal
262,144 ctx · $0.10/1M in

Google: Gemini 3.1 Flash Lite Preview

google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

text · vision · multimodal
1,048,576 ctx · $0.25/1M in

OpenAI: GPT-5.3 Chat

openai

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

text · vision · multimodal
128,000 ctx · $1.75/1M in

OpenAI: GPT-5.4

openai

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

text · vision · multimodal
1,050,000 ctx · $2.50/1M in

OpenAI: GPT-5.4 Pro

openai

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

text · vision · multimodal
1,050,000 ctx · $30.00/1M in

Qwen: Qwen3.5-9B

qwen

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

text · vision · multimodal
262,144 ctx · $0.05/1M in
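
Reading the listings: each card gives a context window in tokens ("ctx") and an input price in US dollars per 1M tokens ("/1M in"). As a rough sketch, the input-side cost of a prompt can be estimated from those two numbers. The function names below are hypothetical and not part of any provider's API. The sketch assumes the listed price applies uniformly to every input token, and it ignores output-token pricing, which these cards do not show.

```python
from decimal import Decimal


def input_cost_usd(prompt_tokens: int, price_per_million_usd: str) -> Decimal:
    """Estimate the input-side cost in USD for a prompt.

    price_per_million_usd is the "/1M in" figure from a card, passed as a
    string (for example "5.00") so the arithmetic stays exact.
    Assumption: the price applies uniformly to all input tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return Decimal(price_per_million_usd) * prompt_tokens / Decimal(1_000_000)


def fits_context(prompt_tokens: int, context_window: int) -> bool:
    """Check a prompt against the card's "ctx" figure.

    Assumption: "ctx" is treated as an upper bound on prompt size; any
    output tokens that share the same window are not accounted for.
    """
    return prompt_tokens <= context_window


# A prompt that fills the 200,000-token window of a card priced at $5.00/1M in.
full_window_cost = input_cost_usd(200_000, "5.00")
print(full_window_cost)  # 1.00
```

Under these assumptions, filling a 400,000-token window at $21.00/1M in would cost $8.40 on the input side, while the same prompt at $1.75/1M in would cost $0.70.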