modelstop.top
Home/All Models

AI Model Catalogue

Browse 141 models across providers, modalities, and use cases.

🌐 All Models

141 models · Page 2 of 4

OpenAI: GPT-4.1 Mini

openai

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard...

textvisionmultimodal
1,047,576 ctx$0.40/1M in
Explore specs and pricingView details →

OpenAI: GPT-4.1

openai

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and...

textvisionmultimodal
1,047,576 ctx$2.00/1M in
Explore specs and pricingView details →

OpenAI: o4 Mini

openai

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning...

textvisionmultimodal
200,000 ctx$1.10/1M in
Explore specs and pricingView details →

OpenAI: o3

openai

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following....

textvisionmultimodal
200,000 ctx$2.00/1M in
Explore specs and pricingView details →

OpenAI: o4 Mini High

openai

OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining...

textvisionmultimodal
200,000 ctx$1.10/1M in
Explore specs and pricingView details →

Meta: Llama Guard 4 12B

meta-llama

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

textvisionmultimodal
163,840 ctx$0.18/1M in
Explore specs and pricingView details →

Arcee AI: Spotlight

arcee-ai

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal...

textvisionmultimodal
131,072 ctx$0.18/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Pro Preview 05-06

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

textvisionmultimodal
1,048,576 ctx$1.25/1M in
Explore specs and pricingView details →

Mistral: Mistral Medium 3

mistralai

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

textvisionmultimodal
131,072 ctx$0.40/1M in
Explore specs and pricingView details →

Anthropic: Claude Sonnet 4

anthropic

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

textvisionmultimodal
1,000,000 ctx$3.00/1M in
Explore specs and pricingView details →

Anthropic: Claude Opus 4

anthropic

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...

textvisionmultimodal
200,000 ctx$15.00/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Pro Preview 06-05

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

textvisionmultimodal
1,048,576 ctx$1.25/1M in
Explore specs and pricingView details →

OpenAI: o3 Pro

openai

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently...

textvisionmultimodal
200,000 ctx$20.00/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Pro

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

textvisionmultimodal
1,048,576 ctx$1.25/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Flash

google

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

textvisionmultimodal
1,048,576 ctx$0.30/1M in
Explore specs and pricingView details →

Mistral: Mistral Small 3.2 24B

mistralai

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

textvisionmultimodal
128,000 ctx$0.07/1M in
Explore specs and pricingView details →

Baidu: ERNIE 4.5 VL 424B A47B

baidu

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...

textvisionmultimodal
123,000 ctx$0.42/1M in
Explore specs and pricingView details →

xAI: Grok 4

x-ai

Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not...

textvisionmultimodal
256,000 ctx$3.00/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Flash Lite

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

textvisionimage
1,048,576 ctx$0.10/1M in
Explore specs and pricingView details →

ByteDance: UI-TARS 7B

bytedance

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

textvisionmultimodal
128,000 ctx$0.10/1M in
Explore specs and pricingView details →

Anthropic: Claude Opus 4.1

anthropic

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains...

textvisionmultimodal
200,000 ctx$15.00/1M in
Explore specs and pricingView details →

OpenAI: GPT-5 Nano

openai

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

textvisionmultimodal
400,000 ctx$0.05/1M in
Explore specs and pricingView details →

OpenAI: GPT-5 Mini

openai

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost....

textvisionmultimodal
400,000 ctx$0.25/1M in
Explore specs and pricingView details →

OpenAI: GPT-5

openai

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy...

textvisionmultimodal
400,000 ctx$1.25/1M in
Explore specs and pricingView details →

OpenAI: GPT-5 Chat

openai

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

textvisionmultimodal
128,000 ctx$1.25/1M in
Explore specs and pricingView details →

Z.ai: GLM 4.5V

z-ai

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

textvisionmultimodal
65,536 ctx$0.60/1M in
Explore specs and pricingView details →

Baidu: ERNIE 4.5 VL 28B A3B

baidu

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing....

textvisionmultimodal
30,000 ctx$0.14/1M in
Explore specs and pricingView details →

Mistral: Mistral Medium 3.1

mistralai

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...

textvisionmultimodal
131,072 ctx$0.40/1M in
Explore specs and pricingView details →

xAI: Grok 4 Fast

x-ai

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model...

textvisionmultimodal
2,000,000 ctx$0.20/1M in
Explore specs and pricingView details →

OpenAI: GPT-5 Codex

openai

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

textvisionmultimodal
400,000 ctx$1.25/1M in
Explore specs and pricingView details →

Qwen: Qwen3 VL 235B A22B Instruct

qwen

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...

textvisionimage
262,144 ctx$0.20/1M in
Explore specs and pricingView details →

Qwen: Qwen3 VL 235B A22B Thinking

qwen

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....

textvisionimage
131,072 ctx$0.26/1M in
Explore specs and pricingView details →

Google: Gemini 2.5 Flash Lite Preview 09-2025

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

textvisionimage
1,048,576 ctx$0.10/1M in
Explore specs and pricingView details →

Anthropic: Claude Sonnet 4.5

anthropic

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

textvisionmultimodal
1,000,000 ctx$3.00/1M in
Explore specs and pricingView details →

OpenAI: GPT-5 Pro

openai

GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and...

textvisionmultimodal
400,000 ctx$15.00/1M in
Explore specs and pricingView details →

Qwen: Qwen3 VL 30B A3B Instruct

qwen

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

textvisionimage
131,072 ctx$0.13/1M in
Explore specs and pricingView details →