modelstop.top

AI Model Catalogue

Browse 449 models across providers, modalities, and use cases.

🌐 All Models

449 models · Page 10 of 13

Perplexity: Sonar

perplexity

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...

text · vision · multimodal
127,072 ctx · $1.00/1M in

Qwen: Qwen2.5 VL 72B Instruct

qwen

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

text · vision · multimodal
32,000 ctx · $0.80/1M in

Qwen: Qwen VL Max

qwen

Qwen VL Max is a visual understanding model with 7500 tokens context length. It excels in delivering optimal performance for a broader spectrum of complex tasks.

text · vision · multimodal
131,072 ctx · $0.52/1M in

Qwen: Qwen VL Plus

qwen

Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for...

text · vision · multimodal
131,072 ctx · $0.14/1M in

Google: Gemini 2.0 Flash

google

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It...

text · vision · multimodal
1,000,000 ctx · $0.10/1M in

Anthropic: Claude 3.7 Sonnet

anthropic

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...

text · vision · multimodal
200,000 ctx · $3.00/1M in

Google: Gemini 2.0 Flash Lite

google

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5),...

text · vision · multimodal
1,048,576 ctx · $0.07/1M in

Perplexity: Sonar Pro

perplexity

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like...

text · vision · multimodal
200,000 ctx · $3.00/1M in

Perplexity: Sonar Reasoning Pro

perplexity

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for...

text · vision · multimodal
128,000 ctx · $2.00/1M in

Google: Gemma 3 27B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.08/1M in

Google: Gemma 3 12B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.04/1M in

Google: Gemma 3 4B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.04/1M in

Mistral: Mistral Small 3.1 24B

mistralai

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

text · vision · multimodal
128,000 ctx · $0.03/1M in

OpenAI: o1-pro

openai

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide...

text · vision · multimodal
200,000 ctx · $150.00/1M in

Qwen: Qwen2.5 VL 32B Instruct

qwen

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual...

text · vision · multimodal
128,000 ctx · $0.20/1M in

Meta: Llama 4 Scout

meta-llama

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...

text · vision · multimodal
327,680 ctx · $0.08/1M in

Meta: Llama 4 Maverick

meta-llama

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...

text · vision · multimodal
Run locally
1,048,576 ctx · $0.15/1M in

OpenAI: GPT-4.1 Nano

openai

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million...

text · vision · multimodal
1,047,576 ctx · $0.10/1M in

OpenAI: GPT-4.1 Mini

openai

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard...

text · vision · multimodal
1,047,576 ctx · $0.40/1M in

OpenAI: GPT-4.1

openai

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and...

text · vision · multimodal
1,047,576 ctx · $2.00/1M in

OpenAI: o4 Mini

openai

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning...

text · vision · multimodal
200,000 ctx · $1.10/1M in

OpenAI: o3

openai

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following...

text · vision · multimodal
200,000 ctx · $2.00/1M in

OpenAI: o4 Mini High

openai

OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining...

text · vision · multimodal
200,000 ctx · $1.10/1M in

Meta: Llama Guard 4 12B

meta-llama

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

text · vision · multimodal
163,840 ctx · $0.18/1M in

Arcee AI: Spotlight

arcee-ai

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32K‑token context window, enabling rich multimodal...

text · vision · multimodal
131,072 ctx · $0.18/1M in

Google: Gemini 2.5 Pro Preview 05-06

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

Mistral: Mistral Medium 3

mistralai

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

text · vision · multimodal
131,072 ctx · $0.40/1M in

Google: Gemma 3n 4B

google

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

text · vision · audio
Run locally
32,768 ctx · $0.02/1M in

Anthropic: Claude Sonnet 4

anthropic

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

text · vision · multimodal
1,000,000 ctx · $3.00/1M in

Anthropic: Claude Opus 4

anthropic

Claude Opus 4 was benchmarked, at the time of its release, as the world’s best coding model, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...

text · vision · multimodal
200,000 ctx · $15.00/1M in

Google: Gemini 2.5 Pro Preview 06-05

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

OpenAI: o3 Pro

openai

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently...

text · vision · multimodal
200,000 ctx · $20.00/1M in

Google: Gemini 2.5 Pro

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

Google: Gemini 2.5 Flash

google

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

text · vision · multimodal
1,048,576 ctx · $0.30/1M in

Mistral: Mistral Small 3.2 24B

mistralai

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

text · vision · multimodal
Run locally
128,000 ctx · $0.07/1M in

Baidu: ERNIE 4.5 VL 424B A47B

baidu

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...

text · vision · multimodal
Run locally
131,072 ctx · $0.42/1M in