modelstop.top
AI Model Catalogue

Browse 141 models across providers, modalities, and use cases.

🌐 All Models

141 models · Page 1 of 4

Amazon Nova Pro

amazon

Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.

vision, multimodal, long-context
300,000 ctx · $0.80/1M in

Amazon Nova Lite

amazon

Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.

vision, multimodal, cheap
300,000 ctx · $0.06/1M in

Auto Router

openrouter

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...

text, vision, multimodal
2,000,000 ctx · Free in

Anthropic: Claude 3 Haiku

anthropic

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku).

text, vision, multimodal
200,000 ctx · $0.25/1M in

OpenAI: GPT-4 Turbo

openai

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.

text, vision, multimodal
128,000 ctx · $10.00/1M in

OpenAI: GPT-4o

openai

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...

text, vision, multimodal
128,000 ctx · $2.50/1M in

OpenAI: GPT-4o (2024-05-13)

openai

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...

text, vision, multimodal
128,000 ctx · $5.00/1M in

OpenAI: GPT-4o-mini

openai

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

text, vision, multimodal
128,000 ctx · $0.15/1M in

OpenAI: GPT-4o-mini (2024-07-18)

openai

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

text, vision, multimodal
128,000 ctx · $0.15/1M in

OpenAI: GPT-4o (2024-08-06)

openai

The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is...

text, vision, multimodal
128,000 ctx · $2.50/1M in

Meta: Llama 3.2 11B Vision Instruct

meta-llama

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

text, vision, multimodal
131,072 ctx · $0.24/1M in

Anthropic: Claude 3.5 Haiku

anthropic

Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic...

text, vision, multimodal
200,000 ctx · $0.80/1M in

Mistral: Pixtral Large 2411

mistralai

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...

text, vision, multimodal
131,072 ctx · $2.00/1M in

OpenAI: GPT-4o (2024-11-20)

openai

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It's also better at working with uploaded...

text, vision, multimodal
128,000 ctx · $2.50/1M in

Amazon: Nova Pro 1.0

amazon

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December...

text, vision, multimodal
300,000 ctx · $0.80/1M in

Amazon: Nova Lite 1.0

amazon

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...

text, vision, image
300,000 ctx · $0.06/1M in

OpenAI: o1

openai

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...

text, vision, multimodal
200,000 ctx · $15.00/1M in

MiniMax: MiniMax-01

minimax

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...

text, vision, image
1,000,192 ctx · $0.20/1M in

Perplexity: Sonar

perplexity

Sonar is lightweight, affordable, fast, and simple to use, now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...

text, vision, multimodal
127,072 ctx · $1.00/1M in

Qwen: Qwen2.5 VL 72B Instruct

qwen

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

text, vision, multimodal
32,000 ctx · $0.80/1M in

Qwen: Qwen VL Max

qwen

Qwen VL Max is a visual understanding model with 7500 tokens context length. It excels in delivering optimal performance for a broader spectrum of complex tasks.

text, vision, multimodal
131,072 ctx · $0.52/1M in

Qwen: Qwen VL Plus

qwen

Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for...

text, vision, multimodal
131,072 ctx · $0.14/1M in

Google: Gemini 2.0 Flash

google

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It...

text, vision, multimodal
1,000,000 ctx · $0.10/1M in

Anthropic: Claude 3.7 Sonnet

anthropic

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...

text, vision, multimodal
200,000 ctx · $3.00/1M in

Google: Gemini 2.0 Flash Lite

google

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5),...

text, vision, multimodal
1,048,576 ctx · $0.07/1M in

Perplexity: Sonar Pro

perplexity

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like...

text, vision, multimodal
200,000 ctx · $3.00/1M in

Perplexity: Sonar Reasoning Pro

perplexity

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for...

text, vision, multimodal
128,000 ctx · $2.00/1M in

Google: Gemma 3 27B (free)

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text, vision, multimodal
131,072 ctx · $0.08/1M in

Google: Gemma 3 12B (free)

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text, vision, multimodal
32,768 ctx · $0.04/1M in

Google: Gemma 3 4B (free)

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text, vision, multimodal
32,768 ctx · $0.04/1M in

Mistral: Mistral Small 3.1 24B

mistralai

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

text, vision, multimodal
128,000 ctx · $0.03/1M in

OpenAI: o1-pro

openai

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide...

text, vision, multimodal
200,000 ctx · $150.00/1M in

Qwen: Qwen2.5 VL 32B Instruct

qwen

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual...

text, vision, multimodal
128,000 ctx · $0.20/1M in

Meta: Llama 4 Scout

meta-llama

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...

text, vision, multimodal
327,680 ctx · $0.08/1M in

Meta: Llama 4 Maverick

meta-llama

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...

text, vision, multimodal
1,048,576 ctx · $0.15/1M in

OpenAI: GPT-4.1 Nano

openai

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million...

text, vision, multimodal
1,047,576 ctx · $0.10/1M in