modelstop.top

AI Model Catalogue

Browse 40 models across providers, modalities, and use cases.

👁️ Vision & Multimodal


gemma-3-12b-it

google

Gemma 3 models are well suited to a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal, accepting text and image input and generating text output, with a large 128K context window and multilingual support for over 140 languages, and they are available in more sizes than previous versions.

text · vision · reasoning
80,000 ctx · $0.35/1M in

Gemma 3 4B IT

google

Gemma 3 4B IT — available via AWS Bedrock (us-east-1).

text · vision · multimodal
131,072 ctx · Free in

Gemma 3 27B PT

google

Gemma 3 27B PT — available via AWS Bedrock (us-east-1).

text · vision · multimodal
131,072 ctx · Free in

Gemma 3 4b it

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · reasoning
65,536 ctx · Free in

upscaler

google

Upscale images 2x or 4x

vision · free
Run locally
Free in

imagen-3-fast

google

A faster and cheaper Imagen 3 model, for when price or speed are more important than final image quality

vision · free
Free in

imagen-4

google

Google's Imagen 4 flagship model

vision · free
Free in

imagen-3

google

Google's highest quality text-to-image model, capable of generating images with detail, rich lighting and beauty

vision · image · free
Free in

imagen-4-fast

google

Use this fast version of Imagen 4 when speed and cost are more important than quality

vision · free
Free in

gemini-2.5-flash-image

google

Google's latest image generation model in Gemini 2.5

vision · image · free
Free in

nano-banana

google

Google's latest image editing model in Gemini 2.5

vision · free
Free in

nano-banana-pro

google

Google's state-of-the-art image generation and editing model 🍌🍌

vision · image · free
Free in

nano-banana-2

google

Google's fast image generation model with conversational editing, multi-image fusion, and character consistency

vision · image · free
Free in

veo-3.1

google

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, and support for reference images and last frames

vision · audio · free
Free in

lyria-3

google

Generate 30-second music clips from text prompts or images with Lyria 3, Google's music generation model

vision · image · free
Free in

imagen-4-ultra

google

Use this ultra version of Imagen 4 when quality matters more than speed and cost

vision · free
Free in

lyria-3-pro

google

Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model

vision · image · free
Free in

Google: Gemini 2.0 Flash

google

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It...

text · vision · multimodal
1,000,000 ctx · $0.10/1M in

Google: Gemini 2.0 Flash Lite

google

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5,...

text · vision · multimodal
1,048,576 ctx · $0.07/1M in

Google: Gemma 3 27B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.08/1M in

Google: Gemma 3 12B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.04/1M in

Google: Gemma 3 4B

google

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

text · vision · multimodal
Run locally
131,072 ctx · $0.04/1M in

Google: Gemini 2.5 Pro Preview 05-06

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
Run locally
1,048,576 ctx · $1.25/1M in

Google: Gemma 3n 4B

google

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

text · vision · audio
Run locally
32,768 ctx · $0.02/1M in

Google: Gemini 2.5 Pro Preview 06-05

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
Run locally
1,048,576 ctx · $1.25/1M in

Google: Gemini 2.5 Pro

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
Run locally
1,048,576 ctx · $1.25/1M in

Google: Gemini 2.5 Flash

google

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

text · vision · multimodal
Run locally
1,048,576 ctx · $0.30/1M in

Google: Gemini 2.5 Flash Lite

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

text · vision · image
1,048,576 ctx · $0.10/1M in

Google: Gemini 2.5 Flash Lite Preview 09-2025

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

text · vision · image
1,048,576 ctx · $0.10/1M in

Google: Nano Banana (Gemini 2.5 Flash Image)

google

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding. It is capable of image generation,...

text · vision · image
32,768 ctx · $0.30/1M in

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

google

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...

text · vision · image
65,536 ctx · $2.00/1M in

Google: Gemini 3 Flash Preview

google

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

text · vision · multimodal
1,048,576 ctx · $0.50/1M in

Google: Gemini 3.1 Pro Preview

google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Google: Gemini 3.1 Pro Preview Custom Tools

google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

google

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

text · vision · image
65,536 ctx · $0.50/1M in

Google: Gemini 3.1 Flash Lite Preview

google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

text · vision · multimodal
1,048,576 ctx · $0.25/1M in