Vision & Multimodal
267 models · Page 1 of 8
nim/meta/llama-3.2-11b-vision-instruct
nim/meta/llama-3.2-90b-vision-instruct
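These two NIM entries ship without blurbs. As a usage sketch only, the snippet below assumes they sit behind NVIDIA's OpenAI-compatible endpoint; the base URL, environment variable, and exact served model ID are assumptions to verify against the catalog:

```python
# Hedged sketch: querying a hosted vision-instruct model through an
# OpenAI-compatible endpoint. Base URL, env var, and model ID are assumptions.
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NIM endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed credential name
)

# Vision chat endpoints typically take images as base64 data URIs.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",  # assumed served model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```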
Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Gemma 3 4b it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
meta-llama/Llama-Guard-4-12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
meta-llama/Llama-3.2-11B-Vision-Instruct
google/gemma-3-4b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
google/gemma-3-12b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Qwen/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
embed-v4.0
Cohere's latest multimodal embedding model supporting text and images for advanced semantic search.
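For the text-plus-image search this entry describes, a minimal sketch follows, assuming the Cohere Python SDK's `embed` call accepts `texts` and base64 `images` for embed-v4.0 the way it does for earlier multimodal embed models; the file name and response handling are illustrative:

```python
# Hedged sketch: pairing a text query with an image document via Cohere's
# embed endpoint. The `images` parameter and response shape are assumed to
# match the SDK's documented behavior for earlier multimodal embed models.
import base64
import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# Text side of a semantic-search pair.
query = co.embed(
    model="embed-v4.0",
    input_type="search_query",
    texts=["photo of a red bicycle"],
)
print(len(query.embeddings[0]))  # embedding dimensionality

# Image side: the endpoint takes base64 data URIs rather than raw bytes.
with open("bike.jpg", "rb") as f:
    data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

doc = co.embed(
    model="embed-v4.0",
    input_type="image",
    images=[data_uri],
)
print(len(doc.embeddings[0]))
```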
Qwen/Qwen3-VL-30B-A3B-Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
google/gemma-4-31B-it
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
c4ai-aya-vision-32b
command-a-vision-07-2025
mistral-medium-2505
Our frontier-class multimodal model, released in May 2025.
increase-resolution
Bria Increase Resolution upscales any image using a dedicated upscaling method that preserves the original image content without regeneration.
image-colorization
Image colorization model from Topaz Labs.
generate-background
Bria Background Generation allows for efficient swapping of backgrounds in images via text prompts or a reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use.
firered-image-edit
FireRed-Image-Edit is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.
imagen-3
Google's highest-quality text-to-image model, capable of generating images with fine detail, rich lighting, and beauty.
eraser
SOTA object-removal model that enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use.
nano-banana
Google's latest image editing model in Gemini 2.5.
riverflow-2.0-pro
Agentic image model optimized for robust, high-precision generation, with support for font control.
fibo
SOTA open-source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.
dreamactor-m2.0
Animate any character (humans, cartoons, animals, even non-humans) from a single image plus a driving video.
p-image-edit-lora
Use LoRAs trained with https://replicate.com/prunaai/p-image-edit-trainer. Find or contribute LoRAs at https://huggingface.co/collections/PrunaAI/p-image-edit-loras.
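Since this entry is an explicit how-to, a minimal invocation sketch follows, assuming the standard Replicate Python client; every input field name here is hypothetical and should be checked against the model's schema on its Replicate page:

```python
# Hedged sketch: applying a trained LoRA with the Replicate Python client
# (reads REPLICATE_API_TOKEN from the environment). The input field names
# below are hypothetical; check the model's schema on its Replicate page.
import replicate

output = replicate.run(
    "prunaai/p-image-edit-lora",                 # model ref from this entry
    input={
        "image": open("room.jpg", "rb"),         # source image to edit
        "prompt": "make the walls sage green",   # edit instruction
        "lora_weights": "PrunaAI/example-lora",  # hypothetical LoRA reference
    },
)
print(output)
```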
imagen-4-fast
Use this fast version of Imagen 4 when speed and cost are more important than quality.
fabric-1.0
VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video.
image-3.2
Commercial-ready text-to-image model trained entirely on licensed data. With only 4B parameters, it delivers exceptional aesthetics and text rendering, and has been evaluated as on par with other leading models on the market.
upscaler
Upscale images by 2x or 4x.
wan2.6-i2v-flash
Image-to-video generation with optional audio, multi-shot narrative support, and faster inference.
fibo-edit
FIBO-Edit brings the power of structured prompt generation to image editing.
imagen-3-fast
A faster, cheaper Imagen 3 model for when price or speed are more important than final image quality.
