All Models
449 models · Page 6 of 13
playground-v2.5-1024px-aesthetic
Open-source playground-v2.5-1024px-aesthetic model from playgroundai, available for download and self-hosting on Hugging Face.
animagine-xl-4.0
Open-source animagine-xl-4.0 model from cagliostrolab, available for download and self-hosting on Hugging Face.
nim/meta/llama-3.2-90b-vision-instruct
nim/meta/llama-3.2-11b-vision-instruct
Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Gemma 3 4b it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
google/gemma-3-12b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
meta-llama/Llama-3.2-11B-Vision-Instruct
meta-llama/Llama-Guard-4-12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
google/gemma-3-4b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
google/gemma-4-31B-it
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Qwen/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
embed-v4.0
Cohere's latest multimodal embedding model supporting text and images for advanced semantic search.
Qwen/Qwen3-VL-30B-A3B-Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
command-a-vision-07-2025
c4ai-aya-vision-32b
mistral-medium-2505
Our frontier-class multimodal model released May 2025.
upscaler
Upscale images by 2x or 4x.
gemini-2.5-flash-image
Google's latest image generation model in Gemini 2.5
nano-banana
Google's latest image editing model in Gemini 2.5
generate-background
Bria Background Generation swaps image backgrounds efficiently via text prompts or a reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use.
eraser
SOTA object removal: precisely removes unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use.
increase-resolution
Bria Increase Resolution upscales any image using a dedicated upscaling method that preserves the original image content without regeneration.
image-colorization
Image colorization model from Topaz Labs
dreamactor-m2.0
Animate any character (humans, cartoons, animals, even non-human figures) from a single image plus a driving video.
imagen-3
Google's highest quality text-to-image model, capable of generating images with detail, rich lighting and beauty
imagen-4
Google's Imagen 4 flagship model
firered-image-edit
FireRed-Image-Edit is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.
image-3.2
Commercial-ready text-to-image model trained entirely on licensed data. With only 4B parameters, it provides exceptional aesthetics and text rendering, and is evaluated to be on par with other leading models on the market.
fibo
SOTA open-source model trained on licensed data that transforms intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.
fabric-1.0
VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video.
p-image-edit-lora
Use LoRAs trained with https://replicate.com/prunaai/p-image-edit-trainer. Find or contribute LoRAs here: https://huggingface.co/collections/PrunaAI/p-image-edit-loras.
p-image-trainer
Fast LoRA trainer for p-image, a super fast text-to-image model developed by Pruna AI. Use LoRAs here: https://replicate.com/prunaai/p-image-lora. Find or contribute LoRAs here: https://huggingface.co/collections/PrunaAI/p-image
