modelstop.top

AI Model Catalogue

Browse 103 models across providers, modalities, and use cases.

🎨 Image Generation

103 models · Page 3 of 3

irwin-image-lora

t-irwin-neiu

vision · free
Free in

p-video

prunaai

Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.

vision · image · audio
Free in

firered-image-edit-1.1

prunaai

FireRed-Image-Edit 1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.

vision · free
Free in

ernie-image-turbo

prunaai

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu

vision · image · free
Free in

imagen-4-ultra

google

Use this ultra version of Imagen 4 when quality matters more than speed and cost

vision · free
Free in

lyria-3-pro

google

Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model

vision · image · free
Free in

lucy-edit-2

decart

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

vision · free
Free in

wan-2.7-image-pro

wan-video

Generate and edit high-quality images with Alibaba's Wan 2.7 Pro with 4K output, thinking mode, text-to-image, multi-image editing, and image set generation

vision · image · reasoning
Free in

q3-pro

vidu

High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio
Free in

p-image-upscale

prunaai

Fast image upscaler supporting outputs up to 8 MP. Upscales images to 4 MP in under one second.

vision · free
Free in

geocalib

visionaix

GeoCalib (ECCV 2024): Single-image camera calibration. Estimates focal length, FoV, distortion, roll and pitch from one image using a deep net + Levenberg-Marquardt optimizer. Works on both outdoor and indoor scenes.

vision · free
Free in

reframe-image

luma

Change the aspect ratio of any photo using AI (not cropping)

vision · free
Free in

wan-2.7-image

wan-video

Generate and edit images with Alibaba's Wan 2.7

vision · image · free
Free in

metric3dv2

visionaix

Metric3D v2 (TPAMI 2024): Monocular metric depth and surface normals from a single image. Predicts real-world depth in meters. Works indoor and outdoor.

vision · free
Free in

kling-v2.6-motion-control

kwaivgi

Enables precise control of character actions and expressions from a reference image.

vision · free
Free in

android-dream-v4

interfaceconjurer

A custom Flux LoRA model trained on painterly illustrated poster art inspired by Blade Runner 2049. The style features atmospheric cyberpunk cityscapes with dramatic scale: tiny silhouetted figures dwarfed by massive holographic projections and towering

vision · free
Free in

p-image-edit

prunaai

A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, see p-image: https://replicate.com/prunaai/p-image

vision · image · free
Free in

wan-2.7-i2v

wan-video

Generate videos from images, with support for first-and-last-frame control, clip continuation, and audio synchronization using Alibaba's Wan 2.7 model

vision · image · audio
Free in

flux-2-flex

black-forest-labs

Max-quality image generation and editing with support for ten reference images

vision · image · free
Free in

wan-2.7-r2v

wan-video

Generate videos from reference images or clips while preserving subject identity using Alibaba's Wan 2.7 reference-to-video model

vision · image · free
Free in

stems-separator

triadmusic

Separate stems from a song using Demucs and Spleeter

vision · free
Free in

sam3-video

lucataco

A unified foundation model for prompt-based segmentation in images and videos

vision · free
Free in

veo-3.1

google

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

vision · audio · free
Free in

siglip-large-patch16-384

devarsh-mavani-19

Get image embeddings using siglip-large-patch16-384

vision · free
Free in

flux-2-pro

black-forest-labs

High-quality image generation and editing with support for eight reference images

vision · image · free
Free in

grok-imagine-r2v

xai

Generate videos guided by reference images using xAI's Grok Imagine Video model

vision · image · free
Free in

seedvr2

papina

🔥 SeedVR2: one-step video & image restoration with 7B and Adjustable Resolution

vision · free
Free in

product-photo-studio

i-tokyo

Generate professional e-commerce product photos from a single image. Automatically removes background, creates realistic studio scenes, and adds natural shadows.

vision · image · free
Free in

Stable Diffusion 3.5 Large

Stability AI

Stable Diffusion 3.5 Large is Stability AI's most capable text-to-image model, delivering photorealistic and creative imagery with excellent prompt adherence and detail. Features multimodal diffusion transformer architecture.

vision · open-source
$0.00/1M in

Amazon Nova Pro

amazon

Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.

vision · multimodal · long-context
300,000 ctx · $0.80/1M in

Amazon Nova Lite

amazon

Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.

vision · multimodal · cheap
300,000 ctx · $0.06/1M in