modelstop.top

AI Model Catalogue

Browse 273 models across providers, modalities, and use cases.

🎨 Image Generation

273 models · Page 8 of 8

stems-separator

triadmusic

Separates a song into stems using Demucs and Spleeter.

vision · free
Input: Free

q3-pro

vidu

High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio
Input: Free

wan-2.7-image

wan-video

Generate and edit images with Alibaba's Wan 2.7.

vision · image · free
Input: Free

ffhqdat-4x-upscaler

supersambat

4x face image upscaler trained on FFHQ dataset using DAT (Dual Aggregation Transformer) architecture. Optimized for portrait and face photos.

vision · free
Input: Free

flux-2-max

black-forest-labs

The highest-fidelity image model from Black Forest Labs.

vision · free
Input: Free

product-photo-studio

i-tokyo

Generate professional e-commerce product photos from a single image. Automatically removes background, creates realistic studio scenes, and adds natural shadows.

vision · image · free
Input: Free

android-dream-v4

interfaceconjurer

A custom Flux LoRA model trained on painterly illustrated poster art inspired by Blade Runner 2049. The style features atmospheric cyberpunk cityscapes with dramatic scale: tiny silhouetted figures dwarfed by massive holographic projections and towering…

vision · free
Input: Free

p-image-upscale

prunaai

Fastest image upscaler in the world (<1s) supporting outputs up to 128 MP.

vision · free
Run locally
Input: Free

geocalib

visionaix

GeoCalib (ECCV 2024): Single-image camera calibration. Estimates focal length, FoV, distortion, roll and pitch from one image using a deep net + Levenberg-Marquardt optimizer. Works on both outdoor and indoor scenes.

vision · free
Input: Free

seedvr2

papina

🔥 SeedVR2: one-step video and image restoration with the 7B model and adjustable resolution.

vision · free
Input: Free

grok-imagine-r2v

xai

Generate videos guided by reference images using xAI's Grok Imagine Video model.

vision · image · free
Input: Free

flux-2-pro

black-forest-labs

High-quality image generation and editing with support for eight reference images.

vision · image · free
Input: Free

siglip-large-patch16-384

devarsh-mavani-19

Get image embeddings using siglip-large-patch16-384.

vision · free
Input: Free

ernie-image

prunaai

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu.

vision · image · free
Run locally
Input: Free

lucy-edit-2

decart

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

vision · free
Run locally
Input: Free

p-image-edit

prunaai

A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, see p-image: https://replicate.com/prunaai/p-image

vision · image · free
Run locally
Input: Free

firered-image-edit-1.1

prunaai

FireRed-Image-Edit 1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.

vision · free
Input: Free

veo-3.1

google

An improved version of Veo 3, with higher-fidelity video, context-aware audio, and support for reference images and last frames.

vision · audio · free
Input: Free

Stable Diffusion 3.5 Large

Stability AI

Stable Diffusion 3.5 Large is Stability AI's most capable text-to-image model, delivering photorealistic and creative imagery with excellent prompt adherence and detail. It uses a Multimodal Diffusion Transformer (MMDiT) architecture.

vision · open-source
Run locally
Input: $0.00/1M tokens

Amazon Nova Pro

amazon

Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.

vision · multimodal · long-context
Context: 300,000 tokens · Input: $0.80/1M tokens

Amazon Nova Lite

amazon

Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.

vision · multimodal · cheap
Context: 300,000 tokens · Input: $0.06/1M tokens