🎨 Image Generation
103 models · Page 3 of 3
irwin-image-lora
p-video
Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.
firered-image-edit-1.1
FireRed-Image-Edit 1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.
ernie-image-turbo
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
imagen-4-ultra
Use this ultra version of Imagen 4 when quality matters more than speed and cost
lyria-3-pro
Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model
lucy-edit-2
Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.
wan-2.7-image-pro
Generate and edit high-quality images with Alibaba's Wan 2.7 Pro with 4K output, thinking mode, text-to-image, multi-image editing, and image set generation
q3-pro
High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
p-image-upscale
The fastest image upscaler in the world (<1s), supporting outputs up to 8 MP. Upscales images to 4 MP in under one second.
geocalib
GeoCalib (ECCV 2024): Single-image camera calibration. Estimates focal length, FoV, distortion, roll and pitch from one image using a deep net + Levenberg-Marquardt optimizer. Works on both outdoor and indoor scenes.
reframe-image
Change the aspect ratio of any photo using AI (not cropping)
wan-2.7-image
Generate and edit images with Alibaba's Wan 2.7
metric3dv2
Metric3D v2 (TPAMI 2024): Monocular metric depth and surface normals from a single image. Predicts real-world depth in meters. Works indoor and outdoor.
kling-v2.6-motion-control
Enables precise control of character actions and expressions from a reference image.
android-dream-v4
A custom Flux LoRA model trained on painterly illustrated poster art inspired by Blade Runner 2049. The style features atmospheric cyberpunk cityscapes with dramatic scale: tiny silhouetted figures dwarfed by massive holographic projections and towering
p-image-edit
A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, check out p-image here: https://replicate.com/prunaai/p-image
wan-2.7-i2v
Generate videos from images, with support for first-and-last-frame control, clip continuation, and audio synchronization using Alibaba's Wan 2.7 model
flux-2-flex
Max-quality image generation and editing with support for ten reference images
wan-2.7-r2v
Generate videos from reference images or clips while preserving subject identity using Alibaba's Wan 2.7 reference-to-video model
stems-separator
Separates a song into stems using Demucs and Spleeter
sam3-video
A unified foundation model for prompt-based segmentation in images and videos
veo-3.1
New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support
siglip-large-patch16-384
Get embeddings for an image using siglip-large-patch16-384
flux-2-pro
High-quality image generation and editing with support for eight reference images
grok-imagine-r2v
Generate videos guided by reference images using xAI's Grok Imagine Video model
seedvr2
🔥 SeedVR2: one-step video & image restoration with a 7B model and adjustable resolution
product-photo-studio
Generate professional e-commerce product photos from a single image. Automatically removes background, creates realistic studio scenes, and adds natural shadows.
Stable Diffusion 3.5 Large
Stable Diffusion 3.5 Large is Stability AI's most capable text-to-image model, delivering photorealistic and creative imagery with excellent prompt adherence and detail. Features multimodal diffusion transformer architecture.
Amazon Nova Pro
Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.
Amazon Nova Lite
Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.
