Vision & Multimodal
267 models · Page 3 of 8
recraft-crisp-upscale
Designed to make images sharper and cleaner, Crisp Upscale increases overall quality, making visuals suitable for web use or print-ready materials.
flux-lora-ingres-painting
FLUX LoRA trained on paintings by J.-A.-D. Ingres. Porcelain-smooth surfaces, exquisitely elongated forms, precise linear contours, neoclassical mastery.
recraft-remove-background
Automated background removal for images. Tuned for AI-generated content, product photos, portraits, and design workflows
recraft-vectorize
Convert raster images to high-quality SVG format with precision and clean vector paths, perfect for logos, icons, and scalable graphics.
sharp-ml
Apple's SHARP model: single image to 3D Gaussian splats
flux-lora-first-empire-paintings
FLUX LoRA trained on French First Empire paintings. Napoleonic state portraiture: generals in gleaming uniforms, imperial ceremonies, neoclassical grandeur.
floorplan-recognition
Segment floorplan images into walls, doors, windows, and kitchen zones using a deep learning model, then extract structured contours and center lines as JSON for downstream applications.
flux-lora-nattier-painting
FLUX LoRA trained on paintings by Jean-Marc Nattier. Ladies of the court depicted as goddesses, flowing blue drapery, soft pastel tones, Rococo elegance.
kling-v3-motion-control
Kling 3.0 motion control: transfer motion from a reference video to any character image with improved consistency and quality.
wan-2.7-r2v
Generate videos from reference images or clips while preserving subject identity using Alibaba's Wan 2.7 reference-to-video model
ernie-image
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
siglip-large-patch16-384
Get embeddings for an image using siglip-large-patch16-384
irwin-image-lora
wan-2.7-image-pro
Generate and edit high-quality images with Alibaba's Wan 2.7 Pro with 4K output, thinking mode, text-to-image, multi-image editing, and image set generation
p-video
Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.
wan-2.7-i2v
Generate videos from images, with support for first-and-last-frame control, clip continuation, and audio synchronization using Alibaba's Wan 2.7 model
firered-image-edit-1.1
FireRed-Image-Edit 1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.
ernie-image-turbo
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
imagen-4-ultra
Use this ultra version of Imagen 4 when quality matters more than speed and cost
q3-turbo
Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
stems-separator
Separate stems from a song using Demucs and Spleeter
seedream-4.5
Seedream 4.5: Upgraded Bytedance image model with stronger spatial understanding and world knowledge
reframe-image
Change the aspect ratio of any photo using AI (not cropping)
veo-3.1
New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support
lucy-edit-2
Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.
p-image-upscale
Fastest image upscaler in the world, supporting outputs up to 8 MP. Upscales images to 4 MP in under one second.
seedvr2
SeedVR2: one-step video & image restoration with 7B and Adjustable Resolution
q3-pro
High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
depth-anything-v3-metric-pano
Monocular metric depth estimation for panoramic images
sam3-video
A unified foundation model for prompt-based segmentation in images and videos
lyria-3-pro
Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model
grok-imagine-r2v
Generate videos guided by reference images using xAI's Grok Imagine Video model
kling-v2.5-turbo-pro
Kling 2.5 Turbo Pro: Unlock pro-level text-to-video and image-to-video creation with smooth motion, cinematic depth, and remarkable prompt adherence.
kling-v2.6-motion-control
Enables precise control of character actions and expressions from a reference image.
flux-fill-pro
Professional inpainting and outpainting model with state-of-the-art performance. Edit or extend images with natural, seamless results.
flux-2-flex
Max-quality image generation and editing with support for ten reference images
