modelstop.top

AI Model Catalogue

Browse 267 models across providers, modalities, and use cases.

๐Ÿ‘๏ธ Vision & Multimodal

267 models · Page 3 of 8

recraft-crisp-upscale

recraft-ai

Designed to make images sharper and cleaner, Crisp Upscale increases overall quality, making visuals suitable for web use or print-ready materials.

vision · free

flux-lora-ingres-painting

vestigia

FLUX LoRA trained on paintings by J.-A.-D. Ingres. Porcelain-smooth surfaces, exquisitely elongated forms, precise linear contours, neoclassical mastery.

vision · free

recraft-remove-background

recraft-ai

Automated background removal for images, tuned for AI-generated content, product photos, portraits, and design workflows.

vision · image · free

recraft-vectorize

recraft-ai

Convert raster images to high-quality SVG format with precision and clean vector paths, perfect for logos, icons, and scalable graphics.

vision · free

sharp-ml

kfarr

Apple's SHARP model: single image to 3D Gaussian splats.

vision · free

flux-lora-first-empire-paintings

vestigia

FLUX LoRA trained on French First Empire paintings. Napoleonic state portraiture: generals in gleaming uniforms, imperial ceremonies, neoclassical grandeur.

vision · free

floorplan-recognition

ton731

Segment floorplan images into walls, doors, windows, and kitchen zones using a deep learning model, then extract structured contours and center lines as JSON for downstream applications.

vision · free

flux-lora-nattier-painting

vestigia

FLUX LoRA trained on paintings by Jean-Marc Nattier. Ladies of the court depicted as goddesses, flowing blue drapery, soft pastel tones, Rococo elegance.

vision · free

kling-v3-motion-control

kwaivgi

Kling 3.0 motion control: transfer motion from a reference video to any character image with improved consistency and quality.

vision · free

wan-2.7-r2v

wan-video

Generate videos from reference images or clips while preserving subject identity, using Alibaba's Wan 2.7 reference-to-video model.

vision · image · free

ernie-image

prunaai

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu.

vision · image · free

siglip-large-patch16-384

devarsh-mavani-19

Get image embeddings using siglip-large-patch16-384.

vision · free

irwin-image-lora

t-irwin-neiu

vision · free

wan-2.7-image-pro

wan-video

Generate and edit high-quality images with Alibaba's Wan 2.7 Pro, featuring 4K output, thinking mode, text-to-image, multi-image editing, and image set generation.

vision · image · reasoning

p-video

prunaai

Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.

vision · image · audio

wan-2.7-i2v

wan-video

Generate videos from images with Alibaba's Wan 2.7 model, with support for first-and-last-frame control, clip continuation, and audio synchronization.

vision · image · audio

firered-image-edit-1.1

prunaai

FireRed-Image-Edit 1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.

vision · free

ernie-image-turbo

prunaai

ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu.

vision · image · free

imagen-4-ultra

google

Use this Ultra version of Imagen 4 when quality matters more than speed and cost.

vision · free

q3-turbo

vidu

Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio

stems-separator

triadmusic

Image for separating a song into stems using demucs and spleeter.

vision · free

seedream-4.5

bytedance

Seedream 4.5: upgraded ByteDance image model with stronger spatial understanding and world knowledge.

vision · free

reframe-image

luma

Change the aspect ratio of any photo using AI, without cropping.

vision · free

veo-3.1

google

An improved version of Veo 3, with higher-fidelity video, context-aware audio, and support for reference images and last frames.

vision · audio · free

lucy-edit-2

decart

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

vision · free

p-image-upscale

prunaai

Fast image upscaler supporting outputs up to 8 MP; upscales images to 4 MP in under one second.

vision · free

seedvr2

papina

🔥 SeedVR2: one-step video and image restoration with the 7B model and adjustable resolution.

vision · free

q3-pro

vidu

High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio

depth-anything-v3-metric-pano

vufinder

Monocular metric depth estimation for panoramic images.

vision · free

sam3-video

lucataco

A unified foundation model for prompt-based segmentation in images and videos.

vision · free

lyria-3-pro

google

Generate full-length songs up to 3 minutes long from text prompts or images with Lyria 3 Pro, Google's most capable music generation model.

vision · image · free

grok-imagine-r2v

xai

Generate videos guided by reference images using xAI's Grok Imagine Video model.

vision · image · free

kling-v2.5-turbo-pro

kwaivgi

Kling 2.5 Turbo Pro: Unlock pro-level text-to-video and image-to-video creation with smooth motion, cinematic depth, and remarkable prompt adherence.

vision · free

kling-v2.6-motion-control

kwaivgi

Enables precise control of character actions and expressions from a reference image.

vision · free

flux-fill-pro

black-forest-labs

Professional inpainting and outpainting model with state-of-the-art performance. Edit or extend images with natural, seamless results.

vision · free

flux-2-flex

black-forest-labs

Max-quality image generation and editing with support for up to ten reference images.

vision · image · free