AI Model Catalogue

kling-v2.5-turbo-pro

kwaivgi

Kling 2.5 Turbo Pro: Unlock pro-level text-to-video and image-to-video creation with smooth motion, cinematic depth, and remarkable prompt adherence.

sam3-video

lucataco

A unified foundation model for prompt-based segmentation in images and videos

p-video

prunaai

Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.

vision, image, audio

⚡ 76 ms p50

wan-2.7-image-pro

wan-video

Generate and edit high-quality images with Alibaba's Wan 2.7 Pro, featuring 4K output, thinking mode, text-to-image, multi-image editing, and image set generation.

vision, image, reasoning

seedream-4.5

bytedance

Seedream 4.5: Upgraded Bytedance image model with stronger spatial understanding and world knowledge

metric3dv2

visionaix

Metric3D v2 (TPAMI 2024): Monocular metric depth and surface normals from a single image. Predicts real-world depth in meters. Works indoor and outdoor.

flux-fill-pro

black-forest-labs

Professional inpainting and outpainting model with state-of-the-art performance. Edit or extend images with natural, seamless results.

seedance-2.0

bytedance

ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.

wan-2.7-r2v

wan-video

Generate videos from reference images or clips while preserving subject identity using Alibaba's Wan 2.7 reference-to-video model

vision, image, free

reframe-image

luma

Change the aspect ratio of any photo using AI (not cropping)

p-image-edit

prunaai

A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, check out p-image here: https://replicate.com/prunaai/p-image

vision, image, free

lucy-edit-2

decart

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

grok-imagine-r2v

xai

Generate videos guided by reference images using xAI's Grok Imagine Video model

vision, image, free

flux-2-flex

black-forest-labs

Max-quality image generation and editing with support for ten reference images

p-image-upscale

prunaai

Fastest image upscaler in the world (<1s) supporting outputs up to 128 MP.

vision, instruct, open-source

microsoft/Phi-3.5-vision-instruct

microsoft

microsoft/Phi-3.5-vision-instruct is an image-text-to-text model on Hugging Face with ~1,482,472 monthly downloads. Open access.

Output: $0.0000/1M

Stable Diffusion 3.5 Large

Stability AI

Stable Diffusion 3.5 Large is Stability AI's most capable text-to-image model, delivering photorealistic and creative imagery with excellent prompt adherence and detail. Features multimodal diffusion transformer architecture.

vision, multimodal, long-context

Amazon Nova Pro

amazon

Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.

Input: $0.8000/1M

Output: $3.2000/1M

📏 300k context
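
The listed prices can be turned into a rough per-request cost estimate. The sketch below uses the Amazon Nova Pro figures from this card and assumes that "/1M" means "per one million tokens", which the listing does not state explicitly. The function name `estimate_cost` and the token counts are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Prices copied from the Amazon Nova Pro card.
# Assumption: "/1M" means USD per one million tokens.
INPUT_PRICE_PER_M = Decimal("0.80")
OUTPUT_PRICE_PER_M = Decimal("3.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in USD of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a request with 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost(10_000, 2_000))
```

Swapping in the input and output prices from another card gives the same estimate for that model. `Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise.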

Amazon Nova Lite

amazon

Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.

vision, multimodal, cheap

Input: $0.0600/1M

Output: $0.2400/1M

📏 300k context

OpenAI: GPT-4

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning...

OpenAI: GPT-4 Turbo (older v1106)

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023.

text, vision, long-context

Input: $10.0000/1M

Output: $30.0000/1M

📏 128k context

Auto Router

openrouter

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...

Anthropic: Claude 3 Haiku

anthropic

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal

OpenAI: GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.

⭐ Top Rated

OpenAI: GPT-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...