Vision & Multimodal
267 models · Page 4 of 8
veo-3.1
New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support
lyria-3-pro
Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model
seedream-4.5
Seedream 4.5: Upgraded ByteDance image model with stronger spatial understanding and world knowledge
depth-anything-v3-metric-pano
Monocular metric depth estimation for panoramic images
q3-turbo
Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
siglip-large-patch16-384
Get embeddings for image using siglip-large-patch16-384
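Image-embedding models like this return fixed-length vectors that are typically compared by cosine similarity. A minimal sketch of that comparison step (the helper below is illustrative, not part of the model's API; it assumes non-zero vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumes non-zero norms)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real 1024-d SigLIP embeddings:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

In practice you would substitute the vectors returned by the embedding endpoint for the toy inputs above.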
kling-v2.6-motion-control
Enables precise control of character actions and expressions from a reference image.
kling-v2.5-turbo-pro
Kling 2.5 Turbo Pro: Unlock pro-level text-to-video and image-to-video creation with smooth motion, cinematic depth, and remarkable prompt adherence.
metric3dv2
Metric3D v2 (TPAMI 2024): Monocular metric depth and surface normals from a single image. Predicts real-world depth in meters. Works indoor and outdoor.
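Because the depth is metric (real meters, not relative), it can be back-projected to a 3-D point cloud with a standard pinhole camera model. A minimal sketch, assuming known intrinsics (fx, fy, cx, cy are placeholders you would take from your camera, not values the model provides):

```python
import numpy as np

def backproject_depth(depth_m, fx, fy, cx, cy):
    """Back-project a metric depth map (meters) into an (H, W, 3) point cloud
    using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # camera-frame coordinates, in meters

# Toy example: a flat wall 1 m away, 4x4 image, principal point at (2, 2).
depth = np.ones((4, 4))
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

The pixel at the principal point maps to x = y = 0, and all points keep z = 1 m, as expected for a fronto-parallel wall.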
ernie-image
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
wan-2.7-image
Generate and edit images with Alibaba's Wan 2.7
q3-pro
High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
microsoft/Phi-3.5-vision-instruct
microsoft/Phi-3.5-vision-instruct is an image-text-to-text model on Hugging Face with ~1,482,472 monthly downloads. Open access.
Stable Diffusion 3.5 Large
Stable Diffusion 3.5 Large is Stability AI's most capable text-to-image model, delivering photorealistic and creative imagery with excellent prompt adherence and detail. Features multimodal diffusion transformer architecture.
Amazon Nova Pro
Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost across a wide range of tasks. Supports text, image, and video inputs.
Amazon Nova Lite
Amazon Nova Lite is a very low-cost multimodal model that can process image, video, and text inputs. Fast and accurate for a wide range of tasks requiring visual and language understanding.
OpenAI: GPT-4
OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning...
OpenAI: GPT-4 Turbo (older v1106)
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023.
Auto Router
Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...
Anthropic: Claude 3 Haiku
Claude 3 Haiku is Anthropic's fastest and most compact model, built for near-instant responsiveness with quick, accurate, targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal
OpenAI: GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
OpenAI: GPT-4o
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...
OpenAI: GPT-4o (2024-05-13)
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...
OpenAI: GPT-4o-mini
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
OpenAI: GPT-4o-mini (2024-07-18)
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
OpenAI: GPT-4o (2024-08-06)
The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is...
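A sketch of what the structured-outputs request field looks like, based on the field names in the linked announcement (the `make_response_format` helper and the `weather` schema are illustrative; check the current API reference before relying on the exact shape):

```python
def make_response_format(name, schema):
    """Build a response_format payload that asks the model to emit JSON
    conforming to the given JSON Schema (strict mode)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }

# Hypothetical schema: force the reply to be {"temp_c": <number>}.
fmt = make_response_format(
    "weather",
    {
        "type": "object",
        "properties": {"temp_c": {"type": "number"}},
        "required": ["temp_c"],
        "additionalProperties": False,
    },
)
```

This dict would be passed as the `response_format` parameter of a chat-completions request; with `strict` enabled the model's output is constrained to validate against the schema.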
Meta: Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Anthropic: Claude 3.5 Haiku
Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic...
Mistral: Pixtral Large 2411
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...
OpenAI: GPT-4o (2024-11-20)
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It's also better at working with uploaded...
Amazon: Nova Pro 1.0
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December...
Amazon: Nova Lite 1.0
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...
OpenAI: o1
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...
MiniMax: MiniMax-01
MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...
Perplexity: Sonar
Sonar is lightweight, affordable, fast, and simple to use, now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...
Qwen: Qwen2.5 VL 72B Instruct
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
