modelstop.top

AI Model Catalogue

Browse 63 models across providers, modalities, and use cases.

🌐 All Models

63 models · Page 2 of 2

wan-2.7-t2v

wan-video

Generate videos with audio from text prompts using Alibaba's Wan 2.7 model. 1080p, up to 15 seconds, with audio synchronization.

audio · free
ctx (not listed) · Free in

wan-2.7-i2v

wan-video

Generate videos from images, with support for first-and-last-frame control, clip continuation, and audio synchronization, using Alibaba's Wan 2.7 model.

vision · image · audio
ctx (not listed) · Free in

veo-3.1

google

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

vision · audio · free
ctx (not listed) · Free in

veo-3.1-fast

google

New and improved version of Veo 3 Fast, with higher-fidelity video, context-aware audio and last frame support

audio · free
ctx (not listed) · Free in

dotted-waveform-visualizer

lucataco

Create a dotted waveform video from an audio file

audio · free
ctx (not listed) · Free in

zai-org/GLM-ASR-Nano-2512

zai-org

zai-org/GLM-ASR-Nano-2512 is an automatic speech recognition model on Hugging Face with ~160,973 monthly downloads. Open access.

audio · open-source
ctx (not listed) · $0.00/1M in

Auto Router

openrouter

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...

text · vision · multimodal
2,000,000 ctx · Free in

Google: Gemini 2.0 Flash

google

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It...

text · vision · multimodal
1,000,000 ctx · $0.10/1M in

Google: Gemini 2.0 Flash Lite

google

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5,...

text · vision · multimodal
1,048,576 ctx · $0.07/1M in

Google: Gemini 2.5 Pro Preview 05-06

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

Google: Gemma 3n 4B (free)

google

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

text · vision · audio
8,192 ctx · $0.02/1M in

Google: Gemini 2.5 Pro Preview 06-05

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

Google: Gemini 2.5 Pro

google

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

text · vision · multimodal
1,048,576 ctx · $1.25/1M in

Google: Gemini 2.5 Flash

google

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

text · vision · multimodal
1,048,576 ctx · $0.30/1M in

Google: Gemini 2.5 Flash Lite

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

text · vision · image
1,048,576 ctx · $0.10/1M in

OpenAI: GPT-4o Audio

openai

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs...

text · audio · long-context
128,000 ctx · $2.50/1M in

Google: Gemini 2.5 Flash Lite Preview 09-2025

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

text · vision · image
1,048,576 ctx · $0.10/1M in

Mistral: Voxtral Small 24B 2507

mistralai

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...

text · audio · cheap
32,000 ctx · $0.10/1M in

Google: Gemini 3 Flash Preview

google

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

text · vision · multimodal
1,048,576 ctx · $0.50/1M in

OpenAI: GPT Audio Mini

openai

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...

text · audio · cheap
128,000 ctx · $0.60/1M in

OpenAI: GPT Audio

openai

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...

text · audio · long-context
128,000 ctx · $2.50/1M in

Google: Gemini 3.1 Pro Preview

google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Google: Gemini 3.1 Pro Preview Custom Tools

google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

text · vision · multimodal
1,048,576 ctx · $2.00/1M in

Google: Gemini 3.1 Flash Lite Preview

google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

text · vision · multimodal
1,048,576 ctx · $0.25/1M in

Xiaomi: MiMo-V2-Omni

xiaomi

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

text · vision · multimodal
262,144 ctx · $0.40/1M in

Google: Lyria 3 Clip Preview

google

30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

text · vision · image
1,048,576 ctx · $0.00/1M in

Google: Lyria 3 Pro Preview

google

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

text · vision · image
1,048,576 ctx · $0.00/1M in
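
The "/1M in" figures on each card are input prices per one million tokens. As a rough illustration of how to read them, the sketch below estimates the input-side cost of a single request. The three prices are copied from the cards on this page; the function name `input_cost_usd` is hypothetical, and output-token pricing is not shown on this page, so it is left out of the estimate.

```python
from decimal import Decimal

# Input prices in USD per 1M tokens, copied from the cards on this page.
PRICE_PER_1M_IN = {
    "Google: Gemini 2.5 Pro": Decimal("1.25"),
    "Google: Gemini 2.5 Flash": Decimal("0.30"),
    "Google: Gemini 2.5 Flash Lite": Decimal("0.10"),
}


def input_cost_usd(model: str, input_tokens: int) -> Decimal:
    """Estimate input-side cost only: tokens / 1,000,000 * listed price.

    Hypothetical helper for illustration; output tokens are billed
    separately and are not covered here.
    """
    return PRICE_PER_1M_IN[model] * input_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    for name in PRICE_PER_1M_IN:
        # Cost of a 200,000-token prompt for each listed model.
        print(name, input_cost_usd(name, 200_000))
```

Decimal is used instead of float so that small per-token amounts are not affected by binary rounding.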