modelstop.top

AI Model Catalogue

Browse 63 models across providers, modalities, and use cases.

๐ŸŒ All Models

63 models · Page 1 of 2

Google Veo 3.0 Fast + Audio

google

text · audio · free
Free in

Google Veo 3.0 + Audio

google

text · audio · free
Free in

voxtral-mini-2507

mistralai

A mini audio understanding model released in July 2025

text · audio · free
32,768 ctx · Free in

voxtral-small-2507

mistralai

A small audio understanding model released in July 2025

text · audio · free
32,768 ctx · Free in

gpt-audio-mini-2025-12-15

openai

text · audio · free
Free in

voxtral-small-2507

mistralai

A small audio understanding model released in July 2025

text · audio · free
32,768 ctx · Free in

gpt-audio-1.5

openai

text · audio · free
Free in

voxtral-mini-2507

mistralai

A mini audio understanding model released in July 2025

text · audio · free
32,768 ctx · Free in

gpt-audio-mini-2025-10-06

openai

text · audio · free
Free in

wan2.6-i2v-flash

wan-video

Image-to-video generation with optional audio, multi-shot narrative support, and faster inference

vision · image · audio
Free in

gpt-4o-mini-audio-preview-2024-12-17

openai

text · audio · free
Free in

speech-2.8-hd

minimax

Minimax Speech 2.8 HD focuses on high-fidelity audio generation with features like studio-grade quality, flexible emotion control, multilingual support, and voice cloning capabilities

audio · multilingual · free
Free in

ace-step-1.5

visoar

Music generation

audio · free
Free in

gpt-audio-2025-08-28

openai

text · audio · free
Free in

sam-audio-large

geopti

SAM-Audio is a foundation model for isolating any sound in audio using text

audio · free
Free in

sam-audio-base

geopti

A foundation model for isolating any sound in audio using text, visual, or temporal prompts

audio · free
Free in

gpt-4o-audio-preview-2024-12-17

openai

text · audio · free
Free in

kling-v3-video

kwaivgi

Kling Video 3.0: Generate cinematic videos up to 15 seconds with multi-shot control, native audio, and improved consistency

audio · free
Free in

gpt-4o-audio-preview-2025-06-03

openai

text · audio · free
Free in

gpt-4o-mini-audio-preview

openai

text · audio · free
Free in

kling-v3-omni-video

kwaivgi

Kling Video 3.0 Omni: Unified multimodal video generation with reference images, video editing, native audio, and multi-shot control

vision · image · audio
Free in

heart_mula

meta-innovation

HeartMuLa: A Family of Open Sourced Music Foundation Models

audio · free
Free in

ace-step-1.5

fishaudio

Ace Step 1.5 open source music generation model

audio · free
Free in

ltx-2.3-pro

lightricks

High-fidelity video generation with portrait support, audio-to-video, retake, and extend. Text, image, and audio-driven creation up to 4K at 50 FPS.

vision · image · audio
Free in

ltx-2.3-fast

lightricks

Lightning-fast video generation with portrait support, camera controls, and synchronized audio. Up to 20 seconds at 1080p, 4K at 50 FPS.

audio · free
Free in

ultimate_rvc

meta-innovation

An extension of AiCoverGen, which provides several new features and improvements, enabling users to generate audio-related content using RVC with ease. Ideal for people who want to incorporate singing functionality into their AI assistant/chatbot/vtuber,

audio · free
Free in

seedance-2.0

bytedance

ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.

vision · audio · free
Free in

lofi

frow

Lo-fi hip-hop music generation with ACE-Step 1.5 + LoRA

audio · free
Free in

music-2.6

minimax

Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics

audio · free
Free in

p-video

prunaai

Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.

vision · image · audio
Free in

q3-turbo

vidu

Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio
Free in

q3-pro

vidu

High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

vision · image · audio
Free in

music-cover

minimax

Reimagine any song in a different style: change voice, instruments, genre, and arrangement while keeping the original melody

audio · free
Free in

veo-3.1-lite

google

Google's cost-efficient video generation model with native audio, optimized for high-volume applications

audio · free
Free in

music-2.5

minimax

Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt

audio · free
Free in

seedance-2.0-fast

bytedance

A faster variant of Seedance 2.0 for quicker video generation with multimodal inputs and native audio.

vision · audio · free
Free in