All Models
63 models · Page 1 of 2
Google Veo 3.0 Fast + Audio
Google Veo 3.0 + Audio
voxtral-mini-2507
A mini audio understanding model released in July 2025
voxtral-small-2507
A small audio understanding model released in July 2025
gpt-audio-mini-2025-12-15
gpt-audio-1.5
gpt-audio-mini-2025-10-06
wan2.6-i2v-flash
Image-to-video generation with optional audio, multi-shot narrative support, and faster inference
gpt-4o-mini-audio-preview-2024-12-17
speech-2.8-hd
Minimax Speech 2.8 HD focuses on high-fidelity audio generation with features like studio-grade quality, flexible emotion control, multilingual support, and voice cloning capabilities
ace-step-1.5
Music generation
gpt-audio-2025-08-28
sam-audio-large
SAM-Audio is a foundation model for isolating any sound in audio using text
sam-audio-base
A foundation model for isolating any sound in audio using text, visual, or temporal prompts
gpt-4o-audio-preview-2024-12-17
kling-v3-video
Kling Video 3.0: Generate cinematic videos up to 15 seconds with multi-shot control, native audio, and improved consistency
gpt-4o-audio-preview-2025-06-03
gpt-4o-mini-audio-preview
kling-v3-omni-video
Kling Video 3.0 Omni: Unified multimodal video generation with reference images, video editing, native audio, and multi-shot control
heart_mula
HeartMuLa: A Family of Open Sourced Music Foundation Models
ace-step-1.5
Ace Step 1.5, an open-source music generation model
ltx-2.3-pro
High-fidelity video generation with portrait support, audio-to-video, retake, and extend. Text, image, and audio-driven creation up to 4K at 50 FPS.
ltx-2.3-fast
Lightning-fast video generation with portrait support, camera controls, and synchronized audio. Up to 20 seconds at 1080p, 4K at 50 FPS.
ultimate_rvc
An extension of AiCoverGen, which provides several new features and improvements, enabling users to generate audio-related content using RVC with ease. Ideal for people who want to incorporate singing functionality into their AI assistant/chatbot/vtuber,
seedance-2.0
ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.
lofi
Lo-fi hip-hop music generation with ACE-Step 1.5 + LoRA
music-2.6
Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics
p-video
Fast video generation with built-in draft mode for rapid creative iteration. Text-to-video, image-to-video, and audio-to-video in a single endpoint.
q3-turbo
Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
q3-pro
High-fidelity video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.
music-cover
Reimagine any song in a different style: change voice, instruments, genre, and arrangement while keeping the original melody
veo-3.1-lite
Google's cost-efficient video generation model with native audio, optimized for high-volume applications
music-2.5
Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt
seedance-2.0-fast
A faster variant of Seedance 2.0 for quicker video generation with multimodal inputs and native audio.
