modelstop.top

AI Model Catalogue

Browse 125 models across providers, modalities, and use cases.

๐ŸŽ™๏ธ Audio & Speech

125 models · Page 1 of 4

aura-2-en

deepgram

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

audio · free
Input price: $0.00 per 1M

whisper-large-v3-turbo

openai

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

audio · free
Input price: $0.00 per 1M

whisper-tiny-en

openai

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model, trained for speech recognition.

audio · free
Input price: $0.00 per 1M

aura-1

deepgram

Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

audio · free
Input price: $0.00 per 1M

nova-3

deepgram

Transcribe audio using Deepgram’s speech-to-text model.

audio · free
Input price: $0.00 per 1M

aura-2-es

deepgram

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

audio · free
Input price: $0.00 per 1M

whisper

openai

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

audio · multilingual · free
Input price: $0.00 per 1M

melotts

myshell-ai

MeloTTS is a high-quality multilingual text-to-speech library by MyShell.ai.

audio · free
Input price: $0.00 per 1M

flux

deepgram

Flux is the first conversational speech recognition model built specifically for voice agents.

audio · agents · free
Input price: $0.00 per 1M

smart-turn-v2

pipecat-ai

An open-source, community-driven, native-audio turn detection model (version 2).

text · audio · free
Input price: $0.00 per 1M

stable-audio-open-small

stabilityai

Open-source stable-audio-open-small model from stabilityai, available for download and self-hosting on Hugging Face.

text · audio · free
Input price: free

stable-audio-open-1.0

stabilityai

Open-source stable-audio-open-1.0 model from stabilityai, available for download and self-hosting on Hugging Face.

text · audio · free
Input price: free

wav2vec2-large-xlsr-open-brazilian-portuguese-v2

lgris

Open-source wav2vec2-large-xlsr-open-brazilian-portuguese-v2 model from lgris, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

Voxtral-Mini-4B-Realtime-2602

mistralai

Open-source Voxtral-Mini-4B-Realtime-2602 model from mistralai, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

filipino-wav2vec2-l-xls-r-300m-official

khalsuu

Open-source filipino-wav2vec2-l-xls-r-300m-official model from khalsuu, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

faster-whisper-tiny

systran

Open-source faster-whisper-tiny model from systran, available for download and self-hosting on Hugging Face.

audio · free · run locally
Input price: free

parakeet-tdt-0.6b-v3

mlx-community

Open-source parakeet-tdt-0.6b-v3 model from mlx-community, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

whisper-tiny

openai

Open-source whisper-tiny model from openai, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

nb-wav2vec2-1b-nynorsk

nbailab

Open-source nb-wav2vec2-1b-nynorsk model from nbailab, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

wav2vec2-large-xlsr-53-telugu

anuragshas

Open-source wav2vec2-large-xlsr-53-telugu model from anuragshas, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

wav2vec2-large-xls-r-300m-Urdu

kingabzpro

Open-source wav2vec2-large-xls-r-300m-Urdu model from kingabzpro, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

parakeet-ctc-1.1b

nvidia

Open-source parakeet-ctc-1.1b model from nvidia, available for download and self-hosting on Hugging Face.

audio · free · run locally
Input price: free

Qwen3-ASR-0.6B

qwen

Open-source Qwen3-ASR-0.6B model from qwen, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

wav2vec2-xls-r-300m-cv7-turkish

mpoyraz

Open-source wav2vec2-xls-r-300m-cv7-turkish model from mpoyraz, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

faster-whisper-base

systran

Open-source faster-whisper-base model from systran, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

parakeet-tdt-0.6b-v3

nvidia

Open-source parakeet-tdt-0.6b-v3 model from nvidia, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

parakeet-tdt-0.6b-v3-coreml

fluidinference

Open-source parakeet-tdt-0.6b-v3-coreml model from fluidinference, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

wav2vec2-large-xlsr-53-greek

jonatasgrosman

Open-source wav2vec2-large-xlsr-53-greek model from jonatasgrosman, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

whisper-medium

openai

Open-source whisper-medium model from openai, available for download and self-hosting on Hugging Face.

audio · free · run locally
Input price: free

reverb-diarization-v1

revai

Open-source reverb-diarization-v1 model from revai, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

faster-whisper-small

systran

Open-source faster-whisper-small model from systran, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

wav2vec2-base-960h

facebook

Open-source wav2vec2-base-960h model from facebook, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

overlapped-speech-detection

pyannote

Open-source overlapped-speech-detection model from pyannote, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

distil-whisper-large-v3-ptbr

freds0

Open-source distil-whisper-large-v3-ptbr model from freds0, available for download and self-hosting on Hugging Face.

audio · free
Input price: free

distil-large-v3

distil-whisper

Open-source distil-large-v3 model from distil-whisper, available for download and self-hosting on Hugging Face.

audio · free · run locally
Input price: free

wav2vec2-large-xlsr-53-arabic

jonatasgrosman

Open-source wav2vec2-large-xlsr-53-arabic model from jonatasgrosman, available for download and self-hosting on Hugging Face.

audio · free
Input price: free