All Models
125 models · Page 2 of 4
wav2vec2-large-xlsr-53-greek
Open-source wav2vec2-large-xlsr-53-greek model from jonatasgrosman, available for download and self-hosting on Hugging Face.
reverb-diarization-v1
Open-source reverb-diarization-v1 model from revai, available for download and self-hosting on Hugging Face.
parakeet-tdt-0.6b-v3
Open-source parakeet-tdt-0.6b-v3 model from mlx-community, available for download and self-hosting on Hugging Face.
parakeet-tdt-0.6b-v3-coreml
Open-source parakeet-tdt-0.6b-v3-coreml model from fluidinference, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-telugu
Open-source wav2vec2-large-xlsr-53-telugu model from anuragshas, available for download and self-hosting on Hugging Face.
Qwen3-ASR-0.6B
Open-source Qwen3-ASR-0.6B model from qwen, available for download and self-hosting on Hugging Face.
faster-whisper-small
Open-source faster-whisper-small model from systran, available for download and self-hosting on Hugging Face.
parakeet-tdt-0.6b-v3
Open-source parakeet-tdt-0.6b-v3 model from nvidia, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-arabic
Open-source wav2vec2-large-xlsr-53-arabic model from jonatasgrosman, available for download and self-hosting on Hugging Face.
whisper-tiny
Open-source whisper-tiny model from openai, available for download and self-hosting on Hugging Face.
whisper-base
Open-source whisper-base model from openai, available for download and self-hosting on Hugging Face.
whisper-large-v3-turbo
Open-source whisper-large-v3-turbo model from openai, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-portuguese
Open-source wav2vec2-large-xlsr-53-portuguese model from jonatasgrosman, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-russian
Open-source wav2vec2-large-xlsr-53-russian model from jonatasgrosman, available for download and self-hosting on Hugging Face.
whisperkit-coreml
Open-source whisperkit-coreml model from argmaxinc, available for download and self-hosting on Hugging Face.
speaker-diarization-community-1
Open-source speaker-diarization-community-1 model from pyannote, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-korean
Open-source wav2vec2-large-xlsr-korean model from kresnik, available for download and self-hosting on Hugging Face.
mms-1b-all
Open-source mms-1b-all model from facebook, available for download and self-hosting on Hugging Face.
mms-300m-1130-forced-aligner
Open-source mms-300m-1130-forced-aligner model from mahmoudashraf, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-japanese
Open-source wav2vec2-large-xlsr-53-japanese model from jonatasgrosman, available for download and self-hosting on Hugging Face.
whisper-small
Open-source whisper-small model from openai, available for download and self-hosting on Hugging Face.
speaker-diarization-3.1
Open-source speaker-diarization-3.1 model from pyannote, available for download and self-hosting on Hugging Face.
whisper-large-v3
Open-source whisper-large-v3 model from openai, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-chinese-zh-cn
Open-source wav2vec2-large-xlsr-53-chinese-zh-cn model from jonatasgrosman, available for download and self-hosting on Hugging Face.
Qwen3-ASR-1.7B
Open-source Qwen3-ASR-1.7B model from qwen, available for download and self-hosting on Hugging Face.
wav2vec2-large-xlsr-53-polish
Open-source wav2vec2-large-xlsr-53-polish model from jonatasgrosman, available for download and self-hosting on Hugging Face.
Google Veo 3.0 + Audio
Google Veo 3.0 Fast + Audio
voxtral-mini-2507
A mini audio understanding model released in July 2025
gpt-audio-mini-2025-12-15
voxtral-small-2507
A small audio understanding model released in July 2025
voxtral-mini-2507
A mini audio understanding model released in July 2025
gpt-audio-1.5
voxtral-small-2507
A small audio understanding model released in July 2025
kling-v3-video
Kling Video 3.0: Generate cinematic videos of up to 15 seconds with multi-shot control, native audio, and improved consistency.
