Audio & Speech
20 models
whisper-large-v3-turbo
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.
whisper-tiny-en
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model, which was trained on the task of speech recognition.
whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
whisper-medium
Open-source whisper-medium model from openai, available for download and self-hosting on Hugging Face.
whisper-tiny
Open-source whisper-tiny model from openai, available for download and self-hosting on Hugging Face.
whisper-large-v3
Open-source whisper-large-v3 model from openai, available for download and self-hosting on Hugging Face.
whisper-base
Open-source whisper-base model from openai, available for download and self-hosting on Hugging Face.
whisper-large-v3-turbo
Open-source whisper-large-v3-turbo model from openai, available for download and self-hosting on Hugging Face.
whisper-small
Open-source whisper-small model from openai, available for download and self-hosting on Hugging Face.
gpt-audio-1.5
gpt-audio-mini-2025-12-15
gpt-audio-mini-2025-10-06
gpt-4o-mini-audio-preview-2024-12-17
gpt-4o-audio-preview-2024-12-17
gpt-audio-2025-08-28
gpt-4o-mini-audio-preview
gpt-4o-audio-preview-2025-06-03
OpenAI: GPT-4o Audio
The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs...
OpenAI: GPT Audio Mini
A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...
OpenAI: GPT Audio
The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...
