Whisper is an automatic speech recognition (ASR) model released by OpenAI as open-source software in 2022. It was trained on 680,000 hours of multilingual, multitask supervised audio collected from the web, which gives it unusually broad language coverage and strong robustness to accents, background noise, and technical terminology.
Unlike cloud-based transcription services, Whisper can run entirely on your local machine. This makes it suitable for transcribing sensitive audio, such as medical notes, legal recordings, and private meetings, without sending data to a third-party server. The trade-off is that it requires a reasonably capable machine, and the larger model variants need a GPU for practical speed.
Developers use Whisper to add transcription to their own applications, such as meeting recorders, podcast tools, voice note apps, and accessibility features. Because the code and model weights are released under the MIT license, self-hosted deployments carry no per-request fees.
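As a rough illustration of what adding transcription to an application can look like, the sketch below wraps the openai-whisper Python package in two small functions. The helper names (`load_whisper_model`, `transcribe_file`) and the file name `meeting.mp3` are hypothetical, chosen for this example; the only library calls assumed are `whisper.load_model` and the model's `transcribe` method, which returns a dictionary containing a `"text"` entry.

```python
from pathlib import Path


def load_whisper_model(name: str = "small"):
    """Load a Whisper model by size name.

    Assumes the openai-whisper package is installed. The import is done
    lazily so the rest of this module can be used and tested without it.
    """
    import whisper

    return whisper.load_model(name)


def transcribe_file(audio_path, model) -> str:
    """Return the plain-text transcript of a single audio file.

    `model` is any object with a `transcribe(path)` method that returns a
    dict with a "text" key, which is what a loaded Whisper model provides.
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")
    result = model.transcribe(str(path))
    return result["text"].strip()


# Example usage (meeting.mp3 is a hypothetical local file):
#   model = load_whisper_model("small")
#   print(transcribe_file("meeting.mp3", model))
```

Passing the model in as an argument, rather than loading it inside `transcribe_file`, lets an application load the weights once and reuse them across many files.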
Researchers and journalists use it to transcribe interview recordings. Its accuracy on technical vocabulary is often notably better than that of consumer transcription tools, which tend to mangle domain-specific terms.
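Content creators use it to generate subtitles and captions for videos. Each transcribed segment carries start and end timestamps, so the output can be exported to subtitle formats such as SRT or WebVTT and dropped into video editing workflows, and the accuracy reduces the amount of manual correction needed.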
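To show how timestamped output maps onto a subtitle file, here is a minimal sketch that converts a list of segments into SRT text. It assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` string, which matches the `"segments"` list in the result returned by a Whisper model's `transcribe` method; the function names are illustrative, not part of the library.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments with "start", "end", and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue: sequence number, time range, caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Example usage with a transcription result (result is hypothetical here):
#   srt_text = segments_to_srt(result["segments"])
#   open("captions.srt", "w", encoding="utf-8").write(srt_text)
```

The whisper command-line tool can also write subtitle files directly; a hand-rolled converter like this is mainly useful when segments need to be filtered or merged before export.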
Whisper is installed via pip (the package name is openai-whisper) and requires Python 3.8+ along with the ffmpeg command-line tool, which it uses to decode audio. The small and medium models run acceptably on CPU for shorter audio files. The large model benefits significantly from a CUDA-capable GPU. Several third-party GUI wrappers exist for users who prefer not to use the command line.
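The CPU-versus-GPU guidance above can be expressed as a simple selection rule. The sketch below is one possible policy, not a recommendation from the Whisper project: it picks the large model when CUDA is available and falls back to the small model on CPU. The function names are hypothetical; the library calls assumed are `torch.cuda.is_available()` and `whisper.load_model(name, device=...)`.

```python
from typing import Tuple


def pick_model_and_device(cuda_available: bool) -> Tuple[str, str]:
    """Illustrative policy: large model on GPU, small model on CPU."""
    if cuda_available:
        return "large", "cuda"
    return "small", "cpu"


def load_for_this_machine():
    """Load a Whisper model sized for the current hardware.

    Assumes torch and openai-whisper are installed; both are imported
    lazily so the selection logic above can be used on its own.
    """
    import torch
    import whisper

    name, device = pick_model_and_device(torch.cuda.is_available())
    return whisper.load_model(name, device=device)


# Example usage:
#   model = load_for_this_machine()
```

A real application might also weigh audio length or available memory, for example choosing the medium model on CPU when files are short.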