# openai/whisper

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/openai-whisper).**

102,828 stars · 12,544 forks · Python · MIT

## Links

- GitHub: https://github.com/openai/whisper
- awesome-repositories: https://awesome-repositories.com/repository/openai-whisper.md

## Description

This project is a speech recognition and translation engine that uses a sequence-to-sequence (encoder-decoder) Transformer to convert audio into text. Its models were trained with weak supervision on a large volume of weakly labelled audio-transcript pairs collected from the internet. This produces general-purpose speech representations that support multilingual transcription, speech translation, and language identification.
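
One shared decoder handles all three tasks, and special tokens at the start of its output sequence tell it which one to perform. The minimal sketch below assembles such a prefix as plain strings. The token spellings (`<|startoftranscript|>`, language tokens such as `<|fr|>`, `<|transcribe|>`, `<|translate|>`, `<|notimestamps|>`) follow Whisper's tokenizer. However, `build_task_prefix` is a hypothetical helper that works on strings, not on the real token IDs.

```python
from typing import List, Optional

# Special-token spellings as used by Whisper's tokenizer. The helper below is a
# hypothetical illustration that builds strings, not real token IDs.
SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"
TASKS = {"transcribe": "<|transcribe|>", "translate": "<|translate|>"}


def build_task_prefix(language: Optional[str], task: str = "transcribe",
                      timestamps: bool = True) -> List[str]:
    """Return the decoder prefix that tells one shared model what to do.

    If `language` is None, only the start token is returned: the model's next
    predicted token is then its guess at the spoken language, which is how
    language identification fits into the same token sequence.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    prefix = [SOT]
    if language is None:
        return prefix  # leave the language slot for the model to predict
    prefix += [f"<|{language}|>", TASKS[task]]
    if not timestamps:
        prefix.append(NO_TIMESTAMPS)  # ask for plain text without timings
    return prefix
```

Decoding after `<|startoftranscript|><|fr|><|translate|>` asks for English text from French speech. Placing `<|transcribe|>` in that slot instead asks for French text.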

The system distinguishes itself through a unified multi-task approach. A single model handles every task, and special tokens in the decoder's output sequence specify which task to perform and in which language, so no language-specific rules are needed. Byte-level BPE tokenization lets it represent text in any language without out-of-vocabulary tokens. Long recordings are processed as a sequence of 30-second windows, and each window's predicted timestamps determine where the next one starts. This keeps the model's input at a fixed size regardless of recording length and keeps timing consistent across window boundaries.
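
The long-form strategy can be sketched as a loop over fixed 30-second windows of 16 kHz audio. Each window is zero-padded or truncated to exactly 480,000 samples (Whisper's own `whisper.pad_or_trim` does this on arrays), and the next window starts wherever decoding of the current one left off. In this sketch, `seek_advance` is a hypothetical stand-in for the model. Whisper derives the advance from the timestamp tokens it predicts, whereas here the caller supplies it.

```python
from typing import Callable, Iterator, List, Tuple

SAMPLE_RATE = 16_000                     # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                       # each decoding pass sees 30 seconds
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def pad_or_trim(samples: List[float], length: int = N_SAMPLES) -> List[float]:
    """Truncate long input or zero-pad short input to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def sliding_windows(
    samples: List[float],
    seek_advance: Callable[[List[float]], int],
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_sample, window) pairs until the whole signal is covered.

    `seek_advance` (hypothetical) returns how many samples the current window
    consumed. It is clamped to [1, N_SAMPLES] so the loop always makes progress.
    """
    seek = 0
    while seek < len(samples):
        # The model input is always N_SAMPLES long, whatever the recording length.
        window = pad_or_trim(samples[seek:seek + N_SAMPLES])
        yield seek, window
        seek += max(1, min(seek_advance(window), N_SAMPLES))
```

In this sketch, advancing by less than a full window (for example, only up to the last complete segment) means a sentence cut off at the window edge is decoded again, whole, in the next window.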

The toolkit provides both command-line and Python interfaces. Developers can integrate speech-to-text directly into their own applications or automate batch processing of large media libraries. It offers multilingual and English-only models in several sizes that trade speed for accuracy, and the repository includes example notebooks for evaluating transcription quality on public speech datasets.
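
For batch processing through the Python interface, a small driver can loop over a media library and store one transcript per file. The sketch below assumes only that `transcribe` is a callable that takes a file path and returns a mapping with a `"text"` key. The helper name `transcribe_batch` and the set of accepted extensions are illustrative assumptions, not part of Whisper.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Mapping, Optional

# Assumption: the file extensions worth picking up in a media library.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def transcribe_batch(
    paths: Iterable[str],
    transcribe: Callable[[str], Mapping[str, Any]],
    out_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Transcribe each audio file in `paths` and optionally write <stem>.txt files.

    Returns {file name: transcript}, processed in sorted path order.
    """
    target = Path(out_dir) if out_dir is not None else None
    if target is not None:
        target.mkdir(parents=True, exist_ok=True)
    results: Dict[str, str] = {}
    for path in sorted(Path(p) for p in paths):
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip non-audio files such as notes or cover art
        text = str(transcribe(str(path))["text"]).strip()
        if target is not None:
            (target / f"{path.stem}.txt").write_text(text + "\n", encoding="utf-8")
        results[path.name] = text
    return results
```

To plug Whisper in, you could load a model with `model = whisper.load_model("base")` and pass `model.transcribe` as the callable, for example `transcribe_batch(glob.glob("media/*"), model.transcribe, "transcripts")`. `load_model` and `transcribe` are part of Whisper's Python API, while `transcribe_batch` exists only in this sketch.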

## Tags

### Artificial Intelligence & ML

- [Speech Recognition Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-recognition-systems.md) — Transcribes spoken audio into written text, or translates speech in other languages into English text, using a sequence-to-sequence transformer architecture. ([source](https://github.com/openai/whisper/blob/main/model-card.md))
- [Sequence Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/sequence-models.md) — Maps variable-length audio input sequences to text output sequences using deep learning and byte-level tokenization.
- [Multi-Task Learning Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/sequence-models/multi-task-learning-models.md) — Handles speech recognition, translation, and language identification in a single model, using special tokens in the decoder's output sequence to select the task.
- [Transformer](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/transformer.md) — Uses an encoder-decoder Transformer: the encoder processes a log-Mel spectrogram of the input audio, and the decoder generates the corresponding text tokens.
- [Weakly Supervised Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/training-systems/weakly-supervised-learning.md) — Learns general-purpose speech representations from large volumes of weakly labeled audio-transcript pairs.
- [Automatic Speech Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition.md) — Leverages large-scale, robust models trained on diverse datasets to convert spoken audio recordings into accurate text. ([source](https://github.com/openai/whisper#readme))
- [Multilingual Speech Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/multilingual-speech-translation.md) — Identifies the spoken language, then transcribes the audio or translates foreign-language speech into English text.
- [Speech Recognition APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-recognition-apis.md) — Exposes programmatic interfaces for integrating high-performance speech-to-text capabilities directly into custom software applications. ([source](https://github.com/openai/whisper))
- [Speech Recognition Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-recognition-libraries.md) — Simplifies the integration of robust speech-to-text functionality into applications to enable voice-driven features.
- [Speech Translation Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-translation-systems.md) — Automates the identification, transcription, and translation of foreign-language audio into English text.
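
The detect-then-translate flow described in these tags comes down to a small decision. Whisper's `detect_language` reports a dictionary that maps language codes to probabilities, and its translate task only produces English. The sketch below picks the most likely language and the task to run. The function `plan_decoding` and its `want_english` switch are hypothetical.

```python
from typing import Dict, Tuple


def plan_decoding(language_probs: Dict[str, float],
                  want_english: bool = True) -> Tuple[str, str]:
    """Return (language, task) for a {language code: probability} mapping."""
    if not language_probs:
        raise ValueError("no language probabilities given")
    language = max(language_probs, key=language_probs.get)
    # Translation only targets English, so English speech is always transcribed.
    # Other languages are translated when English output is wanted.
    task = "translate" if want_english and language != "en" else "transcribe"
    return language, task
```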

### Graphics & Multimedia

- [Automatic Speech Recognition Toolkits](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/automatic-speech-recognition-toolkits.md) — Bundles command-line and programmatic tools to incorporate high-accuracy speech transcription into automated media processing workflows.
- [Batch Media Processors](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/media-workflow-orchestration/batch-media-processors.md) — Streamlines high-volume audio transcription tasks through terminal-based commands for efficient batch processing of media files.
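
In a media workflow, the segment timings returned by Whisper's `transcribe` can be turned into subtitles. That result includes a `segments` list whose entries carry `start` and `end` times in seconds plus a `text` field. The CLI can already write subtitles itself via `--output_format srt`. This sketch only shows what the conversion involves, and its function names are illustrative.

```python
from typing import Any, Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping[str, Any]]) -> str:
    """Render segments with "start", "end" (seconds) and "text" keys as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates numbered cues with a blank line.
    return "\n".join(blocks)
```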

### Part of an Awesome List

- [Additional AI Tools](https://awesome-repositories.com/f/awesome-lists/ai/additional-ai-tools.md) — Robust speech recognition model for transcription and translation.
- [AI and Agents](https://awesome-repositories.com/f/awesome-lists/ai/ai-and-agents.md) — A general-purpose automatic speech recognition model.
- [AI & Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/ai-machine-learning.md) — General-purpose local speech recognition model.
- [AI Tools and Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/ai-tools-and-frameworks.md) — Robust speech-to-text transcription and translation model.
- [Audio Generation and Processing](https://awesome-repositories.com/f/awesome-lists/ai/audio-generation-and-processing.md) — Robust large-scale speech recognition and transcription model.
- [Core Models](https://awesome-repositories.com/f/awesome-lists/ai/core-models.md) — The primary open-source speech recognition model from OpenAI.
- [Foundation Models](https://awesome-repositories.com/f/awesome-lists/ai/foundation-models.md) — Robust speech recognition model trained on large-scale audio data.
- [Generative Media Tools](https://awesome-repositories.com/f/awesome-lists/ai/generative-media-tools.md) — Robust speech recognition and transcription.
- [Speech Processing](https://awesome-repositories.com/f/awesome-lists/media/speech-processing.md) — Robust speech-to-text transcription model.
- [Speech Recognition](https://awesome-repositories.com/f/awesome-lists/media/speech-recognition.md) — Robust open-source model for speech-to-text transcription.
- [Business And Marketing Tools](https://awesome-repositories.com/f/awesome-lists/productivity/business-and-marketing-tools.md) — General-purpose speech recognition model.

### Development Tools & Productivity

- [CLI Tooling](https://awesome-repositories.com/f/development-tools-productivity/terminal-shell-cli/cli-tooling-frameworks/cli-tooling.md) — Runs transcription and translation directly from the terminal: users pass one or more audio files and select a model size with the `--model` option. ([source](https://github.com/openai/whisper))
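
When a script drives the terminal tool, it helps to build the argument list in one place. `whisper_command` is a hypothetical helper. The flags it emits (`--model`, `--task`, `--language`, `--output_dir`, `--output_format`) are ones the `whisper` CLI accepts, but check `whisper --help` for your installed version.

```python
from typing import List, Optional, Sequence


def whisper_command(files: Sequence[str], model: str = "base",
                    task: str = "transcribe", language: Optional[str] = None,
                    output_dir: str = ".", output_format: str = "txt") -> List[str]:
    """Build an argv list for one `whisper` CLI run over several input files."""
    if not files:
        raise ValueError("at least one input file is required")
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    cmd = ["whisper", *files, "--model", model, "--task", task,
           "--output_dir", output_dir, "--output_format", output_format]
    if language is not None:
        cmd += ["--language", language]  # skip automatic language detection
    return cmd
```

You could pass the resulting list to `subprocess.run(cmd, check=True)`. Building it as a list rather than a shell string avoids quoting problems with file names that contain spaces.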
