# gpt-4o-mini-transcribe
A lightweight speech-to-text model optimized for resource-constrained devices.
## Purpose of gpt-4o-mini-transcribe
The **gpt-4o-mini-transcribe** model is designed for **lightweight speech-to-text transcription and translation**, optimized for **resource-constrained devices** such as mobile and embedded systems. It balances performance with efficiency, making it suitable for real-time applications like voice assistants, live transcription, and multilingual translation.
## Supported audio formats
The model supports the following audio formats:
- **mp3**
- **mp4**
- **mpeg**
- **mpga**
- **m4a**
- **wav**
- **webm**
The maximum file size allowed is **25 MB**.
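These constraints can be pre-checked client-side before uploading, avoiding a failed API round trip. A minimal sketch (the helper name is an assumption, and "25 MB" is interpreted here with the common 1024 × 1024 byte convention):

```python
import os

# Formats and size limit listed in the documentation above.
SUPPORTED_FORMATS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB upload limit (byte convention assumed)

def check_audio_file(path: str) -> bool:
    """Return True if the file has a supported extension and fits the size limit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_FORMATS:
        return False
    return os.path.getsize(path) <= MAX_FILE_BYTES
```

Checking the extension before the size means unsupported formats are rejected without requiring the file to exist on disk.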
## Output formats for transcriptions
The model provides two output formats:
- **JSON** (structured data for developers)
- **Text** (plain transcription output)
## Real-time streaming support
Yes. **gpt-4o-mini-transcribe** supports streamed transcription of completed audio recordings: rather than waiting for the full transcript, developers can set `stream=True` in the API call and receive incremental `transcript.text.delta` events followed by a final `transcript.text.done` event.
## Handling unsupported languages
For languages without an explicit ISO 639-1/639-3 code match, users can steer the output language through the prompt (e.g., "Output in English"); the model then transcribes and/or translates the audio accordingly.
## Pricing of the model
As of March 2025, the model is priced at **$3 per million audio input tokens** (approximately **$0.003 per minute**), reflecting its cost-efficient design for scalable deployment.
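The two quoted figures are consistent if one minute of audio corresponds to roughly 1,000 input tokens ($3 / 1,000,000 × 1,000 = $0.003). A small cost estimator under that assumption (the rate constant is inferred from the numbers above, not an official figure):

```python
PRICE_PER_M_TOKENS = 3.00   # USD per million audio input tokens (quoted price)
TOKENS_PER_MINUTE = 1_000   # implied by the ~$0.003/min figure; an assumption

def estimate_cost_usd(minutes: float) -> float:
    """Estimate transcription cost in USD for a given audio duration."""
    return minutes * TOKENS_PER_MINUTE * PRICE_PER_M_TOKENS / 1_000_000

# One hour of audio: estimate_cost_usd(60) -> 0.18
```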
## Integration method
Developers can integrate the model through the **OpenAI API endpoint (`/audio/transcriptions`)**, for example via the official Python SDK.
Updated: 2025-04-01