Lightweight GPT-4o with speech capabilities
Designed for quick, low-resource speech applications, GPT-4o Mini Audio enables fast, natural interactions in audio-based tools, with support for both speech input and speech output. It is a cost-effective variant, offering advanced audio capabilities at just 25% of the cost of the full GPT-4o Audio models, which makes it accessible to developers building voice-driven applications.
Derived from GPT-4o through model distillation, it retains the Transformer-based architecture optimized for audio tasks. The model includes advanced voice activity detection (VAD) layers for precise audio segmentation and processing.
The model was trained on a diverse dataset spanning hundreds of hours of high-quality audio recordings combined with billions of text tokens, ensuring robust multimodal performance.
The model's knowledge cutoff is October 2023; it has no real-time web search capability and is optimized for static datasets.
The model achieves strong performance across its supported audio tasks. It processes asynchronous audio at an average latency of 420 milliseconds per second of input audio, making it suitable for near-real-time applications. It handles diverse accents, dialects, and noisy environments effectively, but may exhibit reduced accuracy on highly specialized jargon or low-resource languages.
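As a back-of-envelope illustration of the stated latency figure, the sketch below estimates end-to-end processing time for a clip. It assumes the 420 ms-per-second average scales roughly linearly with clip length, which is an assumption, not a guarantee from the model card.

```python
# Rough latency estimate for asynchronous audio processing.
# Assumption: the stated average of 420 ms of processing time per
# second of input audio scales linearly with clip length.

MS_PER_SECOND_OF_AUDIO = 420  # average latency figure from this page


def estimated_latency_seconds(audio_seconds: float) -> float:
    """Return the estimated processing latency in seconds."""
    return audio_seconds * MS_PER_SECOND_OF_AUDIO / 1000


# A 10-second voice note would take roughly 4.2 seconds to process.
print(estimated_latency_seconds(10))  # -> 4.2
```

By this estimate, clips under a couple of seconds stay well below one second of processing time, which is what makes near-real-time use plausible.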
The model is available on the AI/ML API platform as "gpt-4o-mini-audio".
Detailed API documentation, with comprehensive integration guidelines, is available on the AI/ML API website.
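A minimal request sketch follows. Only the model identifier "gpt-4o-mini-audio" comes from this page; the endpoint URL, payload fields, and voice/format options are assumptions modeled on OpenAI-compatible chat-completions APIs and should be verified against the AI/ML API documentation.

```python
import json

# Assumed OpenAI-compatible endpoint; check the AI/ML API docs.
API_URL = "https://api.aimlapi.com/v1/chat/completions"


def build_request(prompt: str, voice: str = "alloy") -> dict:
    """Build a chat-completions payload requesting audio output.

    The 'modalities' and 'audio' fields follow the shape of OpenAI's
    audio-capable chat API and are assumptions here, not confirmed
    parameters of the AI/ML API platform.
    """
    return {
        "model": "gpt-4o-mini-audio",
        "modalities": ["text", "audio"],
        "audio": {"voice": voice, "format": "wav"},
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_request("Read this sentence aloud in a friendly tone.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires an API key; 'requests' is a
# third-party library):
# import requests
# resp = requests.post(
#     API_URL,
#     json=payload,
#     headers={"Authorization": "Bearer <YOUR_API_KEY>"},
# )
```

Building the payload separately from sending it keeps the structure easy to inspect and test before any credentials or network access are involved.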
OpenAI incorporated safety and bias-mitigation practices into the model's development. Even so, the model may reflect biases inherent in its training data, particularly for underrepresented languages and accents.
GPT-4o Mini Audio is available under commercial usage rights, allowing businesses to integrate the model into their applications.