Aura Description
Basic Information
- Model Name: Aura
- Developer/Creator: Deepgram
- Release Date: June 2023
- Version: 1.0
- Model Type: Text-to-speech (TTS)
Versions
#g1_aura-asteria-en
#g1_aura-hera-en
#g1_aura-luna-en
#g1_aura-stella-en
#g1_aura-athena-en
#g1_aura-zeus-en
#g1_aura-orion-en
#g1_aura-arcas-en
#g1_aura-perseus-en
#g1_aura-angus-en
#g1_aura-orpheus-en
#g1_aura-helios-en
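The identifiers above share one pattern: a fixed prefix, a voice name, and the -en language suffix. As a minimal sketch, assuming that pattern holds for every voice, a small helper can validate a voice name and build the matching model string. The names AURA_VOICES and auraModelId are hypothetical and not part of any SDK.

```javascript
// Voice names copied from the version list above.
// AURA_VOICES and auraModelId are illustrative names, not an official API.
const AURA_VOICES = [
  'asteria', 'hera', 'luna', 'stella', 'athena', 'zeus',
  'orion', 'arcas', 'perseus', 'angus', 'orpheus', 'helios',
];

// Build a model identifier such as '#g1_aura-luna-en' from a voice name.
// Throws if the voice is not in the list, so typos fail before any request is sent.
function auraModelId(voice) {
  const name = String(voice).trim().toLowerCase();
  if (!AURA_VOICES.includes(name)) {
    throw new RangeError(`Unknown Aura voice: ${voice}`);
  }
  return `#g1_aura-${name}-en`;
}

console.log(auraModelId('Luna')); // prints #g1_aura-luna-en
```

The returned string can be passed as the model field of a TTS request.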
Overview
Deepgram Aura is a text-to-speech (TTS) AI model designed for real-time, conversational AI agents and applications, and Deepgram describes it as the first TTS model built for that purpose. It pairs human-like voice quality with low latency and efficient processing, which suits it to responsive, high-throughput voice AI experiences.
Key Features
- A dozen natural, human-like voices with lower latency than comparable voice AI alternatives, according to Deepgram
- Optimized for responsive, conversational AI agents and applications
- Seamless integration with Deepgram's industry-leading Nova speech-to-text API
Intended Use
Deepgram Aura is primarily designed for building responsive, conversational AI agents and applications. It is particularly well-suited for use cases that require high-throughput voice interactions, such as customer service, virtual assistants, and interactive voice response (IVR) systems.
Language Support
The versions listed above are all English (-en) voices. Aura is designed to handle diverse accents and dialects, and Deepgram plans to add more languages.
Technical Details
Architecture
Aura's architecture is optimized for speed and efficiency, and Deepgram positions it as the fastest high-quality TTS option available. The model builds on Deepgram's experience in processing and modeling speech audio, especially for streaming use cases with real-time speech-to-text (STT) models.
Training Data
Aura has been trained on millions of hours of high-quality audio data, enabling it to deliver natural-sounding voices across various languages and domains.
Data Source and Size
Deepgram has curated a diverse dataset of high-quality audio recordings, ensuring that Aura can handle a wide range of use cases. The model's knowledge cutoff is June 2023, the date of its initial release.
Diversity and Bias
Deepgram has taken steps to ensure that Aura is trained on diverse data, minimizing potential biases and enabling it to perform well across different demographics and use cases.
Performance Metrics
According to Deepgram, Aura offers lower latency and higher voice quality than comparable TTS models. Several Deepgram customers have deployed the model in production, demonstrating its real-world effectiveness.
Comparison to Other Models
According to Deepgram, Aura outperforms other popular TTS models in voice quality, responsiveness, and cost-efficiency. It produces natural-sounding, human-like voices with few errors, and it combines low latency with high throughput. The model is also designed to handle diverse inputs, adapting to different accents, dialects, and use cases.
Usage
API Usage Example
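const fs = require('fs');
const path = require('path');
const axios = require('axios').default;

// axios.create is a factory function, so it is called without `new`.
const api = axios.create({
  baseURL: 'https://api.aimlapi.com/v1',
  headers: { Authorization: 'Bearer <YOUR_API_KEY>' },
});

const main = async () => {
  // Request the audio as a stream so it can be written to disk as it arrives.
  const response = await api.post(
    '/tts',
    {
      model: '#g1_aura-asteria-en',
      text: 'Hi! What are you doing today?',
    },
    { responseType: 'stream' },
  );

  // Pipe the streamed audio into a file next to this script.
  const dist = path.resolve(__dirname, './audio.wav');
  const writeStream = fs.createWriteStream(dist);
  response.data.pipe(writeStream);
  writeStream.on('close', () => console.log('Audio saved to:', dist));
  writeStream.on('error', (err) => console.error('Failed to write audio:', err.message));
};

// Report request errors instead of leaving the promise rejection unhandled.
main().catch((err) => console.error('TTS request failed:', err.message));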
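Because the example above receives the audio as a stream, the latency claims can be checked against your own setup by timing the first chunk. The following is a rough sketch under stated assumptions: timeToFirstChunk is a hypothetical helper, not part of axios or any Deepgram SDK, and the demo uses a local stand-in stream so it runs without a network call. With a real request, start the timer before calling api.post if the request time should be included.

```javascript
const { Readable } = require('stream');
const { performance } = require('perf_hooks');

// Resolve with the milliseconds elapsed between this call and the first chunk
// emitted by a Node.js readable stream. Hypothetical helper for illustration.
function timeToFirstChunk(stream) {
  const start = performance.now();
  return new Promise((resolve, reject) => {
    stream.once('data', () => resolve(performance.now() - start));
    stream.once('error', reject);
    // If the stream ends before any data arrives, report that as a failure.
    stream.once('end', () => reject(new Error('Stream ended without data')));
  });
}

// Stand-in for `response.data`: emits one chunk after roughly 50 ms.
const fakeAudio = new Readable({ read() {} });
setTimeout(() => {
  fakeAudio.push(Buffer.from('audio-bytes'));
  fakeAudio.push(null); // signal end of stream
}, 50);

timeToFirstChunk(fakeAudio).then((ms) => {
  console.log(`First chunk after ${ms.toFixed(0)} ms`);
});
```

Attaching a data listener puts the stream into flowing mode, so when the audio should also be saved, call pipe on the same stream in the same tick to avoid losing chunks.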
Ethical Guidelines
Deepgram is committed to responsible AI development and has incorporated ethical considerations into Aura's design and deployment. The company continues to expand Aura's capabilities, with plans to add more lifelike voices, additional languages, and new features in the future.
License Type
Deepgram Aura is licensed for commercial and non-commercial use, with pricing based on usage.