Voice Generation
Active

Aura 2

With high concurrency support and cost-efficient pricing, Aura 2 enables seamless, clear, and responsive voice AI interactions for industries like finance, healthcare, and customer support.

Aura 2 is Deepgram’s next-generation text-to-speech model designed specifically for enterprise applications requiring real-time, natural, and professional voice synthesis.

Model Overview

Aura-2 by Deepgram is a text-to-speech model built for enterprise applications that need live, natural-sounding voice synthesis. It offers clear output, accurate pronunciation of domain-specific terms, and flexible deployment in cloud or on-premises environments. It supports real-time, context-aware speech generation for voice agents, interactive voice response (IVR) systems, and conversational AI.

Technical Specifications

  • Latency: Consistent <200 ms TTFB, critical for live voice agents and IVRs.
  • Inference Tech: GPU-accelerated streaming-first architecture with quantization and pruning.
  • Scalability: Stateless distributed runtime enabling rapid scaling without bottlenecks.
  • Security: Designed for enterprise-grade deployment and data-locality compliance.
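
Time to first byte is the interval between issuing a request and receiving the first chunk of audio. The following is a minimal sketch of that measurement, assuming the audio arrives as an iterator of byte chunks. The names `measure_ttfb` and `fake_stream` are hypothetical and used only for illustration; they are not part of any Deepgram SDK.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttfb(chunks: Iterable[bytes]) -> Tuple[Optional[float], bytes]:
    """Consume a stream of audio chunks.

    Returns (seconds until the first non-empty chunk, all audio bytes).
    The first element is None if the stream produced no audio.
    """
    start = time.perf_counter()
    ttfb = None
    audio = bytearray()
    for chunk in chunks:
        if chunk and ttfb is None:
            ttfb = time.perf_counter() - start
        audio.extend(chunk)
    return ttfb, bytes(audio)


def fake_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a real streaming response."""
    time.sleep(delay_s)  # simulated network and synthesis delay
    yield b"\x00\x01"
    yield b"\x02\x03"


ttfb, audio = measure_ttfb(fake_stream())
print(f"TTFB: {ttfb * 1000:.0f} ms, {len(audio)} bytes received")
```

In a real client, the iterator would be the chunked response body, and the measured value would include network round-trip time as well as synthesis time.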

Performance Benchmarks

  • Delivers sub-200ms Time-To-First-Byte (TTFB) latency, ensuring ultra-responsive, natural conversational flow.
  • Achieves a Real-Time Factor (RTF) of 0.111x, meaning it generates 1 second of audio in about 111 milliseconds.
  • Supports thousands of concurrent sessions with consistent low latency and high-quality output.
  • Demonstrates minimal variance and low maximum latency even under high concurrency, critical for real-time virtual agents, IVRs, and assistants.
  • Outperforms many competitors by maintaining responsiveness below the 200ms conversational threshold consistently.
  • Runs on a GPU-accelerated, streaming-first Enterprise Runtime optimized for fast, efficient inference.
  • Can be deployed on cloud, VPC, or on-premises to reduce roundtrip delays and meet enterprise compliance.
  • Stateless distributed runtime architecture supports rapid scaling and efficient load balancing.

Aura-2 consistently outperforms competitors like ElevenLabs and OpenAI’s TTS solutions in latency-sensitive enterprise contexts.
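
The RTF figure quoted in this section translates directly into synthesis time: generation time equals audio duration multiplied by the RTF. A short sketch of that arithmetic, using the 0.111 value from the benchmarks (the function name is illustrative only):

```python
RTF = 0.111  # real-time factor quoted in the benchmarks


def synthesis_time_ms(audio_seconds: float, rtf: float = RTF) -> float:
    """Estimated time to generate `audio_seconds` of audio, in milliseconds."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * rtf * 1000.0


for seconds in (1, 10, 60):
    print(f"{seconds:>2} s of audio -> ~{synthesis_time_ms(seconds):.0f} ms to generate")
```

RTF describes throughput for the whole clip. With streaming, playback can begin as soon as the first chunk arrives, so perceived latency is governed by TTFB rather than by this total.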

API Pricing

  • $0.0315 per 1,000 characters
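
To see what this rate means for a given workload, the cost can be estimated from the character count. The sketch below assumes billing is prorated per character with no minimum charge or rounding, which the pricing line does not specify; `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_1K_CHARS = Decimal("0.0315")  # USD, from the pricing above


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimated synthesis cost, assuming simple per-character proration."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return PRICE_PER_1K_CHARS * char_count / 1000


for chars in (1_000, 50_000, 1_000_000):
    print(f"{chars:>9,} characters -> ${estimate_cost_usd(chars):.4f}")
```

At this rate, one million characters comes to $31.50.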

Key Features

  • Real-Time Performance: Sub-200ms Time-To-First-Byte (TTFB) latency ensuring natural, fluid conversations.
  • Fast Audio Generation: Real-Time Factor (RTF) of 0.111x, synthesizing 1 second of audio in about 111 ms.
  • Domain-Specific Accuracy: Superior pronunciation for currency, dates, complex addresses, URLs, and technical terms.
  • Enterprise Scalability: Supports thousands of concurrent sessions without latency degradation.
  • Deployment Flexibility: Available via REST and WebSocket APIs; deployable on private clouds, VPCs, or on-premises.
  • Broad Voice Catalog: 40+ voices tailored for professional contexts and tones.
  • Multilingual Future-Proofing: Primarily English today, with a small set of Spanish voices already available and plans for broader multi-language support.
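
As a rough illustration of the REST access mentioned above, the sketch below builds a text-to-speech request using only the Python standard library. The endpoint URL, the `model` query parameter, and the `Token` authorization scheme are assumptions based on Deepgram's public API; the platform serving this page may expose Aura 2 through a different endpoint and request shape. `build_speak_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, based on Deepgram's public API; the platform hosting
# this page may expose Aura 2 at a different URL.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text: str, api_key: str,
                        model: str = "aura-2-thalia-en") -> urllib.request.Request:
    """Build (but do not send) a POST request for text-to-speech."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        url=f"{SPEAK_URL}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speak_request("Your balance is $1,250.00.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)

# To send it and save the audio (requires a valid key and network access;
# the audio encoding depends on the API's defaults):
# with urllib.request.urlopen(request) as response, open("speech_output", "wb") as f:
#     f.write(response.read())
```

Building the request separately from sending it keeps the example runnable without credentials and makes the URL, headers, and body easy to inspect.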

Model Variants Overview

Deepgram Aura-2 includes a catalog of voices, each optimized for specific enterprise usage and voice characteristics:

aura-2-amalthea-en: Warm, approachable female voice for customer support.

aura-2-andromeda-en: Clear, authoritative male voice suited for financial domains.

aura-2-apollo-en: Energetic, youthful male voice for marketing and retail.

aura-2-arcas-en: Calm, neutral male voice ideal for healthcare communications.

aura-2-aries-en: Strong, confident male voice for technical support.

aura-2-asteria-en: Soft, caring female voice targeting education and training.

aura-2-athena-en: Professional, articulate female voice for legal and corporate sectors.

aura-2-atlas-en: Deep, steady male voice designed for logistics and transportation.

aura-2-aurora-en: Bright, clear female voice for media and broadcasting.

aura-2-callista-en: Friendly, engaging female voice for customer engagement.

aura-2-cora-en: A warm and friendly female voice, perfect for customer engagement and educational content.

aura-2-cordelia-en: Clear and professional female voice ideal for corporate training and support calls.

aura-2-delia-en: Calm, empathetic female voice designed for healthcare and wellness applications.

aura-2-draco-en: Assertive male voice well suited for technical support and financial services.

aura-2-electra-en: Energetic and dynamic female voice for marketing and retail promotions.

aura-2-harmonia-en: Balanced female voice offering clarity and a soothing tone for voice assistants.

aura-2-helena-en: Articulate female voice with a corporate tone, suitable for legal and business sectors.

aura-2-hera-en: Confident female voice ideal for education and training modules.

aura-2-hermes-en: Clear and authoritative male voice, fit for executive communications and announcements.

aura-2-hyperion-en: Deep, steady male voice crafted for logistics, transportation, and industrial use cases.

aura-2-iris-en: Bright and engaging female voice for media and broadcasting contexts.

aura-2-janus-en: Versatile male voice suitable for multi-purpose enterprise applications.

aura-2-juno-en: Friendly, approachable female voice for customer service and support channels.

aura-2-jupiter-en: Powerful, confident male voice tailored for financial and advisory services.

aura-2-luna-en: Soft and gentle female voice preferred in healthcare and personal coaching.

aura-2-mars-en: Strong and clear male voice designed for technical and operational environments.

aura-2-minerva-en: Intelligent, polished female voice, effective for training and educational use.

aura-2-neptune-en: Calm male voice well suited for meditation and wellness apps.

aura-2-odysseus-en: Narrative-style male voice designed for storytelling and guided tours.

aura-2-ophelia-en: Warm female voice with empathetic intonation for service industries.

aura-2-orion-en: Bold male voice for authoritative announcements and industrial contexts.

aura-2-orpheus-en: Smooth male voice with artistic tone, suited for media and creative applications.

aura-2-pandora-en: Engaging female voice crafted for marketing and promotions.

aura-2-phoebe-en: Clear, professional female voice ideal for e-learning and corporate communications.

aura-2-pluto-en: Deep male voice with a calm demeanor perfect for narration and voice-overs.

aura-2-saturn-en: Strong male voice tailored for customer support and financial sectors.

aura-2-selene-en: Soft female voice ideal for wellness, mindfulness, and personal care apps.

aura-2-thalia-en: Bright and dynamic female voice, great for retail and promotional content.

aura-2-theia-en: Professional female voice suitable for healthcare and legal domains.

aura-2-vesta-en: Clear female voice with steady pace designed for technical and customer service roles.

aura-2-zeus-en: Commanding, powerful male voice perfect for executive announcements and presentations.

Each voice has distinct tonal qualities and a target enterprise context, so businesses can select a voice that fits their brand identity and use case.

Spanish voice variants

aura-2-celeste-es: Clear and friendly female Spanish voice for broad customer engagement.

aura-2-estrella-es: Warm and articulate female Spanish voice tailored for educational and media use.

aura-2-nestor-es: Assertive male Spanish voice designed for professional and corporate settings.
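
The voice identifiers listed in this section share a visible pattern: `aura-2-<name>-<language>`. The sketch below splits an identifier into those parts and filters a catalog by language. It assumes the pattern holds for every voice; `parse_voice_id` and `Voice` are hypothetical names, not part of any SDK.

```python
import re
from typing import NamedTuple

# Pattern inferred from the identifiers listed in this section.
VOICE_ID = re.compile(r"^aura-2-(?P<name>[a-z]+)-(?P<lang>[a-z]{2})$")


class Voice(NamedTuple):
    name: str
    language: str


def parse_voice_id(voice_id: str) -> Voice:
    """Split an identifier such as 'aura-2-thalia-en' into name and language."""
    match = VOICE_ID.match(voice_id)
    if match is None:
        raise ValueError(f"unrecognized voice id: {voice_id!r}")
    return Voice(match["name"], match["lang"])


catalog = ["aura-2-thalia-en", "aura-2-zeus-en", "aura-2-celeste-es", "aura-2-nestor-es"]
spanish = [v for v in catalog if parse_voice_id(v).language == "es"]
print(spanish)
```

Validating identifiers up front lets an application reject a mistyped voice name before making a billable request.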

Use Cases

  • Real-time conversational voice AI agents
  • Interactive voice response (IVR) systems
  • Customer support automation
  • Transactional notifications (reminders, alerts)
  • Domain-specific voice assistants requiring accurate pronunciation
  • On-premises deployments for sensitive data environments

Comparison with Other Models

vs ElevenLabs Flash: Aura-2 is optimized for real-time enterprise use with sub-200ms latency, while ElevenLabs Flash offers very fast voice generation (~75ms start time) but with plan restrictions and fewer deployment options. Aura-2 supports flexible deployment including on-premises and VPC, whereas ElevenLabs is cloud-only. Aura-2 is also about 40% more cost-effective for large-scale business use.

vs OpenAI TTS: Aura-2 surpasses OpenAI’s TTS in latency, maintaining consistent sub-200ms response even in high concurrency scenarios critical for live agents and IVRs. OpenAI’s TTS focuses more on voice expressiveness suitable for offline or media-style applications, trading some real-time speed for richness. Aura-2’s architecture prioritizes throughput and scaling for demanding enterprise environments.

vs Cartesia Sonic: Aura-2 offers a more affordable per-character cost and lower latency than Cartesia Sonic while supporting distributed and on-premises deployments. Cartesia Sonic is primarily cloud-based with higher latency (~300ms), making Aura-2 better suited for use cases requiring rapid, natural conversations. Aura-2’s specialized runtime delivers lower infrastructure overhead at scale.
