Dia 1.6B TTS

Ultra-Realistic AI Speech Dialogue Model

Meet Dia 1.6B TTS, an open-source 1.6B-parameter text-to-speech model by Nari Labs that generates human-like speech with natural intonation, rhythm, and emotion.

What is Dia 1.6B TTS?

Dia 1.6B TTS is a cutting-edge AI text-to-speech model designed for ultra-realistic dialogue synthesis. Developed by Nari Labs and released under the Apache 2.0 license, Dia 1.6B TTS offers natural and expressive speech output that rivals commercial solutions.

  • Speech synthesis with natural intonation, rhythm, and emotional expression
  • Optimized for multi-speaker dialogue generation
  • 1.6B-parameter model that needs approximately 10GB of VRAM
  • Voice cloning capabilities through audio prompting

Dia 1.6B TTS Core Features

Dia 1.6B TTS Exceptional Speech Quality

Dia 1.6B TTS produces incredibly natural-sounding voices with human-like intonation, rhythm, and emotion. The advanced AI model creates speech that's nearly indistinguishable from human voices.

Dia 1.6B TTS: Multi-Speaker Support

Easily create multi-speaker conversations using simple tags like [S1] and [S2] to specify different voices in your text, maintaining consistent and natural dialogue with Dia 1.6B TTS.
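As a rough illustration of the tagging convention, the sketch below joins a list of speaker turns into a single tagged script string. The helper name `format_dialogue` and the (speaker number, line) input format are assumptions made for this example; they are not part of the Dia codebase.

```python
def format_dialogue(turns):
    """Join (speaker_number, line) pairs into one tagged script string.

    Hypothetical helper: it only builds the text, it does not call the model.
    """
    pieces = []
    for speaker, line in turns:
        # Each turn is prefixed with its speaker tag, e.g. [S1] or [S2].
        pieces.append(f"[S{speaker}] {line.strip()}")
    return " ".join(pieces)


script = format_dialogue([
    (1, "Have you tried the new dialogue model?"),
    (2, "Not yet. Is it any good?"),
    (1, "It handles both of our voices in one pass."),
])
print(script)
```

The resulting string is what you would paste into the text input; alternating the tags keeps each voice tied to its own lines.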

Voice Cloning with Dia 1.6B TTS

Clone specific vocal characteristics using the audio prompting feature, enabling consistent voice identity across multiple generations for personalized speech output with Dia 1.6B TTS.

Dia 1.6B TTS: Open Source Model

Released under the Apache 2.0 license, allowing free use for personal and commercial purposes. Complete model weights and code for Dia 1.6B TTS are available on GitHub.

Dia 1.6B TTS Audio Demos

Dia 1.6B TTS: Standard Usage (Sample 1)

Basic dialogue generation example from Dia 1.6B TTS.

Dia 1.6B TTS: Natural Conversation (Sample 2)

Demonstrates casual interactions using Dia 1.6B TTS.

Dia 1.6B TTS: Emotional Dialogue (Sample 3)

Expressive, high-emotion speech example using Dia 1.6B TTS.

Dia 1.6B TTS: Non-Verbal Sounds (Sample 4)

Includes coughs, sniffling, and laughter generated by Dia 1.6B TTS.

Dia 1.6B TTS: Rap Example (Sample 5)

Showcases rhythm and rhyme using Dia 1.6B TTS.

Dia 1.6B TTS: Audio Prompting Feature (Sample 6)

Example of voice cloning using Dia 1.6B TTS audio prompts.

Note: For high-quality output with audio prompts in Dia 1.6B TTS, prepend the transcript of the prompt audio to your input text. Automatic transcription is being considered for ease of use.
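The note above can be sketched as a small text-preparation step. The function name `build_prompted_input` and the check that the transcript starts with a speaker tag are assumptions for illustration; supplying the audio file itself to the model is outside the scope of this sketch.

```python
def build_prompted_input(prompt_transcript, script):
    """Prepend the audio prompt's transcript to the script to be generated.

    Hypothetical helper: it prepares only the text side of an audio prompt.
    """
    transcript = prompt_transcript.strip()
    if not transcript.startswith("[S"):
        # Assumption: the transcript carries speaker tags like the main script.
        raise ValueError("prompt transcript should start with a speaker tag")
    return f"{transcript} {script.strip()}"


full_text = build_prompted_input(
    "[S1] This is the sentence spoken in my reference clip.",
    "[S1] And this is the new line I want in the same voice.",
)
print(full_text)
```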

Dia 1.6B TTS Video Examples

Dia 1.6B TTS: Podcast Quality

Demonstrates the potential for podcast generation using Dia 1.6B TTS.

Dia 1.6B TTS: Model Introduction

Introduces the 1.6B-parameter Dia model.

Dia 1.6B TTS: Ultra-Realistic Dialogue

Showcases one-pass generation using Dia 1.6B TTS.

How Dia 1.6B TTS Works: From Text to Lifelike Dialogue

  1. Prepare Your Script for Dia 1.6B TTS

    Write or paste the text you want Dia 1.6B TTS to convert. Use simple tags like [S1] and [S2] before sentences to assign different speaker voices. You can also include non-verbal cues like (laughs) or (coughs) to add realism.

  2. (Optional) Provide Audio Prompts to Dia 1.6B TTS

    To clone a specific voice or guide the emotional tone with Dia 1.6B TTS, upload a short audio sample (5-15 seconds) and prepend its accurate transcription, with speaker tags, to the main script in your input.

  3. Generate Audio with Dia 1.6B TTS

    Run the Dia 1.6B TTS model (locally via the app or using the online demo). The model processes the entire script in one pass, generating seamless dialogue.

  4. Listen and Download Dia 1.6B TTS Output

    Play the generated audio directly from Dia 1.6B TTS. The output captures natural intonation, rhythm, and even non-verbal cues, creating an ultra-realistic listening experience. Download the audio file for your projects.
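Before running generation, it can help to check that a script follows the conventions from the steps above. The sketch below splits a script into speaker turns and lists its non-verbal cues. The function names and the regular expressions are assumptions for illustration, not the model's own parser.

```python
import re

# Assumed patterns: speaker tags look like [S1], cues like (laughs).
SPEAKER_TAG = re.compile(r"\[(S\d+)\]")
NONVERBAL_CUE = re.compile(r"\(([a-z ]+)\)")


def parse_script(script):
    """Split a tagged script into a list of (speaker, text) turns."""
    parts = SPEAKER_TAG.split(script)
    # With one capture group, split yields [prefix, tag, text, tag, text, ...].
    prefix, rest = parts[0], parts[1:]
    if prefix.strip():
        raise ValueError("script should start with a speaker tag such as [S1]")
    return [(rest[i], rest[i + 1].strip()) for i in range(0, len(rest), 2)]


def find_cues(script):
    """Return the non-verbal cues, without parentheses, in order of appearance."""
    return NONVERBAL_CUE.findall(script)


example = "[S1] Did you hear that? (laughs) [S2] I did. (coughs) Sorry."
print(parse_script(example))
print(find_cues(example))
```

A check like this catches text that appears before the first speaker tag, which would otherwise be left without an assigned voice.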

Dia 1.6B TTS Installation Guide

### Windows Installation

1. Clone the repository
   git clone https://github.com/nari-labs/dia.git
   cd dia

2. Create a Python virtual environment (Python 3.10 recommended)
   python -m venv venv
   venv\Scripts\activate.bat

3. Install dependencies
   python -m pip install --upgrade pip
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   pip install -r requirements.txt

4. Download model weights
   # The weights download automatically, or can be downloaded manually from Hugging Face

5. Launch the application
   python app.py

Dia 1.6B TTS Technical Information

Dia 1.6B TTS - Ultra-Realistic Dialogue Synthesis Model

Dia 1.6B TTS is a state-of-the-art text-to-speech model with 1.6B parameters that generates human-like voices with natural intonation, rhythm, and emotion. On enterprise-grade GPUs, Dia 1.6B TTS can generate audio in real time. For reference, an A4000 GPU produces approximately 40 tokens/second; because 86 tokens equal 1 second of audio, that is a little under half of real-time speed.
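The throughput figures above translate directly into a real-time factor. The sketch below works through that arithmetic using only the two numbers quoted in the text (40 tokens/second on an A4000, 86 tokens per second of audio); the function names are made up for this example.

```python
TOKENS_PER_AUDIO_SECOND = 86  # from the text: 86 tokens equal 1 second of audio


def realtime_factor(tokens_per_second):
    """Seconds of audio produced per second of compute (1.0 means real time)."""
    return tokens_per_second / TOKENS_PER_AUDIO_SECOND


def generation_time(audio_seconds, tokens_per_second):
    """Approximate compute time, in seconds, for a clip of the given length."""
    return audio_seconds * TOKENS_PER_AUDIO_SECOND / tokens_per_second


print(round(realtime_factor(40), 2))  # 0.47
print(generation_time(10, 40))        # 21.5
```

At 40 tokens/second, a 10-second clip therefore takes about 21.5 seconds to generate, and real-time output needs at least 86 tokens/second.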

The full version requires approximately 10GB of VRAM to run. Quantized versions of Dia 1.6B TTS are planned for future updates to improve accessibility on lower-end hardware.

Dia TTS Pricing

Purchase Dia TTS voice generation credits to experience professional AI text-to-speech services.

Basic

Annual Basic plan with better pricing.

$7.9/month (was $9.9/month)
  • 12000 credits per year (1000/month)
  • Billed annually ($94.80/year)
  • High-quality audio outputs
  • Standard customer support
  • No ads

Annual savings! 20% off vs monthly!

Pro (Most Popular)

Annual Pro plan, the best choice for professionals.

$15.9/month (was $19.9/month)
  • 26400 credits per year (2200/month)
  • Billed annually ($190.80/year)
  • High-quality audio outputs
  • Priority customer support
  • No ads

Annual savings! 20% off vs monthly!

Ultra

Annual Ultra plan, perfect for teams and enterprises.

$29.9/month (was $36.9/month)
  • 54000 credits per year (4500/month)
  • Billed annually ($358.80/year)
  • High-quality audio outputs
  • VIP customer support
  • No ads

Annual savings! 19% off vs monthly!
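To compare the plans, the listed prices can be reduced to an annual total, a discount percentage, and a cost per 1,000 credits. The sketch below does that arithmetic with the figures from the plan descriptions. It assumes that the higher price in each pair is the month-to-month rate; the dictionary layout and the function name are illustrative only.

```python
# Prices in USD, taken from the plan listings above.
PLANS = {
    "Basic": {"monthly": 9.9, "annual_monthly": 7.9, "credits_per_year": 12000},
    "Pro": {"monthly": 19.9, "annual_monthly": 15.9, "credits_per_year": 26400},
    "Ultra": {"monthly": 36.9, "annual_monthly": 29.9, "credits_per_year": 54000},
}


def summarize(plan):
    """Return (annual total, discount in percent, USD per 1,000 credits)."""
    annual_total = round(plan["annual_monthly"] * 12, 2)
    # Assumption: "monthly" is the price without the annual discount.
    discount_pct = round(100 * (1 - plan["annual_monthly"] / plan["monthly"]))
    usd_per_1000 = round(1000 * annual_total / plan["credits_per_year"], 2)
    return annual_total, discount_pct, usd_per_1000


for name, plan in PLANS.items():
    print(name, summarize(plan))
```

The computed totals and discounts match the billed amounts and savings stated for each plan, and the cost per 1,000 credits falls as the plan size grows.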