Dia 1.6B TTS

What is Dia 1.6B TTS?

Dia 1.6B TTS is an open-weights text-to-speech model built for realistic dialogue synthesis. Developed by Nari Labs and released under the Apache 2.0 license, it aims to deliver natural, expressive voice output comparable to commercial solutions.

Voice synthesis with natural intonation, rhythm, and emotional expression
Optimized for conversations between multiple speakers
1.6B-parameter model that needs about 10GB of VRAM
Voice cloning through audio prompts

Key Features of Dia 1.6B TTS

Superior Voice Quality with Dia 1.6B TTS

Dia 1.6B TTS produces natural-sounding voices with human-like intonation, rhythm, and emotion. In many cases the generated speech is difficult to distinguish from a human recording.

Dia 1.6B TTS: Multiple Speaker Support

Create dialogues with multiple speakers by placing tags such as [S1] and [S2] before each speaker's lines. Dia 1.6B TTS assigns a different voice to each tag and keeps the conversation flowing naturally.
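
The tag convention can be illustrated with a short Python sketch. `build_script` is a hypothetical helper, not part of Dia; it assumes each turn is prefixed with its tag and that turns are joined into one input string with single spaces, with non-verbal cues such as (laughs) left inline in the text.

```python
# Hypothetical helper (not part of Dia): turns (speaker_number, text) pairs
# into a single script where each turn starts with its tag, e.g. "[S1]".
def build_script(turns: list[tuple[int, str]]) -> str:
    parts = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError(f"speaker numbers start at 1, got {speaker}")
        parts.append(f"[S{speaker}] {text.strip()}")
    # Assumption: turns are joined with single spaces into one input string.
    return " ".join(parts)


script = build_script([
    (1, "Have you tried the new model yet?"),
    (2, "Not yet. (laughs) Is it any good?"),
    (1, "It handles both of our voices in one pass."),
])
print(script)
```

The result is one string, `[S1] Have you tried the new model yet? [S2] Not yet. (laughs) Is it any good? [S1] It handles both of our voices in one pass.`, ready to paste into the input box.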

Voice Cloning using Dia 1.6B TTS

Use the audio prompt feature to clone the characteristics of a specific voice. Conditioning on the same prompt keeps the voice identity consistent across multiple generations, giving you personalized voice output with Dia 1.6B TTS.

Dia 1.6B TTS: Open Source Model

Dia 1.6B TTS is released under the Apache 2.0 license, which allows free use for both personal and commercial purposes. The code is available on GitHub, and the model weights are available on Hugging Face.

Dia 1.6B TTS Audio Demos

Dia 1.6B TTS: Standard Usage (Sample 1)

Basic dialogue generation example from Dia 1.6B TTS.

Dia 1.6B TTS: Natural Conversation (Sample 2)

Demonstrating casual interaction with Dia 1.6B TTS.

Dia 1.6B TTS: Emotional Dialogue (Sample 3)

Example of expressive, high-emotion speech using Dia 1.6B TTS.

Dia 1.6B TTS: Non-Verbal Sounds (Sample 4)

Includes coughs, sniffs, and laughs generated by Dia 1.6B TTS.

Dia 1.6B TTS: Rap Example (Sample 5)

Demonstrating rhythm and flow with Dia 1.6B TTS.

Dia 1.6B TTS: Audio Prompt Feature (Sample 6)

Example using audio prompts for voice cloning with Dia 1.6B TTS.

Note: For high-quality output with audio prompts in Dia 1.6B TTS, prepend the transcript of the audio prompt to the input text. Automatic transcription, which would make this step easier, is being considered.
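
The prompt-and-transcript convention can be sketched in Python. All three functions are hypothetical helpers, not part of Dia; the sketch assumes the prompt is a WAV file, that the transcript and the new script are joined with a single space, and it checks the 5-15 second prompt length suggested in the step-by-step guide on this page.

```python
import wave


def prompt_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (standard library only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_length_ok(seconds: float, low: float = 5.0, high: float = 15.0) -> bool:
    """True if the prompt falls in the 5-15 second range suggested on this page."""
    return low <= seconds <= high


def build_prompted_input(prompt_transcript: str, new_script: str) -> str:
    """Prepend the prompt's exact transcript (with speaker tags) to the new script."""
    # Assumption: a single space separates the transcript from the new lines.
    return f"{prompt_transcript.strip()} {new_script.strip()}"


text = build_prompted_input(
    "[S1] This is the sentence spoken in my reference clip.",
    "[S1] And this is a new line I want in the same voice.",
)
print(text)
```

The transcript must match what is actually said in the clip; the generated audio then continues after the prompt in the cloned voice.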

Dia 1.6B TTS Video Examples

Dia 1.6B TTS: Podcast Quality

Showcasing potential for podcast generation using Dia 1.6B TTS.

Dia 1.6B TTS: Model Introduction

An introduction to the 1.6B-parameter Dia model.

Dia 1.6B TTS: Ultra-Realistic Dialogue

Demonstration of single-pass generation with Dia 1.6B TTS.

How Dia 1.6B TTS Works: From Text to Realistic Dialogue

1. Prepare Your Script for Dia 1.6B TTS

Write or paste the text you want Dia 1.6B TTS to convert. Use simple tags like [S1] and [S2] before sentences to assign different speaker voices. You can also include non-verbal cues like (laughs) or (coughs) for extra realism.

2. (Optional) Provide an Audio Prompt for Dia 1.6B TTS

To clone a specific voice or guide the emotional tone with Dia 1.6B TTS, upload a short audio sample (5-15 seconds) and prepend its exact transcript (with speaker tags) to your main script in the input.

3. Generate the Audio with Dia 1.6B TTS

Run the Dia 1.6B TTS model (either locally via the app or using the online demo). The model processes the entire script in one pass, generating a seamless dialogue.

4. Listen and Download Dia 1.6B TTS Output

Play back the generated audio directly from Dia 1.6B TTS. The output captures natural intonation, rhythm, and even the non-verbal cues, creating an ultra-realistic listening experience. Download the audio file for your projects.
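
Before generating, it can help to check the script prepared in step 1. The `inspect_script` function below is a hypothetical helper, not part of Dia: it assumes speaker tags look like [S1], [S2], and so on, and that non-verbal cues are lowercase words in parentheses. It cannot tell a cue from an ordinary parenthetical remark.

```python
import re

SPEAKER_TAG = re.compile(r"\[S(\d+)\]")  # matches [S1], [S2], ...
CUE = re.compile(r"\(([a-z ]+)\)")       # matches (laughs), (coughs), ...


def inspect_script(script: str) -> dict:
    """Hypothetical pre-flight check for a Dia-style dialogue script."""
    text = script.strip()
    speakers = sorted({int(n) for n in SPEAKER_TAG.findall(text)})
    return {
        # re.match only succeeds if the tag is at the very start of the text.
        "starts_with_tag": SPEAKER_TAG.match(text) is not None,
        "speakers": speakers,
        "nonverbal_cues": CUE.findall(text),
    }


report = inspect_script("[S1] Did you hear that? (coughs) [S2] Hear what? (laughs)")
print(report)
# {'starts_with_tag': True, 'speakers': [1, 2], 'nonverbal_cues': ['coughs', 'laughs']}
```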

Dia 1.6B TTS Installation Guide

### Windows Installation

1. Clone the repository
   git clone https://github.com/nari-labs/dia.git
   cd dia

2. Create a Python virtual environment (Python 3.10 recommended)
   python -m venv venv
   venv\Scripts\activate.bat

3. Install dependencies
   python -m pip install --upgrade pip
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   pip install -e .

4. Download model weights
   The weights download automatically, or you can download them manually from Hugging Face.

5. Launch the application
   python app.py

### Linux / macOS Installation
# Steps are generally identical for Linux and macOS.

# Ensure prerequisites are met: Python 3.10+, Git, and an NVIDIA GPU with CUDA for GPU acceleration (CUDA is not available on macOS).

# 1. Clone the repository
git clone https://github.com/nari-labs/dia.git
cd dia

# --- Option A (Recommended): Using uv ---
# uv handles virtual environments and dependencies automatically.
# Install uv if you haven't already: pip install uv
uv run app.py

# --- Option B (Manual): Using venv + pip ---
# If you prefer manual setup:

# 2. Create and activate a virtual environment (Python 3.10 recommended)
python -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
# (Ensure your virtual environment is active)
# Update pip
python -m pip install --upgrade pip

# Install PyTorch matching your CUDA version (Check https://pytorch.org/)
# Example for CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Example for CPU only (will be slow):
# pip install torch torchvision torchaudio

# Install Dia and its remaining dependencies (listed in pyproject.toml)
pip install -e .

# 4. Launch the application
# (Ensure you are in the 'dia' directory and your environment is active)
python app.py

# --- Access the Interface ---
# Open your browser and navigate to: http://127.0.0.1:7860
# (Check terminal output for the exact URL)
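
To confirm the environment before running `python app.py` in either setup, a small stand-alone script (for example, a hypothetical `check_env.py`) can report the Python version and whether PyTorch sees a CUDA GPU. It only uses `importlib.util.find_spec` and, if PyTorch is installed, `torch.cuda.is_available()`; it does not check VRAM or any Dia-specific dependency.

```python
import importlib.util
import sys


def preflight() -> dict:
    """Report the Python version and, if PyTorch is installed, CUDA availability."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # This guide recommends Python 3.10, so treat it as the minimum here.
        "python_ok": sys.version_info >= (3, 10),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # unknown until PyTorch can be imported
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False`, the installed PyTorch build is CPU-only or cannot see the GPU; reinstall PyTorch for your CUDA version as described above.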

### Using the Dia 1.6B TTS Online Demo

You can try Dia 1.6B TTS directly on Hugging Face Spaces:
https://huggingface.co/spaces/nari-labs/Dia-1.6B

1. Visit the page
2. Enter your text (with [S1], [S2], etc. tags to specify speakers)
3. Optionally upload an audio prompt
4. Click the generate button
5. Listen to and download the output audio

Dia 1.6B TTS Technical Information

Dia 1.6B TTS - Ultra-Realistic Dialogue Synthesis Model

Dia 1.6B TTS is a text-to-speech model with 1.6B parameters that generates human-like voices with natural intonation, rhythm, and emotion. On enterprise-grade GPUs, Dia 1.6B TTS can generate audio in real time; older GPUs are slower. For reference, an A4000 GPU produces approximately 40 tokens/second, and 86 tokens correspond to 1 second of audio, so an A4000 generates roughly 0.47 seconds of audio per second of compute, below real time.
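
The throughput figures translate into generation time with simple arithmetic. This sketch assumes throughput stays constant over a whole clip and ignores model loading and audio decoding; 40 tokens/second is the A4000 figure quoted above, not a new measurement.

```python
TOKENS_PER_AUDIO_SECOND = 86  # 86 tokens = 1 second of audio (figure quoted above)


def audio_seconds_per_wall_second(tokens_per_second: float) -> float:
    """Seconds of audio produced per second of generation (1.0 = real time)."""
    return tokens_per_second / TOKENS_PER_AUDIO_SECOND


def generation_time(audio_seconds: float, tokens_per_second: float) -> float:
    """Estimated wall-clock seconds needed to generate `audio_seconds` of audio."""
    return audio_seconds * TOKENS_PER_AUDIO_SECOND / tokens_per_second


speed = audio_seconds_per_wall_second(40)  # A4000 figure quoted above
print(f"{speed:.2f}x real time")           # 0.47x real time
print(f"{generation_time(30, 40):.1f} s")  # 64.5 s for a 30-second clip
```

By the same arithmetic, real-time generation requires at least 86 tokens/second.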

The full version requires approximately 10GB of VRAM to run. A quantized version of Dia 1.6B TTS is planned for future updates to improve accessibility on lower-end hardware.
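
A rough estimate of weight memory (parameters times bytes per parameter) shows why quantization helps on lower-end hardware. This is a back-of-the-envelope sketch: it counts only the weights and ignores activations, caches, the audio codec, and framework overhead, which is why it comes out below the ~10GB quoted above. The precisions listed are illustrative; the page does not say which format the planned quantized version will use.

```python
PARAMS = 1.6e9  # parameter count of Dia 1.6B

# Bytes per parameter for some common precisions (illustrative only).
BYTES_PER_PARAM = {
    "float32": 4,
    "float16 / bfloat16": 2,
    "int8": 1,
    "4-bit": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype:>18}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

This prints 6.4, 3.2, 1.6, and 0.8 GB respectively: each halving of bytes per parameter halves the weight footprint.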

GitHub repository for Dia 1.6B TTS: https://github.com/nari-labs/dia
Online demo of Dia 1.6B TTS: https://huggingface.co/spaces/nari-labs/Dia-1.6B
