Meet Dia 1.6B TTS: an open-source 1.6B-parameter text-to-speech model developed by Nari Labs that generates human-like voices with natural intonation, rhythm, and emotion.
Dia 1.6B TTS is a cutting-edge AI text-to-speech model designed for ultra-realistic dialogue synthesis. Developed by Nari Labs and released under the Apache 2.0 license, Dia 1.6B TTS delivers natural and expressive voice output that rivals commercial solutions.
Dia 1.6B TTS produces natural-sounding voices with human-like intonation, rhythm, and emotion, generating speech that is often difficult to distinguish from a human speaker.
Easily create multi-speaker dialogues by placing simple tags like [S1] and [S2] before lines of text to assign different voices; Dia 1.6B TTS keeps the conversation sounding natural and consistent across speakers.
Use the audio prompt feature to clone specific voice characteristics, enabling consistent voice identity across multiple generations for personalized voice outputs with Dia 1.6B TTS.
Released under the Apache 2.0 license, allowing free use for both personal and commercial purposes. Complete model weights and code for Dia 1.6B TTS are available on GitHub.
Basic dialogue generation example from Dia 1.6B TTS.
Demonstrating casual interaction with Dia 1.6B TTS.
Example of expressive, high-emotion speech using Dia 1.6B TTS.
Includes non-verbal sounds such as coughs, sniffs, and laughs generated by Dia 1.6B TTS.
Demonstrating rhythm and flow with Dia 1.6B TTS.
Example using audio prompts for voice cloning with Dia 1.6B TTS.
Note: For high-quality output with audio prompts in Dia 1.6B TTS, prepend the corresponding script to the input text. Automating transcription for easier usage is being considered.
Showcasing potential for podcast generation using Dia 1.6B TTS.
Highlighting the 1.6B parameter model of Dia 1.6B TTS.
Demonstration of single-pass generation with Dia 1.6B TTS.
Write or paste the text you want Dia 1.6B TTS to convert. Use simple tags like [S1] and [S2] before sentences to assign different speaker voices. You can also include non-verbal cues like (laughs) or (coughs) for extra realism.
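For example, a short two-speaker script (the lines here are purely illustrative) might look like this:
[S1] Hey, did you catch the launch this morning? [S2] I did! (laughs) It went better than I expected. [S1] (coughs) Excuse me. Same here.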
To clone a specific voice or guide the emotional tone with Dia 1.6B TTS, upload a short audio sample (5-15 seconds) and prepend its exact transcript (with speaker tags) to your main script in the input.
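As a hypothetical illustration, if your uploaded clip contains the first sentence below, the combined input would place the clip's exact transcript first, followed by the new text to synthesize:
[S1] This is a short sample of my voice for cloning.
[S1] And this is the new line I want generated in that same voice.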
Run the Dia 1.6B TTS model (either locally via the app or using the online demo). The model processes the entire script in one pass, generating a seamless dialogue.
Play back the generated audio directly from Dia 1.6B TTS. The output captures natural intonation, rhythm, and even non-verbal cues, creating an ultra-realistic listening experience. Download the audio file for your projects.
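If you prefer scripting over the app, a minimal Python sketch is shown below. It assumes the repo's Python API (Dia.from_pretrained and model.generate, as documented in the project README) and a 44.1 kHz output sample rate; check the repository for the exact, current interface.
# Minimal scripted generation with Dia 1.6B TTS.
# Assumes the dia package from the cloned repo and the soundfile library.
from dia.model import Dia
import soundfile as sf
# Downloads weights from Hugging Face on first use (cached afterwards).
model = Dia.from_pretrained("nari-labs/Dia-1.6B")
# Speaker tags and non-verbal cues go directly in the script.
text = "[S1] Hello there. [S2] Hi! (laughs) Nice to finally talk."
audio = model.generate(text)            # returns a waveform array
sf.write("dialogue.wav", audio, 44100)  # 44.1 kHz output (assumed)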
### Windows Installation
1. Clone the repository
git clone https://github.com/nari-labs/dia.git
cd dia
2. Create a Python virtual environment (Python 3.10 recommended)
python -m venv venv
venv\Scripts\activate.bat
3. Install dependencies
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
4. Download model weights
# These will download automatically or can be manually downloaded from Hugging Face
5. Launch the application
python app.py
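Optionally, verify that the CUDA build of PyTorch can see your GPU before launching:
python -c "import torch; print(torch.cuda.is_available())"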
### Linux / macOS Installation
# Steps are generally identical for Linux and macOS.
# Ensure prerequisites are met: Python 3.10 (as on Windows), Git, and a CUDA-enabled GPU for GPU usage.
# 1. Clone the repository
git clone https://github.com/nari-labs/dia.git
cd dia
# --- Option A (Recommended): Using uv ---
# uv handles virtual environments and dependencies automatically.
# Install uv if you haven't already: pip install uv
uv run app.py
# --- Option B (Manual): Using venv + pip ---
# If you prefer manual setup:
# 2. Create and activate a virtual environment (Python 3.10 recommended)
python -m venv .venv
source .venv/bin/activate
# 3. Install Dependencies
# (Ensure your virtual environment is active)
# Update pip
python -m pip install --upgrade pip
# Install PyTorch matching your CUDA version (Check https://pytorch.org/)
# Example for CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Example for CPU only (will be slow):
# pip install torch torchvision torchaudio
# Install other requirements (check pyproject.toml for exact list)
pip install -r requirements.txt
# 4. Launch the application
# (Ensure you are in the 'dia' directory and your environment is active)
python app.py
# --- Access the Interface ---
# Open your browser and navigate to: http://127.0.0.1:7860
# (Check terminal output for the exact URL)
### Using the Dia 1.6B TTS Online Demo
You can try Dia 1.6B TTS directly on Hugging Face Spaces:
https://huggingface.co/spaces/nari-labs/Dia-1.6B
1. Visit the page
2. Enter your text (with [S1], [S2], etc. tags to specify speakers)
3. Optionally upload an audio prompt
4. Click the generate button
5. Listen to and download the output audio
Dia 1.6B TTS is a state-of-the-art text-to-speech model with 1.6B parameters that generates human-like voices with natural intonation, rhythm, and emotion. On enterprise-grade GPUs, Dia 1.6B TTS can approach real-time generation; as a reference point, an A4000 GPU produces approximately 40 tokens/second, with roughly 86 tokens corresponding to 1 second of audio.
The full version requires approximately 10GB of VRAM to run. A quantized version of Dia 1.6B TTS is planned for future updates to improve accessibility on lower-end hardware.
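As a back-of-envelope check of those throughput figures (both numbers taken from the paragraph above), the real-time factor on an A4000 works out to roughly 0.47x:
tokens_per_second = 40          # approximate A4000 throughput (from the text above)
tokens_per_audio_second = 86    # tokens per second of generated audio
rtf = tokens_per_second / tokens_per_audio_second
print(f"{rtf:.2f}x real time")  # ~0.47x: about 2.2 s of compute per 1 s of audio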