2025-03-09 Programming, Technology, Productivity
How to Download, Train, and Run Coqui TTS on a Mac (Text-to-Speech with Custom Voice)
By O. Wolfson
Coqui TTS is an open-source text-to-speech (TTS) system that allows you to generate speech from text. You can also train it with your own voice to create a personalized TTS model. This guide covers:
- Installing Coqui TTS on a Mac (Apple Silicon and Intel)
- Running pre-trained models
- Training a custom voice model
- Generating speech from text locally
1. Install Coqui TTS on a Mac
1.1 Prerequisites
Ensure you have the following installed:
- Python 3.8+ (3.10 recommended)
- Homebrew (for package management)
- ffmpeg (for audio processing)
- PyTorch with Metal support (for Apple Silicon GPUs)
Step 1: Install Homebrew (if not installed)
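If Homebrew is not already installed, the official script from brew.sh sets it up:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```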
Step 2: Install Dependencies
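With Homebrew in place, Python and ffmpeg install in one line (Python 3.10 shown here, matching the recommendation above):

```bash
brew install python@3.10 ffmpeg
```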
Step 3: Set Up a Virtual Environment
Using a virtual environment ensures that all packages are installed in an isolated directory, preventing conflicts with system-wide dependencies.
- Create a virtual environment (see the commands below).
- Activate the virtual environment (on Windows, if applicable, the activation script lives under Scripts\ rather than bin/).
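A minimal sketch, assuming the python3.10 from Step 2 is on your PATH (use python3 if that is what your system exposes) and an environment folder named coqui-env (any name works):

```bash
# Create an isolated environment for Coqui TTS
python3.10 -m venv coqui-env

# Activate it (macOS/Linux)
source coqui-env/bin/activate

# On Windows (if applicable)
# coqui-env\Scripts\activate
```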
Once activated, your shell prompt may change, indicating you are inside the virtual environment.
Step 4: Install PyTorch (for Apple Silicon Macs)
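Recent PyTorch wheels for Apple Silicon ship with Metal (MPS) support, so a plain pip install inside the virtual environment is usually enough:

```bash
pip install torch torchaudio
```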
Step 5: Verify GPU Support (Apple Silicon only)
Run the following command to check if Metal (MPS) is available:
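```bash
python -c "import torch; print(torch.backends.mps.is_available())"
```

If this prints True, PyTorch can see the Metal backend; otherwise training and inference will fall back to the CPU.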
Step 6: Install Coqui TTS
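Coqui TTS is published on PyPI as the TTS package:

```bash
pip install TTS
```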
Step 7: Verify Installation
Check available models:
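```bash
tts --list_models
```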
To exit the virtual environment, run:
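```bash
deactivate
```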
2. Running a Pre-Trained Model (Quick Test)
Ensure your virtual environment is activated before running:
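```bash
# Assuming the environment from Step 3 is named coqui-env
source coqui-env/bin/activate
```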
Run a basic TTS model to check if everything works:
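For example, with one of the stock English models (any entry from tts --list_models works; the text and output path are up to you):

```bash
tts --text "Hello, this is a test of Coqui TTS on my Mac." \
    --model_name "tts_models/en/ljspeech/tacotron2-DDC" \
    --out_path output.wav
```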
This will generate a speech WAV file using a built-in model.
3. Training Coqui TTS with Your Own Voice
3.1 Prepare Your Voice Dataset
You need:
- 1–5 hours of high-quality recordings (WAV format, preferably at a 22.05 kHz or 44.1 kHz sample rate).
- A transcript file (CSV or JSON) that pairs each recording with its text.
Dataset Folder Structure
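A common layout follows the LJSpeech convention, which Coqui's built-in dataset formatter understands (folder and file names here are placeholders):

```
my_voice_dataset/
├── metadata.csv
└── wavs/
    ├── 0001.wav
    ├── 0002.wav
    └── ...
```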
Example metadata.csv format
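In the LJSpeech convention each line is pipe-separated: the file ID (without the .wav extension), the raw transcript, and a normalized transcript. The sentences here are placeholders:

```
0001|This is the first sentence in my dataset.|This is the first sentence in my dataset.
0002|The quick brown fox jumps over the lazy dog.|The quick brown fox jumps over the lazy dog.
```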
3.2 Train the Model
Run the training command:
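The exact entry point depends on your Coqui release; with recent versions, training is typically launched from a config file, for example:

```bash
# your_config.json should point at the dataset folder and metadata.csv above
python -m TTS.bin.train_tts --config_path your_config.json
```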
For fine-tuning an existing model:
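Fine-tuning restores a pre-trained checkpoint before training continues; the checkpoint path below is a placeholder:

```bash
python -m TTS.bin.train_tts \
    --config_path your_config.json \
    --restore_path /path/to/pretrained_model.pth
```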
For Apple Silicon Macs, enable GPU acceleration (MPS):
- Open your_config.json and update the device/precision settings for MPS (see the sketch after this list).
- Start training:
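MPS support in Coqui varies by release and model, so treat this as a hypothetical sketch: the config keys you need to change depend on your model's config, and PYTORCH_ENABLE_MPS_FALLBACK (a standard PyTorch environment variable, not Coqui-specific) lets any operations the Metal backend does not yet support fall back to the CPU:

```bash
# Hypothetical: run training with CPU fallback for unsupported MPS ops
PYTORCH_ENABLE_MPS_FALLBACK=1 python -m TTS.bin.train_tts --config_path your_config.json
```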
4. Generating Speech from a Trained Model
Ensure your virtual environment is activated before running:
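```bash
# Assuming the environment from Step 3 is named coqui-env
source coqui-env/bin/activate
```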
Once the model is trained, you can generate speech files:
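Point the tts CLI at the checkpoint and config written to your training output folder (the paths below are placeholders):

```bash
tts --text "This is my custom voice speaking." \
    --model_path /path/to/output/best_model.pth \
    --config_path /path/to/output/config.json \
    --out_path my_voice.wav
```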
For batch processing multiple sentences, use Python:
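A minimal sketch using Coqui's Python API; the model and output paths are placeholders for your training output:

```python
from TTS.api import TTS

# Load the trained model (paths are placeholders)
tts = TTS(
    model_path="/path/to/output/best_model.pth",
    config_path="/path/to/output/config.json",
)

sentences = [
    "This is the first sentence.",
    "Here is a second one.",
    "And a third, just for good measure.",
]

# Write one WAV file per sentence
for i, sentence in enumerate(sentences, start=1):
    tts.tts_to_file(text=sentence, file_path=f"output_{i:03d}.wav")
```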
To play the audio on Mac:
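```bash
afplay my_voice.wav
```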
5. Deploying a Local TTS API
You can turn Coqui TTS into an API to generate speech via HTTP requests.
Step 1: Install FastAPI & Uvicorn
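```bash
pip install fastapi uvicorn
```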
Step 2: Create server.py
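A minimal sketch of server.py: it loads a model once at startup and exposes a single /speak endpoint. The endpoint name, query parameter, and response shape are choices made for this example; swap in model_path and config_path to serve the custom voice trained above.

```python
from fastapi import FastAPI
from TTS.api import TTS

app = FastAPI()

# Load the model once at startup (stock model shown; use model_path= and
# config_path= to serve your custom voice instead)
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

@app.get("/speak")
def speak(text: str):
    # Synthesize the text and return the path of the generated WAV file
    output_path = "output.wav"
    tts.tts_to_file(text=text, file_path=output_path)
    return {"file": output_path}
```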
Step 3: Run the API
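```bash
uvicorn server:app --host 0.0.0.0 --port 8000
```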
Step 4: Test the API
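Assuming the /speak endpoint and port 8000 from the sketch above:

```bash
curl "http://localhost:8000/speak?text=Hello%20from%20my%20local%20TTS%20server"
```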
This will generate a WAV file and return its path.
6. Deploying to the Cloud
If you need cloud deployment, you can:
- Use Google Colab for training (free GPU access)
- Deploy on RunPod.io / Lambda Labs for cheap GPU rentals
- Use AWS / GCP for production-grade hosting
- Host a web app using Hugging Face Spaces
Conclusion
Coqui TTS allows you to train and run a text-to-speech model on a Mac, including custom voice training. Apple Silicon Macs can leverage MPS acceleration, but if training is too slow, cloud GPUs are an option.
With this setup, you can generate custom TTS audio files, deploy a local API, or even build your own AI voice assistant.