2024-09-30 Web Development

How to Convert Text to Speech Using Google Cloud Text-to-Speech API

By O Wolfson

The Google Cloud Text-to-Speech API lets developers synthesize natural-sounding speech from text. This guide walks through setting up the API, obtaining the necessary credentials, and writing a Node.js script that converts text to speech.

Step 1: Set Up a Google Cloud Project

  1. Create a Google Cloud Project:

    • Go to the Google Cloud Console.
    • Click on the project dropdown at the top of the page and select "New Project."
    • Enter a name for your project and click "Create."
  2. Enable the Text-to-Speech API:

    • Once your project is created, navigate to the Text-to-Speech API page.
    • Click "Enable" to enable the API for your project.

Step 2: Set Up Service Account Credentials

  1. Create a Service Account:

    • In the Google Cloud Console, go to the Service Accounts page.
    • Click "Create Service Account."
    • Enter a name and description for your service account, then click "Create."
  2. Grant the Service Account Access:

    • On the next screen, select the "Text-to-Speech API User" role from the dropdown.
    • Click "Continue" and then "Done."
  3. Create a Key for the Service Account:

    • Click on the newly created service account to open its details.
    • Go to the "Keys" tab and click "Add Key" -> "Create New Key."
    • Choose the JSON key type and click "Create."
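    • Save the JSON file to a secure location on your computer and keep it out of version control. The script in Step 3 loads the key from ./tts-key.json, so either place a copy under that name in your project directory or change the keyFilename path to match.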
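
Before wiring the key into a script, it can help to confirm that the downloaded file really looks like a service account key. The sketch below checks for a few fields that these JSON keys contain (type, project_id, private_key, client_email). The helper names findKeyProblems and checkKeyFile are hypothetical and not part of any Google library, and the check says nothing about whether Google will accept the key.

```javascript
const fs = require("node:fs");

// Fields this sketch expects to find in a service account JSON key.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"];

// Hypothetical helper: returns a list of problems (empty if none were found).
function findKeyProblems(key) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof key[field] !== "string" || key[field].length === 0) {
      problems.push(`missing field: ${field}`);
    }
  }
  // A key downloaded for a service account has type "service_account".
  if (key.type && key.type !== "service_account") {
    problems.push(`unexpected type: ${key.type}`);
  }
  return problems;
}

// Hypothetical helper: loads a key file from disk and checks it.
function checkKeyFile(path) {
  const key = JSON.parse(fs.readFileSync(path, "utf8"));
  return findKeyProblems(key);
}

// Demo on an in-memory object, so no real key is needed to try the sketch.
const sampleKey = { type: "service_account", project_id: "demo-project" };
console.log(findKeyProblems(sampleKey));
```

Calling checkKeyFile("./tts-key.json") and getting an empty array back suggests the file is well-formed. As an alternative to passing a path in code, Google's client libraries can also pick up the key location from the GOOGLE_APPLICATION_CREDENTIALS environment variable.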

Step 3: Write the Node.js Script

Install the Text-to-Speech client library for Node.js:

bash
npm install @google-cloud/text-to-speech

Create a script (synthesize.js) with the following content:

javascript
const textToSpeech = require("@google-cloud/text-to-speech");
const fs = require("node:fs/promises");

// Initialize the Text-to-Speech client with the service account key file
const client = new textToSpeech.TextToSpeechClient({
  keyFilename: "./tts-key.json",
});

// Synthesize speech from text and save it to an MP3 file
async function synthesizeSpeech(text, outputFile) {
  // Define the request payload
  const request = {
    input: { text },
    voice: {
      languageCode: "en-US",
      name: "en-US-Neural2-D",
    },
    audioConfig: { audioEncoding: "MP3" },
  };

  // Make the API request to synthesize speech
  const [response] = await client.synthesizeSpeech(request);
  // audioContent is binary audio data, so write it without a text encoding
  await fs.writeFile(outputFile, response.audioContent);
  console.log(`Audio content written to file: ${outputFile}`);
}

// Sample text to convert to speech
const text = "This is a generic sentence intended for testing text-to-speech.";
// Output file path
const outputFile = "output.mp3";
// Call the function and report any API or file-system error
synthesizeSpeech(text, outputFile).catch((err) => {
  console.error("Speech synthesis failed:", err);
  process.exitCode = 1;
});

In this script:

  • We initialize the Text-to-Speech client using the service account key file.
  • We define a function synthesizeSpeech that takes text and an output file path as arguments.
  • The function makes a request to the Text-to-Speech API to synthesize speech and saves the audio content to an MP3 file.
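
The request object is the part of the script most likely to change between projects. As a sketch, the hypothetical helper buildRequest below assembles the same payload but lets the caller override the voice and optionally set speakingRate and pitch, two optional audioConfig fields. The range checks follow the limits documented for the API as far as we know (0.25 to 4.0 for speakingRate, -20.0 to 20.0 for pitch) and are worth confirming against the current API reference.

```javascript
// Hypothetical helper: builds a synthesizeSpeech request from text and options.
function buildRequest(text, options = {}) {
  const {
    languageCode = "en-US",
    name = "en-US-Neural2-D",
    speakingRate,
    pitch,
  } = options;

  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }

  const audioConfig = { audioEncoding: "MP3" };

  // Only send the optional fields when the caller asked for them.
  if (speakingRate !== undefined) {
    if (speakingRate < 0.25 || speakingRate > 4.0) {
      throw new RangeError("speakingRate must be between 0.25 and 4.0");
    }
    audioConfig.speakingRate = speakingRate;
  }
  if (pitch !== undefined) {
    if (pitch < -20.0 || pitch > 20.0) {
      throw new RangeError("pitch must be between -20.0 and 20.0");
    }
    audioConfig.pitch = pitch;
  }

  return {
    input: { text },
    voice: { languageCode, name },
    audioConfig,
  };
}

// Example: a British voice speaking slightly slower than normal.
const exampleRequest = buildRequest("Hello from the Text-to-Speech API.", {
  languageCode: "en-GB",
  name: "en-GB-Neural2-B",
  speakingRate: 0.9,
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

The returned object has the same shape as the request in synthesize.js, so it could be passed to client.synthesizeSpeech in its place.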

Step 4: Run the Script

To run the script, execute the following command in your terminal:

bash
node synthesize.js

If everything is set up correctly, the script prints "Audio content written to file: output.mp3" and writes output.mp3, containing the synthesized speech, to the current directory.

Voice and Language Options

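The Google Cloud Text-to-Speech API provides a variety of voices and languages to choose from. Here are some of the available English options. Google adds and retires voices over time, so confirm names and genders against the API's current voice list before relying on them: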

English (United States) Neural2 Voices

  • en-US-Neural2-A (Male)
  • en-US-Neural2-C (Female)
  • en-US-Neural2-D (Male)
  • en-US-Neural2-E (Female)
  • en-US-Neural2-F (Female)
  • en-US-Neural2-G (Female)
  • en-US-Neural2-H (Female)
  • en-US-Neural2-I (Male)
  • en-US-Neural2-J (Male)

English (United States) WaveNet Voices

  • en-US-Wavenet-A (Male)
  • en-US-Wavenet-B (Male)
  • en-US-Wavenet-C (Female)
  • en-US-Wavenet-D (Male)
  • en-US-Wavenet-E (Female)
  • en-US-Wavenet-F (Female)
  • en-US-Wavenet-G (Female)
  • en-US-Wavenet-H (Female)

English (United Kingdom) Neural2 Voices

  • en-GB-Neural2-A (Female)
  • en-GB-Neural2-B (Male)
  • en-GB-Neural2-C (Female)
  • en-GB-Neural2-D (Male)

English (United Kingdom) WaveNet Voices

  • en-GB-Wavenet-A (Female)
  • en-GB-Wavenet-B (Male)
  • en-GB-Wavenet-C (Female)
  • en-GB-Wavenet-D (Male)

English (Australian) Neural2 Voices

  • en-AU-Neural2-A (Female)
  • en-AU-Neural2-B (Male)
  • en-AU-Neural2-C (Female)
  • en-AU-Neural2-D (Male)

English (Australian) WaveNet Voices

  • en-AU-Wavenet-A (Female)
  • en-AU-Wavenet-B (Male)
  • en-AU-Wavenet-C (Female)
  • en-AU-Wavenet-D (Male)

English (Indian) Neural2 Voices

  • en-IN-Neural2-A (Female)
  • en-IN-Neural2-B (Male)
  • en-IN-Neural2-C (Female)
  • en-IN-Neural2-D (Male)

English (Indian) WaveNet Voices

  • en-IN-Wavenet-A (Female)
  • en-IN-Wavenet-B (Male)
  • en-IN-Wavenet-C (Female)
  • en-IN-Wavenet-D (Male)
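
Because the voice catalog changes over time, a printed list like the one above can go stale. The client library exposes a listVoices method that returns the current catalog. The sketch below wraps it and uses a hypothetical helper, summarizeVoices, to filter by a name fragment such as "Neural2". It assumes the same ./tts-key.json key file as synthesize.js, and the sample entries in the demo are made up for illustration.

```javascript
// Hypothetical helper: keeps voices whose name contains `fragment` and
// formats each one as "name (GENDER)", sorted by name.
function summarizeVoices(voices, fragment = "") {
  return voices
    .filter((voice) => voice.name.includes(fragment))
    .map((voice) => `${voice.name} (${voice.ssmlGender})`)
    .sort();
}

// Fetches the live voice list for one language and prints the matches.
// The client library is required inside the function so that the helper
// above can be tried without installing it.
async function printVoices(languageCode, fragment) {
  const textToSpeech = require("@google-cloud/text-to-speech");
  const client = new textToSpeech.TextToSpeechClient({
    keyFilename: "./tts-key.json", // assumed path, as in synthesize.js
  });
  const [result] = await client.listVoices({ languageCode });
  for (const line of summarizeVoices(result.voices, fragment)) {
    console.log(line);
  }
}

// Demo with made-up entries shaped like the API's voice objects.
const sampleVoices = [
  { name: "xx-XX-Wavenet-A", ssmlGender: "FEMALE" },
  { name: "xx-XX-Neural2-B", ssmlGender: "MALE" },
  { name: "xx-XX-Neural2-A", ssmlGender: "FEMALE" },
];
console.log(summarizeVoices(sampleVoices, "Neural2"));
```

With credentials in place, calling printVoices("en-US", "Neural2") would print the en-US Neural2 voices the API currently offers.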

Pricing

The Google Cloud Text-to-Speech API bills by the number of characters synthesized each month, at a rate that depends on the voice type. Here’s an overview of the costs at the time of writing:

  1. Free Tier:

    • First 1 million characters each month for WaveNet voices are free.
  2. Paid Usage:

    • Standard voices: $4.00 per 1 million characters.
    • WaveNet voices: $16.00 per 1 million characters.
    • Neural2 voices: $16.00 per 1 million characters.
    • Studio voices: $160.00 per 1 million characters.

New users also get $300 in free credits for the first 90 days to explore Google Cloud services.

For more details, visit the Google Cloud Pricing page.
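
To make the arithmetic concrete, the sketch below estimates a monthly bill from a character count, using only the rates and the WaveNet free allowance quoted above. The function name estimateMonthlyCost and the two lookup tables are illustrative. Free allowances for the other voice types are not listed in this article, so the sketch treats them as zero, which may overestimate the cost.

```javascript
// Rates in US dollars per 1 million characters, copied from the list above.
const RATE_PER_MILLION = {
  standard: 4.0,
  wavenet: 16.0,
  neural2: 16.0,
  studio: 160.0,
};

// Free characters per month. Only the WaveNet allowance is stated in this
// article; the other entries are assumed to be zero for this sketch.
const FREE_CHARACTERS = {
  standard: 0,
  wavenet: 1_000_000,
  neural2: 0,
  studio: 0,
};

// Hypothetical helper: estimated monthly cost in US dollars.
function estimateMonthlyCost(characters, voiceType) {
  if (!Object.hasOwn(RATE_PER_MILLION, voiceType)) {
    throw new Error(`unknown voice type: ${voiceType}`);
  }
  const billable = Math.max(0, characters - FREE_CHARACTERS[voiceType]);
  return (billable / 1_000_000) * RATE_PER_MILLION[voiceType];
}

// 2.5 million WaveNet characters: 1.5 million are billable at $16 per million.
console.log(estimateMonthlyCost(2_500_000, "wavenet"));
```

For example, 2.5 million WaveNet characters in a month leaves 1.5 million billable characters after the free allowance, which comes to $24 at the rate listed above.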

Conclusion

Using the Google Cloud Text-to-Speech API, you can easily convert text to natural-sounding speech in various languages and voices. This guide walked you through the process of setting up the API, obtaining credentials, and writing a Node.js script to synthesize speech. You can now integrate this functionality into your applications for enhanced user interactions.

