2024-12-06 Web Development

Setting up Google Cloud Text to Speech App

By O. Wolfson

This article walks you through building a simple TTS web app with Google Cloud Text-to-Speech (TTS) and Next.js, covering Google Cloud API configuration, server-side integration, and client-side rendering.

Prerequisites

  1. Node.js and npm installed on your system.

  2. Google Cloud Project:

    • Create a project on the Google Cloud Console.
    • Enable the Text-to-Speech API.
    • Generate and download a service account key (JSON file).
  3. Next.js Project:

    • Initialize a Next.js project:

      bash
      npx create-next-app@latest my-tts-app
      cd my-tts-app
      
    • Install necessary dependencies:

      bash
      npm install @google-cloud/text-to-speech
      

Step 1: Configure Google Cloud Text-to-Speech API

  1. Service Account Key: Place the downloaded service account key file (e.g., tts-key.json) in your project directory, such as ./keys, and add that path to .gitignore so the key is never committed to version control.

  2. Environment Variables: Reference the key file through an environment variable rather than hardcoding the path:

    • Add the path to .env.local:

      env
      GOOGLE_APPLICATION_CREDENTIALS=./keys/tts-key.json
      
    • The Google client library picks this variable up automatically; you can also read it in code:

      javascript
      process.env.GOOGLE_APPLICATION_CREDENTIALS;
      

Step 2: Build the Server-Side TTS API

We’ll use a Route Handler in the Next.js App Router to handle API requests.

  1. Create a file at app/api/tts/route.js:

    javascript
    import { TextToSpeechClient } from "@google-cloud/text-to-speech";
    
    const client = new TextToSpeechClient();
    
    export async function POST(request) {
      try {
        const { text } = await request.json();
    
        if (!text) {
          return new Response(JSON.stringify({ error: "Text is required" }), {
            status: 400,
            headers: { "Content-Type": "application/json" },
          });
        }
    
        const requestPayload = {
          input: { text },
          voice: {
            languageCode: "en-US",
            name: "en-US-Neural2-D",
          },
          audioConfig: { audioEncoding: "MP3" },
        };
    
        const [response] = await client.synthesizeSpeech(requestPayload);
    
        return new Response(response.audioContent, {
          status: 200,
          headers: {
            "Content-Type": "audio/mpeg",
            "Content-Disposition": "inline; filename=speech.mp3",
          },
        });
      } catch (error) {
        console.error("Error synthesizing speech:", error);
        return new Response(
          JSON.stringify({ error: "Failed to generate speech" }),
          { status: 500, headers: { "Content-Type": "application/json" } }
        );
      }
    }
    
  2. Test the API locally: Start the Next.js development server:

    bash
    npm run dev
    

    Send a POST request to /api/tts using a tool like Postman or curl with a JSON body:

    json
    { "text": "Hello, world!" }
    
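The route above fixes the voice and audio encoding in its request payload. If you later want to support multiple languages or voices (as suggested in the summary), the payload construction can be factored into a small validated helper. A sketch, where buildTtsPayload and its options are hypothetical names, not part of the Google API:

```javascript
// Build the synthesizeSpeech request payload, mirroring the shape
// used in app/api/tts/route.js, with overridable voice options.
function buildTtsPayload(
  text,
  { languageCode = "en-US", voiceName = "en-US-Neural2-D" } = {}
) {
  if (!text || typeof text !== "string") {
    throw new Error("Text is required");
  }
  return {
    input: { text },
    voice: { languageCode, name: voiceName },
    audioConfig: { audioEncoding: "MP3" },
  };
}
```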

Step 3: Build the Client-Side UI

Create a React component for interacting with the API.

  1. In app/page.js:

    javascript
    "use client";
    
    import { useState } from "react";
    
    export default function TextToSpeechPage() {
      const [text, setText] = useState("This is a test sentence.");
      const [isLoading, setIsLoading] = useState(false);
      const [audioUrl, setAudioUrl] = useState(null);
    
      const handleGenerateSpeech = async () => {
        setIsLoading(true);
        setAudioUrl(null);
    
        try {
          const response = await fetch("/api/tts", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ text }),
          });
    
          if (!response.ok) {
            throw new Error("Failed to generate speech.");
          }
    
          const audioBlob = await response.blob();
          const url = URL.createObjectURL(audioBlob);
          setAudioUrl(url);
        } catch (error) {
          console.error("Error:", error);
        } finally {
          setIsLoading(false);
        }
      };
    
      return (
        <div style={{ padding: "20px", textAlign: "center" }}>
          <h1>Text-to-Speech</h1>
          <textarea
            value={text}
            onChange={(e) => setText(e.target.value)}
            rows={4}
            cols={40}
            style={{ display: "block", margin: "10px auto" }}
          />
          <button
            type="button"
            onClick={handleGenerateSpeech}
            disabled={isLoading}
            style={{ padding: "10px 20px", fontSize: "16px" }}
          >
            {isLoading ? "Generating..." : "Generate Speech"}
          </button>
          {audioUrl && (
            <audio controls src={audioUrl} style={{ marginTop: "20px" }} />
          )}
        </div>
      );
    }
    

Step 4: Deploy Your Application

  1. Deploy your Next.js app to a platform like Vercel or Netlify.
  2. If using Vercel, do not commit or upload tts-key.json; instead, store the key's contents in an environment variable in the project settings and construct the TTS client from it at runtime.
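One common approach (an assumption, not the only option) is to paste the entire service-account JSON into an environment variable, e.g. GOOGLE_CREDENTIALS_JSON, and parse it when creating the client. A sketch, where parseServiceAccount and the variable name are hypothetical:

```javascript
// Parse a service-account JSON string from an env var into the
// options object accepted by TextToSpeechClient({ credentials, projectId }).
function parseServiceAccount(jsonString = process.env.GOOGLE_CREDENTIALS_JSON) {
  if (!jsonString) {
    throw new Error("GOOGLE_CREDENTIALS_JSON is not set");
  }
  const key = JSON.parse(jsonString);
  return {
    projectId: key.project_id,
    credentials: {
      client_email: key.client_email,
      private_key: key.private_key,
    },
  };
}

// Usage in the route handler:
//   const client = new TextToSpeechClient(parseServiceAccount());
```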

Summary

With this foundation, you can extend the application by:

  • Supporting multiple languages and voices.
  • Adding audio file downloads.
  • Including user authentication for personalized services.