2024-12-06 Web Development

Setting up Google Cloud Text to Speech App

By O. Wolfson

This article walks you through building a simple TTS web app with Google Cloud Text-to-Speech (TTS) and Next.js, covering Google Cloud API configuration, server-side integration, and client-side rendering.

Prerequisites

  1. Node.js and npm installed on your system.

  2. Google Cloud Project:

    • Create a project on the Google Cloud Console.
    • Enable the Text-to-Speech API.
    • Generate and download a service account key (JSON file).
  3. Next.js Project:

    • Initialize a Next.js project:

      bash
      npx create-next-app@latest my-tts-app
      cd my-tts-app
      
    • Install necessary dependencies:

      bash
      npm install @google-cloud/text-to-speech
      

Step 1: Configure Google Cloud Text-to-Speech API

  1. Service Account Key: Place the downloaded service account key file (e.g., tts-key.json) in your project directory, such as ./keys, and add that path to .gitignore so the key is never committed to version control.

  2. Environment Variables: Reference the key file through an environment variable rather than hardcoding the path:

    • Add the path to .env.local:

      env
      GOOGLE_APPLICATION_CREDENTIALS=./keys/tts-key.json
      
    • The Google client library picks this variable up automatically; you can also read it in code:

      javascript
      process.env.GOOGLE_APPLICATION_CREDENTIALS;
      

Step 2: Build the Server-Side TTS API

We’ll use a Route Handler in the Next.js App Router to handle API requests.

  1. Create a file at app/api/tts/route.js:

    javascript
    import { TextToSpeechClient } from "@google-cloud/text-to-speech";
    
    const client = new TextToSpeechClient();
    
    export async function POST(request) {
      try {
        const { text } = await request.json();
    
        if (!text) {
          return new Response(JSON.stringify({ error: "Text is required" }), {
            status: 400,
            headers: { "Content-Type": "application/json" },
          });
        }
    
        const requestPayload = {
          input: { text },
          voice: {
            languageCode: "en-US",
            name: "en-US-Neural2-D",
          },
          audioConfig: { audioEncoding: "MP3" },
        };
    
        const [response] = await client.synthesizeSpeech(requestPayload);
    
        return new Response(response.audioContent, {
          status: 200,
          headers: {
            "Content-Type": "audio/mpeg",
            "Content-Disposition": "inline; filename=speech.mp3",
          },
        });
      } catch (error) {
        console.error("Error synthesizing speech:", error);
        return new Response(
          JSON.stringify({ error: "Failed to generate speech" }),
          { status: 500, headers: { "Content-Type": "application/json" } }
        );
      }
    }
    
  2. Test the API locally: Start the Next.js development server:

    bash
    npm run dev
    

    Send a POST request to /api/tts using a tool like Postman or curl with a JSON body:

    json
    { "text": "Hello, world!" }
    
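The route above fixes the voice and audio encoding in its request payload. If you later want to support multiple languages or voices (as suggested in the summary), the payload construction can be factored into a small validated helper. A sketch, where buildTtsPayload and its options are hypothetical names, not part of the Google API:

```javascript
// Build the synthesizeSpeech request payload, mirroring the shape
// used in app/api/tts/route.js, with overridable voice options.
function buildTtsPayload(
  text,
  { languageCode = "en-US", voiceName = "en-US-Neural2-D" } = {}
) {
  if (!text || typeof text !== "string") {
    throw new Error("Text is required");
  }
  return {
    input: { text },
    voice: { languageCode, name: voiceName },
    audioConfig: { audioEncoding: "MP3" },
  };
}
```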

Step 3: Build the Client-Side UI

Create a React component for interacting with the API.

  1. In app/page.js:

    javascript
    "use client";
    
    import { useState } from "react";
    
    export default function TextToSpeechPage() {
      const [text, setText] = useState("This is a test sentence.");
      const [isLoading, setIsLoading] = useState(false);
      const [audioUrl, setAudioUrl] = useState(null);
    
      const handleGenerateSpeech = async () => {
        setIsLoading(true);
        setAudioUrl(null);
    
        try {
          const response = await fetch("/api/tts", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ text }),
          });
    
          if (!response.ok) {
            throw new Error("Failed to generate speech.");
          }
    
          const audioBlob = await response.blob();
          const url = URL.createObjectURL(audioBlob);
          setAudioUrl(url);
        } catch (error) {
          console.error("Error:", error);
        } finally {
          setIsLoading(false);
        }
      };
    
      return (
        <div style={{ padding: "20px", textAlign: "center" }}>
          <h1>Text-to-Speech</h1>
          <textarea
            value={text}
            onChange={(e) => setText(e.target.value)}
            rows={4}
            cols={40}
            style={{ display: "block", margin: "10px auto" }}
          />
          <button
            type="button"
            onClick={handleGenerateSpeech}
            disabled={isLoading}
            style={{ padding: "10px 20px", fontSize: "16px" }}
          >
            {isLoading ? "Generating..." : "Generate Speech"}
          </button>
          {audioUrl && (
            <audio controls src={audioUrl} style={{ marginTop: "20px" }} />
          )}
        </div>
      );
    }
    

Step 4: Deploy Your Application

  1. Deploy your Next.js app to a platform like Vercel or Netlify.
  2. If using Vercel, do not commit or upload tts-key.json; instead, store the key's contents in an environment variable in the project settings and construct the TTS client from it at runtime.
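One common approach (an assumption, not the only option) is to paste the entire service-account JSON into an environment variable, e.g. GOOGLE_CREDENTIALS_JSON, and parse it when creating the client. A sketch, where parseServiceAccount and the variable name are hypothetical:

```javascript
// Parse a service-account JSON string from an env var into the
// options object accepted by TextToSpeechClient({ credentials, projectId }).
function parseServiceAccount(jsonString = process.env.GOOGLE_CREDENTIALS_JSON) {
  if (!jsonString) {
    throw new Error("GOOGLE_CREDENTIALS_JSON is not set");
  }
  const key = JSON.parse(jsonString);
  return {
    projectId: key.project_id,
    credentials: {
      client_email: key.client_email,
      private_key: key.private_key,
    },
  };
}

// Usage in the route handler:
//   const client = new TextToSpeechClient(parseServiceAccount());
```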

Summary

With this foundation, you can extend the application by:

  • Supporting multiple languages and voices.
  • Adding audio file downloads.
  • Including user authentication for personalized services.