2024-09-30 Web Development
Speech Synthesis Markup Language for Google Text-to-Speech
By O Wolfson
Google's Text-to-Speech (TTS) service supports a range of SSML (Speech Synthesis Markup Language) tags that let you control the pronunciation, pitch, rate, volume, and other aspects of synthesized speech. Below is an overview of the key SSML tags and attributes supported by Google TTS:
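SSML is sent to the service in place of plain text. The sketch below assumes the official google-cloud-texttospeech Python client is installed and that credentials are configured in the environment. The helper name synthesize_ssml, the en-US language choice, and the MP3 output path are illustrative, not something this article prescribes. The markup is parsed locally first, so a malformed document fails before any request is made.

```python
import xml.etree.ElementTree as ET


def synthesize_ssml(ssml: str, out_path: str = "output.mp3") -> None:
    """Send an SSML document to Google Cloud Text-to-Speech and save the MP3 audio.

    Hypothetical helper: assumes the google-cloud-texttospeech package and
    valid credentials are available.
    """
    # Fail fast on malformed markup: raises ET.ParseError before any network call.
    ET.fromstring(ssml)

    # Imported lazily so the well-formedness check works without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        # ssml= is used here instead of text= so the tags are interpreted.
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as out:
        out.write(response.audio_content)
```

For example, synthesize_ssml('<speak>Hello <break time="500ms"/> world.</speak>') would write the spoken result to output.mp3.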
1. <speak>
- The root element that encapsulates the entire SSML content.
2. <emphasis>
- Used to apply emphasis to certain words or phrases.
- level: Can be "strong", "moderate", or "reduced".
3. <break>
- Introduces a pause in speech.
- time: Specifies the duration of the pause (e.g., "500ms").
- strength: Specifies the strength of the pause ("none", "x-weak", "weak", "medium", "strong", "x-strong").
4. <prosody>
- Modifies the pitch, rate, and volume of the speech.
- pitch: Changes the pitch of the speech (e.g., "+10%", "high", "low").
- rate: Changes the speed of the speech (e.g., "x-slow", "slow", "medium", "fast", "x-fast").
- volume: Adjusts the volume (e.g., "soft", "loud", "x-loud", "-10dB").
5. <say-as>
- Defines how certain types of text should be interpreted (e.g., dates, times, numbers).
- interpret-as: Can be "date", "time", "telephone", "characters", "fraction", etc.
6. <sub>
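- Substitutes an alternate string for the enclosed text when it is spoken.
- alias: The string to speak instead (e.g., <sub alias="World Wide Web Consortium">W3C</sub>).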
7. <audio>
- Embeds an audio file within the speech synthesis.
- src: URL of the audio file.
8. <p> and <s>
- <p> defines a paragraph, and <s> defines a sentence.
9. <voice>
- Allows you to select a specific voice for a portion of the text.
- name: The name of the voice to use.
10. <lang>
- Changes the language for a specific section of the text.
- xml:lang: The language code (e.g., "en-US", "fr-FR").
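The attribute values listed above lend themselves to a quick local check before a request is sent. The Python sketch below uses only the standard library. The function name check_ssml and its rule set are assumptions drawn from this list alone: it checks the root element, the emphasis levels and break strengths named above, and a few attributes it treats as required. It is not a substitute for the service's own validation and may reject values the official documentation accepts.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Enumerated values copied from the tag list above (hypothetical rule set).
ALLOWED_VALUES = {
    ("emphasis", "level"): {"strong", "moderate", "reduced"},
    ("break", "strength"): {"none", "x-weak", "weak", "medium", "strong", "x-strong"},
}

# Attributes this sketch treats as required: tag -> (parsed key, display name).
REQUIRED_ATTRS = {
    "audio": ("src", "src"),
    "say-as": ("interpret-as", "interpret-as"),
    "lang": (XML_LANG, "xml:lang"),
}


def check_ssml(ssml: str) -> list[str]:
    """Return human-readable problems; an empty list means none were detected.

    Raises ET.ParseError if the markup is not well-formed XML.
    """
    root = ET.fromstring(ssml)
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    for elem in root.iter():
        # Flag tags that lack an attribute they cannot work without.
        if elem.tag in REQUIRED_ATTRS:
            key, label = REQUIRED_ATTRS[elem.tag]
            if key not in elem.attrib:
                problems.append(f"<{elem.tag}> is missing required attribute {label}")
        # Flag enumerated attributes whose value is not in the allowed set.
        for attr, value in elem.attrib.items():
            allowed = ALLOWED_VALUES.get((elem.tag, attr))
            if allowed is not None and value not in allowed:
                problems.append(f'<{elem.tag} {attr}="{value}"> is not one of {sorted(allowed)}')
    return problems
```

For instance, check_ssml('<speak><emphasis level="loud">Hi</emphasis></speak>') returns a single problem pointing at the invalid emphasis level.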
Practical Example Combining Tags:
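As an illustration, here is one way several of the tags above can be combined in a single document. The sentences and the audio URL https://example.com/chime.mp3 are placeholders, and tags_used is a hypothetical helper; the Python wrapper only parses the markup with the standard library to confirm it is well-formed and to list which tags it uses.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML; the text and the audio URL are placeholders.
SSML = """<speak>
  <p>
    <s>Welcome to this <emphasis level="moderate">short</emphasis> demo of <say-as interpret-as="characters">SSML</say-as>.</s>
    <s><prosody rate="slow" pitch="low">This sentence is slower and lower.</prosody></s>
  </p>
  <break time="500ms"/>
  <p>
    <audio src="https://example.com/chime.mp3">A chime sound.</audio>
    <s>In French, hello is <lang xml:lang="fr-FR">bonjour</lang>.</s>
    <s>Read <sub alias="World Wide Web Consortium">W3C</sub> aloud.</s>
  </p>
</speak>"""


def tags_used(ssml: str) -> set[str]:
    """Parse the SSML (raises ET.ParseError if malformed) and return its element names."""
    return {elem.tag for elem in ET.fromstring(ssml).iter()}


print(sorted(tags_used(SSML)))
```

The SSML string itself is what would be passed to the service; the parsing step is just a cheap sanity check.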
Documentation Reference:
For the most up-to-date and detailed information on the supported SSML tags and their usage, see the Google Cloud Text-to-Speech SSML documentation.