Prosody

Type

SSML item

Available from

Any level tab in the Prompt File Editor. For more information about the Prompt File Editor, see Using the Prompt File Editor.

Purpose

The Prosody item is a Speech Synthesis Markup Language (SSML) element that makes it possible to control various aspects of Text-to-Speech (TTS) synthesis, such as pitch, speaking rate, and volume. By adjusting these aspects, you can get more natural-sounding speech from the TTS engine.

Behavior

Based on the properties you set, the Prosody item alters the rendering of TTS speech synthesis. All properties are optional, but if you use the Prosody item, you must use at least one property. The Prosody item has six basic properties:

Note:

The specific effects of many of these properties might vary from one TTS engine to another.

For additional details on the properties and how they behave, see the next section, "Properties."

Note:

The Prosody item and its properties function correctly only with SSML-compliant speech synthesis engines. The Microsoft Speech SDK, which Dialog Designer uses during application simulation, is not an SSML-compliant speech synthesis engine, so any settings you make with this item are ignored during simulation. For more information about the SSML standard, see the Speech Synthesis Markup Language (SSML) Version 1.0 W3C Recommendation.
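
As a rough illustration of what the Prosody item represents, the following Python sketch builds an SSML prosody element with the standard library. The attribute names (pitch, range, rate, duration, volume) are taken from the SSML 1.0 Recommendation. The helper name build_prosody and the sample values are hypothetical, and the exact markup that Dialog Designer generates might differ.

```python
import xml.etree.ElementTree as ET

# Attribute names defined for <prosody> in SSML 1.0 that correspond to
# the properties described in this topic.
PROSODY_ATTRIBUTES = ("pitch", "range", "rate", "duration", "volume")


def build_prosody(text, **settings):
    """Return an SSML <prosody> fragment as a string (hypothetical helper)."""
    if not settings:
        # Mirrors the rule that at least one property must be set.
        raise ValueError("set at least one prosody property")
    unknown = set(settings) - set(PROSODY_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unsupported properties: {sorted(unknown)}")
    element = ET.Element("prosody")
    for name in PROSODY_ATTRIBUTES:  # emit attributes in a fixed order
        if name in settings:
            element.set(name, settings[name])
    element.text = text
    return ET.tostring(element, encoding="unicode")


print(build_prosody("Your call is important to us.", pitch="+3st", rate="slow"))
```

Only the properties that are passed in appear as attributes, which matches the rule that every property is optional as long as at least one is set.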

Properties

All properties are optional, but if you use the Prosody item, you must use at least one property. Note that all units, such as Hz and st, are case-sensitive.

Pitch Setting

Description

default

This setting has no effect on the output. This setting uses the default baseline pitch of the TTS server.

x-low, low, medium, high, x-high

These settings represent a scale of pitch options, from lowest to highest. The specific application of these settings varies according to the TTS server.

custom

With this setting, you can define the baseline pitch that you want, as a modification of the TTS server default. When you select this setting, Dialog Designer automatically adds the Custom Pitch [Hz or st] property to the Property view.

Custom Pitch [Hz or st]

This setting is available only when the custom setting for Pitch is selected. With it, you can fine-tune the baseline pitch by raising or lowering the pitch relative to the server default:

  • To raise or lower the pitch based on frequency, enter a positive or negative number followed by Hz. For example, if you set this to +8000Hz, the system raises the baseline pitch by 8000 cycles per second. If you set this to -500Hz, the system lowers the baseline pitch by 500 cycles per second.
  • To raise or lower the pitch by semitones, enter a positive or negative number followed by st. Each whole number represents a semitone on the diatonic scale. Positive numbers raise the pitch. Negative numbers lower the pitch. For example, if you enter +3st in this field, the baseline pitch is raised by three semitones. If you enter -1.5st, the baseline pitch is lowered by one and a half semitones.

The correct format to enter these values is a positive (+) or negative (-) sign, followed by a number, followed by Hz or st, with no spaces. Numbers can be of the format "n", "n.", ".n" or "n.n", where n represents any sequence of one or more digits.
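
To give a feel for the size of a semitone change, the following Python sketch converts a semitone offset to a frequency ratio. It assumes 12-tone equal temperament, in which one semitone corresponds to a factor of 2^(1/12). How a particular TTS engine actually applies a semitone value might differ, and the function name semitone_ratio is hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones`.

    Assumes 12-tone equal temperament: 12 semitones double the frequency.
    """
    return 2.0 ** (semitones / 12.0)


# Ratios for the two examples in this section, plus a full octave.
for shift in (3, -1.5, 12):
    print(f"{shift:+}st multiplies the baseline frequency by {semitone_ratio(shift):.3f}")
```

Under this assumption, +3st raises the baseline frequency by roughly 19 percent, and +12st doubles it.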


Range Setting

Description

default

This setting has no effect on the output. This setting uses the default pitch range of the TTS server.

x-low, low, medium, high, x-high

These settings represent a scale of pitch range options, from narrowest to widest. The specific application of these settings varies according to the TTS server.

custom

With this setting, you can define the pitch range that you want, as a modification of the TTS server default. When you select this setting, Dialog Designer automatically adds the Custom Range [Hz or st] property to the Property view.

Custom Range [Hz or st]

This setting is available only when the custom setting for Range is selected. With it, you can fine-tune the pitch range by increasing or decreasing the range relative to the server default:

  • To increase or decrease the pitch range based on frequency, enter a positive or negative number followed by Hz. For example, if you set this to +8000Hz, the system increases the pitch range by 8000 cycles per second. If you set this to -500Hz, the system decreases the pitch range by 500 cycles per second.
  • To increase or decrease the pitch range by semitones, enter a positive or negative number followed by st. Each whole number represents a semitone on the diatonic scale. Positive numbers increase the pitch range. Negative numbers decrease the pitch range. For example, if you enter +3st in this field, the pitch range is increased by three semitones. If you enter -1.5st, the pitch range is decreased by one and a half semitones.

The correct format to enter these values is a positive (+) or negative (-) sign, followed by a number, followed by Hz or st, with no spaces. Numbers can be of the format "n", "n.", ".n" or "n.n", where n represents any sequence of one or more digits.
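
The format rule for the Custom Pitch and Custom Range values can be checked before a value is entered. The following Python sketch expresses the rule as a regular expression. The names CUSTOM_PITCH_OR_RANGE and is_valid_custom_value are hypothetical, and the check is only an approximation of whatever validation Dialog Designer itself performs.

```python
import re

# A sign, then a number written as "n", "n.", ".n" or "n.n", then a
# case-sensitive unit (Hz or st), with no spaces anywhere.
CUSTOM_PITCH_OR_RANGE = re.compile(r"[+-](?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:Hz|st)")


def is_valid_custom_value(value):
    """Return True if `value` follows the documented Hz/st format."""
    return CUSTOM_PITCH_OR_RANGE.fullmatch(value) is not None


for candidate in ("+8000Hz", "-1.5st", "+.5st", "3st", "+3 st", "+3ST"):
    print(candidate, is_valid_custom_value(candidate))
```

The last three candidates fail because they lack a sign, contain a space, or use the wrong case for the unit.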


Rate Setting

Description

default

This setting has no effect on the output. This setting uses the default speaking rate of the TTS server.

x-slow, slow, medium, fast, x-fast

These settings represent a scale of speaking rate options, from slowest to fastest. The specific application of these settings varies according to the TTS server.

custom

With this setting, you can define the speaking rate that you want, as a modification of the TTS server default. When you select this setting, Dialog Designer automatically adds the Custom Rate [positive float] property to the Property view.

Custom Rate [positive float]

This setting is available only when the custom setting for Rate is selected. With it, you can fine-tune the speaking rate by increasing or decreasing the rate relative to the server default. The number you enter in this field acts as a multiplier on the default rate. For example, a setting of 1 (100% of the default rate) means there is no change to the default rate. A setting of 2 (200%) makes the speaking rate twice as fast as the default rate. A setting of 0.5 (50%) makes the speaking rate half as fast as the default rate.

The correct format to enter these values is "n", "n.", ".n" or "n.n", where n represents any sequence of one or more digits.
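
The multiplier behavior can be sketched in a few lines of Python. The default rate of 150 words per minute used here is purely an illustrative assumption, because the real default depends on the TTS server. The names RATE_FORMAT and effective_rate are hypothetical.

```python
import re

# "n", "n.", ".n" or "n.n", with no sign and no unit.
RATE_FORMAT = re.compile(r"[0-9]+\.?[0-9]*|\.[0-9]+")


def effective_rate(default_rate, custom_rate):
    """Apply a Custom Rate multiplier (given as text) to a default rate."""
    if RATE_FORMAT.fullmatch(custom_rate) is None:
        raise ValueError(f"not a valid rate format: {custom_rate!r}")
    multiplier = float(custom_rate)
    if multiplier <= 0:
        raise ValueError("the rate must be a positive number")
    return default_rate * multiplier


ASSUMED_DEFAULT_WPM = 150.0  # illustrative value only
for setting in ("1", "2", ".5"):
    print(setting, effective_rate(ASSUMED_DEFAULT_WPM, setting))
```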


Duration Setting

Description

250ms, 500ms, 750ms, 1s, 2s, 3s, 4s, 5s

These settings represent a range of duration options in milliseconds (ms) or seconds (s).

A Duration setting overrides any Rate setting you might have.

custom

With this setting, you can define the exact duration that you want. When you select this setting, Dialog Designer automatically adds the Custom Duration [s or ms] property to the Property view.

Custom Duration [s or ms]

This setting is available only when the custom setting for Duration is selected. With it, you can set the exact duration for the text to be spoken.

The correct format to enter these values is "n", "n.", ".n" or "n.n", where n represents any sequence of one or more digits. The number must be followed, with no space in between, by s for seconds or ms for milliseconds.
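
The following Python sketch parses a value in this format and converts it to milliseconds, which makes it easy to compare values such as 750ms and 1.5s. The names DURATION_FORMAT and duration_to_ms are hypothetical, and the sketch is not a description of how Dialog Designer processes the value.

```python
import re

# A number written as "n", "n.", ".n" or "n.n", immediately followed by
# the unit: ms for milliseconds or s for seconds.
DURATION_FORMAT = re.compile(r"([0-9]+\.?[0-9]*|\.[0-9]+)(ms|s)")


def duration_to_ms(value):
    """Convert a Custom Duration string to milliseconds."""
    match = DURATION_FORMAT.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid duration: {value!r}")
    number, unit = match.groups()
    factor = 1000.0 if unit == "s" else 1.0
    return float(number) * factor


for sample in ("250ms", "1.5s", "2s"):
    print(sample, duration_to_ms(sample))
```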


Note:

The Volume property uses a range of 0.0 (silent) to 100.0 (full volume).

Volume Setting

Description

default

This setting is the same as setting the volume to 100.0, or full volume.

silent

This setting is the same as setting the volume to 0.0.

x-soft, soft, medium, loud, x-loud

These settings represent a scale of volume options, from quietest to loudest. The specific application of these settings varies according to the TTS server.

custom

With this setting, you can define the exact volume that you want. When you select this setting, Dialog Designer automatically adds the Custom Volume [float] property to the Property view.

Custom Volume [float]

This setting is available only when the custom setting for Volume is selected. With it, you can set the exact volume you want for the text to be spoken.

The correct format to enter these values is a positive (+) or negative (-) sign, followed by a number. Numbers can be of the format "n", "n.", ".n" or "n.n", where n represents any sequence of one or more digits.
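
The following Python sketch checks a value against this format and shows one possible interpretation of it. It assumes that a signed number is a relative change to the current volume, as in SSML 1.0, and that the result is kept within the 0.0 to 100.0 range. Both points are assumptions, as are the names VOLUME_FORMAT and apply_volume_change. The actual behavior depends on the TTS engine.

```python
import re

# A sign followed by a number written as "n", "n.", ".n" or "n.n".
VOLUME_FORMAT = re.compile(r"[+-](?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def apply_volume_change(current, custom_volume):
    """Apply a signed Custom Volume value to `current` (assumed relative)."""
    if VOLUME_FORMAT.fullmatch(custom_volume) is None:
        raise ValueError(f"not a valid volume format: {custom_volume!r}")
    # Assumption: clamp to the documented range of 0.0 to 100.0.
    return min(100.0, max(0.0, current + float(custom_volume)))


for change in ("-20", "+10", "-.5"):
    print(change, apply_volume_change(100.0, change))
```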




©2009, Avaya Inc. All rights reserved.