3.42 prosody
Element type: prosody
Attributes: contour, duration, pitch, range, rate, volume
Parents: audio, choice, emphasis, enumerate, paragraph, prompt, prosody, sentence, voice
Children: PCDATA, audio, break, emphasis, enumerate, mark, paragraph, phoneme, prosody, say-as, sentence, value, voice
Description: Associates text-to-speech rendering parameters with the contained TTS content.
DTD
<!ELEMENT prosody (%allowed-within-sentence; | %structure;)* >
<!ATTLIST prosody
    pitch    CDATA #IMPLIED
    contour  CDATA #IMPLIED
    range    CDATA #IMPLIED
    rate     CDATA #IMPLIED
    duration CDATA #IMPLIED
    volume   CDATA #IMPLIED >
Language model
Attributes
-
contour : CDATA
Indicates a transform function for the pitch variation. It can "deaden" or "brighten" the pitch range more precisely than the range attribute, which applies only a single constant. The value is a set of (input percentage, relative output percentage) pairs, for example: "(0%,+20)(10%,+30%)(40%,+10)".
-
duration : CDATA
The time interval within which the contained TTS content must be spoken, specified as an integer followed by a unit abbreviation: s (seconds) or ms (milliseconds), for example 500ms or 2s.
-
pitch : CDATA
The baseline pitch for the speech, specified as an integer in Hertz, as a relative value (for example +10, +5%, or +5st, where st means “semitone”), or as one of the symbols high, medium, low, or default.
-
range : CDATA
The variability of the pitch, specified in Hertz, as a relative value (for example +10, +5%, or +5st, where st means “semitone”), or as one of the symbols high, medium, low, or default.
-
rate : CDATA
The speaking rate for the contained content, specified either as a relative value (for example +10 or +5%) or as one of the symbols fast, medium, slow, or default.
-
volume : CDATA
The volume at which the contained TTS content is played, specified as an integer in the range [0, 100] or as one of the values silent, soft, medium, loud, or default.
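Putting the value formats above together, the following is a minimal sketch of one prompt that exercises each attribute. It assumes a VoiceXML 2.0 platform using the standard namespace and a TTS engine that honours all six attributes; the form id `prosodyattributes` and the spoken sentences are hypothetical, and the audible effect of each setting depends on the engine.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="prosodyattributes"> <!-- hypothetical form id -->
    <block>
      <prompt xml:lang="en-US">
        <!-- pitch as an absolute value in Hertz -->
        <prosody pitch="220">This sentence uses a fixed baseline pitch.</prosody>
        <!-- pitch raised by five semitones, with a symbolic range -->
        <prosody pitch="+5st" range="high">This one is higher and more varied.</prosody>
        <!-- rate as a relative percentage and as a symbol -->
        <prosody rate="+5%">Slightly faster.</prosody>
        <prosody rate="slow">Noticeably slower.</prosody>
        <!-- volume as an integer in [0, 100] and as a symbol -->
        <prosody volume="80">Fairly loud.</prosody>
        <prosody volume="soft">Quiet.</prosody>
        <!-- duration: the whole phrase must be spoken within two seconds -->
        <prosody duration="2s">This phrase is spoken within two seconds.</prosody>
        <!-- contour: the pair list used in the contour description -->
        <prosody contour="(0%,+20)(10%,+30%)(40%,+10)">This sentence follows a pitch contour.</prosody>
      </prompt>
    </block>
  </form>
</vxml>
```

Keeping most settings in separate elements makes it easier to hear which ones a given engine actually applies.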
Children
-
TTS and audible content
Spoken using the given prosody parameters.
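Because prosody accepts text mixed with the child elements listed in the element summary, including a nested prosody, settings can be layered. The following is a minimal sketch. It assumes the platform accepts the SSML-style `time` attribute on `break`, using the same s/ms format as duration. The form id `prosodychildren` and the wording are hypothetical. Whether the inner element keeps the outer rate while adding its own volume depends on the engine.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="prosodychildren"> <!-- hypothetical form id -->
    <block>
      <prompt xml:lang="en-US">
        <!-- outer prosody slows the whole passage down -->
        <prosody rate="slow">
          Please listen carefully.
          <!-- a half-second pause inside the slowed passage -->
          <break time="500ms"/>
          The next word is <emphasis>important</emphasis>.
          <!-- nested prosody: this sentence is also louder -->
          <prosody volume="loud">Do not hang up.</prosody>
        </prosody>
      </prompt>
    </block>
  </form>
</vxml>
```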
Examples
Example 3-53 A dialog that sings "Twinkle, Twinkle, Little Star"
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="audiotest">
    <block>
      <prompt xml:lang="en-US">
        <!-- "Twinkle, twinkle, little star": A4 A4 E5 E5 F#5 F#5 E5 -->
        <prosody duration="500ms" pitch="440">Twin</prosody>
        <prosody duration="500ms" pitch="440">cull</prosody>
        <prosody duration="500ms" pitch="659">Twin</prosody>
        <prosody duration="500ms" pitch="659">cull</prosody>
        <prosody duration="500ms" pitch="740">Lit</prosody>
        <prosody duration="500ms" pitch="740">tull</prosody>
        <prosody duration="1000ms" pitch="659">star</prosody>
        <!-- "How I wonder what you are": D5 D5 C#5 C#5 B4 B4 A4 -->
        <prosody duration="500ms" pitch="587">How</prosody>
        <prosody duration="500ms" pitch="587">I</prosody>
        <prosody duration="500ms" pitch="554">Won</prosody>
        <prosody duration="500ms" pitch="554">der</prosody>
        <prosody duration="500ms" pitch="494">what</prosody>
        <prosody duration="500ms" pitch="494">you</prosody>
        <prosody duration="1000ms" pitch="440">are</prosody>
      </prompt>
    </block>
  </form>
</vxml>
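The Hertz values in the example are equal-tempered note frequencies relative to A4 = 440 Hz, rounded to the nearest integer, which is why the melody comes out in tune. The standard relation is shown below as a worked check. Reading a relative value such as +5st as multiplication by 2^(5/12) follows the usual definition of a semitone, but how a particular engine applies it is an assumption.

```latex
\begin{aligned}
f(n) &= 440\,\text{Hz} \times 2^{n/12} \quad (n = \text{semitones above A4}) \\
f(2) &\approx 493.9\,\text{Hz (B4)}, \quad f(4) \approx 554.4\,\text{Hz (C}\sharp\text{5)}, \quad f(5) \approx 587.3\,\text{Hz (D5)} \\
f(7) &\approx 659.3\,\text{Hz (E5)}, \quad f(9) \approx 740.0\,\text{Hz (F}\sharp\text{5)} \\
f_0 \times 2^{5/12} &\approx 1.335\, f_0 \quad (\text{e.g. } {+5\text{st}} \text{ from } 200\,\text{Hz} \approx 267\,\text{Hz})
\end{aligned}
```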