Struct VoiceAttributes

pub struct VoiceAttributes {
    pub gender: Option<Gender>,
    pub age: Option<u8>,
    pub variant: Option<NonZeroUsize>,
    pub name: Vec<String>,
    pub languages: Vec<LanguageAccentPair>,
}

The voice element is a production element that requests a change in speaking voice. There are two kinds of attributes for the voice element: those that indicate desired features of a voice and those that control behavior. The voice feature attributes are:

gender: optional attribute indicating the preferred gender of the voice to speak the contained text. Enumerated values are: “male”, “female”, “neutral”, or the empty string “”.
age: optional attribute indicating the preferred age in years (since birth) of the voice to speak the contained text. Acceptable values are of type xsd:nonNegativeInteger [SCHEMA2 §3.3.20] or the empty string “”.
variant: optional attribute indicating a preferred variant of the other voice characteristics to speak the contained text (e.g. the second male child voice). Valid values of variant are of type xsd:positiveInteger [SCHEMA2 §3.3.25] or the empty string “”.
name: optional attribute indicating a processor-specific voice name to speak the contained text. The value may be a space-separated list of names ordered from top preference down or the empty string “”. As a result a name must not contain any white space.
languages: optional attribute indicating the list of languages the voice is desired to speak. The value must be either the empty string “” or a space-separated list of languages, with optional accent indication per language. Each language/accent pair is of the form “language” or “language:accent”, where both language and accent must be an Extended Language Range [BCP47, Matching of Language Tags §2.2], except that the values “und” and “zxx” are disallowed. A voice satisfies the languages feature if, for each language/accent pair in the list,
1. the voice is documented (see Voice descriptions) as reading/speaking a language that matches the Extended Language Range given by language according to the Extended Filtering matching algorithm [BCP47, Matching of Language Tags §3.3.2], and
2. if an accent is given, the voice is documented (see Voice descriptions) as reading/speaking the language above with an accent that matches the Extended Language Range given by accent according to the Extended Filtering matching algorithm [BCP47, Matching of Language Tags §3.3.2], except that the script and extension subtags of the accent must be ignored by the synthesis processor. It is recommended that authors and voice providers do not use the script or extension subtags for accents because they are not relevant for speaking.

For example, a languages value of “en:pt fr:ja” can legally be matched by any voice that can both read English (speaking it with a Portuguese accent) and read French (speaking it with a Japanese accent). Thus, a voice that only supports “en-US” with a “pt-BR” accent and “fr-CA” with a “ja” accent would match. As another example, if we have <voice languages="fr:pt"> and there is no voice that supports French with a Portuguese accent, then a voice selection failure will occur. Note that if no accent indication is given for a language, then any voice that speaks the language is acceptable, regardless of accent. Also, note that author control over language support during voice selection is independent of any value of xml:lang in the text.
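The matching rule for languages can be sketched as follows. This is a minimal illustration, not the crate's implementation: `extended_filter_matches`, `Spoken`, and `satisfies_languages` are hypothetical names, the filter follows the Extended Filtering steps of RFC 4647 §3.3.2, and the sketch neither rejects “und”/“zxx” nor strips script and extension subtags from accents.

```rust
/// Extended Filtering (RFC 4647 section 3.3.2): does `tag` match `range`?
/// Hypothetical helper; comparison is case-insensitive.
fn extended_filter_matches(range: &str, tag: &str) -> bool {
    let range: Vec<String> = range.split('-').map(|s| s.to_ascii_lowercase()).collect();
    let tag: Vec<String> = tag.split('-').map(|s| s.to_ascii_lowercase()).collect();

    // The first subtags must agree unless the range starts with a wildcard.
    if range[0] != "*" && range[0] != tag[0] {
        return false;
    }

    let mut t = 1; // index of the next tag subtag to examine
    for r in &range[1..] {
        if r == "*" {
            continue; // a wildcard matches zero or more subtags
        }
        loop {
            match tag.get(t) {
                // The tag ran out of subtags before the range did.
                None => return false,
                // Subtags agree: advance to the next range subtag.
                Some(sub) if sub == r => {
                    t += 1;
                    break;
                }
                // A singleton (e.g. "x") may not be skipped over.
                Some(sub) if sub.len() == 1 => return false,
                // Otherwise skip this tag subtag and try the next one.
                Some(_) => t += 1,
            }
        }
    }
    true
}

/// One documented capability of a voice: a language tag and the accent tag
/// the voice uses when reading that language (hypothetical type).
struct Spoken<'a> {
    language: &'a str,
    accent: &'a str,
}

/// Checks a raw `languages` attribute value against a voice's documented
/// capabilities. An empty value is satisfied by any voice.
fn satisfies_languages(value: &str, voice: &[Spoken<'_>]) -> bool {
    value.split_whitespace().all(|pair| {
        let (lang, accent) = match pair.split_once(':') {
            Some((l, a)) => (l, Some(a)),
            None => (pair, None),
        };
        voice.iter().any(|s| {
            extended_filter_matches(lang, s.language)
                && accent.map_or(true, |a| extended_filter_matches(a, s.accent))
        })
    })
}
```

With a voice documented as “en-US” with a “pt-BR” accent and “fr-CA” with a “ja” accent, the value “en:pt fr:ja” is satisfied, while “fr:pt” is not.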

For the feature attributes above, an empty string value indicates that any voice will satisfy the feature. The top-level default value for all feature attributes is “”, the empty string.
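One way to read the field types of this struct is that an empty attribute string maps to `None` or an empty `Vec`. The helpers below are a hypothetical sketch of that mapping, not the crate's own parser. Note that `Option<u8>` caps age at 255, which is narrower than xsd:nonNegativeInteger, and that `NonZeroUsize` rejects “0”, in line with xsd:positiveInteger.

```rust
use std::num::{NonZeroUsize, ParseIntError};

/// Parses a raw `age` attribute; the empty string means "no preference".
fn parse_age(raw: &str) -> Result<Option<u8>, ParseIntError> {
    let raw = raw.trim();
    if raw.is_empty() {
        Ok(None)
    } else {
        raw.parse::<u8>().map(Some)
    }
}

/// Parses a raw `variant` attribute; zero is rejected by `NonZeroUsize`.
fn parse_variant(raw: &str) -> Result<Option<NonZeroUsize>, ParseIntError> {
    let raw = raw.trim();
    if raw.is_empty() {
        Ok(None)
    } else {
        raw.parse::<NonZeroUsize>().map(Some)
    }
}

/// Splits a raw `name` attribute into names, highest preference first.
/// The empty string yields an empty list.
fn parse_names(raw: &str) -> Vec<String> {
    raw.split_whitespace().map(str::to_owned).collect()
}
```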

The behavior control attributes of voice are:

required: optional attribute that specifies a set of features by their respective attribute names. This set of features is used by the voice selection algorithm described below. Valid values of required are a space-separated list composed of values from the list of feature names: “name”, “languages”, “gender”, “age”, “variant” or the empty string “”. The default value for this attribute is “languages”.
ordering: optional attribute that specifies the priority ordering of features. Valid values of ordering are a space-separated list composed of values from the list of feature names: “name”, “languages”, “gender”, “age”, “variant” or the empty string “”, where features named earlier in the list have higher priority. The default value for this attribute is “languages”. Features not listed in the ordering list have equal priority to each other but lower than that of the last feature in the list. Note that if the ordering attribute is set to the empty string then all features have the same priority.
onvoicefailure: optional attribute containing one value from the following enumerated list describing the desired behavior of the synthesis processor upon voice selection failure. The default value for this attribute is “priorityselect”.
- priorityselect - the synthesis processor uses the values of all voice feature attributes to select a voice by feature priority, where the starting candidate set is the set of all available voices.
- keepexisting - the voice does not change.
- processorchoice - the synthesis processor chooses the behavior (either priorityselect or keepexisting).

The following voice selection algorithm must be used:

1. All available voices are identified for which the values of all voice feature attributes listed in the required attribute value are matched. When the value of the required attribute is the empty string “”, any and all voices are considered successful matches. If one or more voices are identified, the selection is considered successful; otherwise there is voice selection failure.
2. If a successful selection identifies only one voice, the synthesis processor must use that voice.
3. If a successful selection identifies more than one voice, the remaining features (those not listed in the required attribute value) are used to choose a voice by feature priority, where the starting candidate set is the set of all voices identified.
4. If there is voice selection failure, a conforming synthesis processor must report the voice selection failure in addition to taking the action(s) expressed by the value of the onvoicefailure attribute.
5. To choose a voice by feature priority, each feature is taken in turn starting with the highest priority feature, as controlled by the ordering attribute.
- If at least one voice matches the value of the current voice feature attribute then all voices not matching that value are removed from the candidate set. If a single voice remains in the candidate set the synthesis processor must use it. If more than one voice remains in the candidate set then the next priority feature is examined for the candidate set.
- If no voices match the value of the current voice feature attribute then the next priority feature is examined for the candidate set.
6. After examining all feature attributes on the ordering list, if multiple voices remain in the candidate set, the synthesis processor must use any one of them.
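The selection algorithm above can be sketched generically. Everything here is hypothetical and not part of this crate: `Feature`, `VoiceSelectionFailure`, `priority_select`, `select_voice`, and the example `Voice` type are illustrative names, and the `matches` callback stands in for per-feature matching. The sketch assumes that features missing from the ordering list are examined in a fixed order after the listed ones, and it only reports failure; the onvoicefailure action is left to the caller.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Feature { Name, Languages, Gender, Age, Variant }

const ALL_FEATURES: [Feature; 5] = [
    Feature::Name, Feature::Languages, Feature::Gender, Feature::Age, Feature::Variant,
];

#[derive(Debug, PartialEq, Eq)]
struct VoiceSelectionFailure;

/// Narrows `candidates` one feature at a time, highest priority first.
fn priority_select<'a, V>(
    mut candidates: Vec<&'a V>,
    features: &[Feature],
    matches: &impl Fn(&V, Feature) -> bool,
) -> Option<&'a V> {
    for &feature in features {
        if candidates.len() <= 1 {
            break; // a single remaining voice must be used
        }
        let matching: Vec<&'a V> = candidates
            .iter()
            .copied()
            .filter(|v| matches(*v, feature))
            .collect();
        // If no voice matches, the feature is skipped and the set is unchanged.
        if !matching.is_empty() {
            candidates = matching;
        }
    }
    // If several voices remain, any one may be used; take the first.
    candidates.first().copied()
}

/// Filters by the required features, then chooses by feature priority.
fn select_voice<'a, V>(
    voices: &'a [V],
    required: &[Feature],
    ordering: &[Feature],
    matches: impl Fn(&V, Feature) -> bool,
) -> Result<&'a V, VoiceSelectionFailure> {
    // Keep only voices that match every required feature.
    let candidates: Vec<&'a V> = voices
        .iter()
        .filter(|v| required.iter().all(|&f| matches(*v, f)))
        .collect();
    if candidates.is_empty() {
        return Err(VoiceSelectionFailure);
    }
    // Remaining features: ordered ones first, then the unlisted ones
    // (assumption: unlisted features are examined in a fixed order).
    let mut remaining: Vec<Feature> = ordering
        .iter()
        .copied()
        .filter(|f| !required.contains(f))
        .collect();
    for f in ALL_FEATURES.iter().copied() {
        if !required.contains(&f) && !remaining.contains(&f) {
            remaining.push(f);
        }
    }
    priority_select(candidates, &remaining, &matches).ok_or(VoiceSelectionFailure)
}

/// Example voice description used only for illustration.
struct Voice { name: &'static str, lang: &'static str, female: bool }

/// Example request: English, female; all other features are left empty,
/// so any voice satisfies them.
fn wants_english_female(v: &Voice, f: Feature) -> bool {
    match f {
        Feature::Languages => v.lang == "en",
        Feature::Gender => v.female,
        _ => true,
    }
}
```

With required and ordering both set to “languages”, two English voices survive the required filter, and the gender feature then narrows the set to the female voice.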

Although each attribute individually is optional, it is an error if no attributes are specified when the voice element is used.

§Voice descriptions

For every voice made available to a synthesis processor, the vendor of the voice must document the following:

a list of language tags [BCP47, Tags for Identifying Languages] representing the languages the voice can read.
for each language, a language tag [BCP47, Tags for Identifying Languages] representing the accent the voice uses when reading the language.

Although indication of language (using xml:lang) and selection of voice (using voice) are independent, there is no requirement that a synthesis processor support every possible combination of values of the two. However, a synthesis processor must document expected rendering behavior for every possible combination. See the onlangfailure attribute for information on what happens when the processor encounters text content that the voice cannot speak.

voice attributes are inherited down the tree including to within elements that change the language. The defaults described for each attribute only apply at the top (document) level and are overridden by explicit author use of the voice element. In addition, changes in voice are scoped and apply only to the content of the element in which the change occurred. When processing reaches the end of a voice element content, i.e. the closing tag, the voice in effect before the beginning tag is restored.

Similarly, if a voice is changed by the processor as a result of a language speaking failure, the prior voice is restored when that voice is again able to speak the content. Note that there is always an active voice, since the synthesis processor is required to select a default voice before beginning execution of the document.
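The scoping rule can be pictured as a stack of active voices. The sketch below is a hypothetical illustration (the `VoiceScope` type is not part of this crate): entering a voice element pushes a voice, the closing tag pops it, and the default voice chosen before the document starts is never removed.

```rust
/// Tracks the active voice while walking nested voice elements.
struct VoiceScope {
    default_voice: String,
    overrides: Vec<String>,
}

impl VoiceScope {
    /// The processor selects a default voice before the document begins.
    fn new(default_voice: &str) -> Self {
        VoiceScope { default_voice: default_voice.to_owned(), overrides: Vec::new() }
    }

    /// Called at the opening tag of a voice element.
    fn enter(&mut self, voice: &str) {
        self.overrides.push(voice.to_owned());
    }

    /// Called at the closing tag; restores the voice in effect before it.
    /// Popping with no open element is a no-op, so a voice is always active.
    fn exit(&mut self) {
        self.overrides.pop();
    }

    /// The voice currently in effect.
    fn active(&self) -> &str {
        self.overrides
            .last()
            .map(String::as_str)
            .unwrap_or(self.default_voice.as_str())
    }
}
```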

Relative changes in prosodic parameters should be carried across voice changes. However, different voices have different natural defaults for pitch, speaking rate, etc. because they represent different personalities, so absolute values of the prosodic parameters may vary across changes in the voice.

The quality of the output audio or voice may suffer if a change in voice is requested within a sentence.

Fields§

§gender: Option<Gender>

OPTIONAL attribute indicating the preferred gender of the voice to speak the contained text. Enumerated values are: “male”, “female”, “neutral”, or the empty string “”.

§age: Option<u8>

OPTIONAL attribute indicating the preferred age in years (since birth) of the voice to speak the contained text.

§variant: Option<NonZeroUsize>

OPTIONAL attribute indicating a preferred variant of the other voice characteristics to speak the contained text (e.g. the second male child voice).

§name: Vec<String>

OPTIONAL attribute indicating a processor-specific voice name to speak the contained text. The value MAY be a space-separated list of names ordered from top preference down or the empty string “”. As a result a name MUST NOT contain any white space.

§languages: Vec<LanguageAccentPair>

OPTIONAL attribute indicating the list of languages the voice is desired to speak. The value MUST be either the empty string “” or a space-separated list of languages, with OPTIONAL accent indication per language. Each language/accent pair is of the form “language” or “language:accent”, where both language and accent MUST be an Extended Language Range, except that the values “und” and “zxx” are disallowed.

Trait Implementations§

impl Clone for VoiceAttributes

fn clone(&self) -> VoiceAttributes

fn clone_from(&mut self, source: &Self)

impl Debug for VoiceAttributes

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Display for VoiceAttributes

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Hash for VoiceAttributes

fn hash<__H: Hasher>(&self, state: &mut __H)

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

impl Ord for VoiceAttributes

fn cmp(&self, other: &VoiceAttributes) -> Ordering

fn max(self, other: Self) -> Self
where Self: Sized,

fn min(self, other: Self) -> Self
where Self: Sized,

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

impl PartialEq for VoiceAttributes

fn eq(&self, other: &VoiceAttributes) -> bool

fn ne(&self, other: &Rhs) -> bool

impl PartialOrd for VoiceAttributes

fn partial_cmp(&self, other: &VoiceAttributes) -> Option<Ordering>

fn lt(&self, other: &Rhs) -> bool

fn le(&self, other: &Rhs) -> bool

fn gt(&self, other: &Rhs) -> bool

fn ge(&self, other: &Rhs) -> bool

impl Eq for VoiceAttributes

impl StructuralPartialEq for VoiceAttributes

Auto Trait Implementations§

impl Freeze for VoiceAttributes

impl RefUnwindSafe for VoiceAttributes

impl Send for VoiceAttributes

impl Sync for VoiceAttributes

impl Unpin for VoiceAttributes

impl UnwindSafe for VoiceAttributes

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
