Analysis of Emotional Speech using Excitation Source Information
Abstract
Speech communication carries a message at two levels: the explicit message and the implicit message. The explicit message corresponds to the linguistic information. The implicit message consists of information about the speaker's signature, the underlying emotion of the speaker, gender, etc.

The main challenge in emotional speech analysis is to extract emotion-specific features, that is, features independent of the speaker and the sound units. In this thesis, the relative contribution of different components of speech production to the perception of emotion is studied. It is observed that the components related to the excitation source and prosody predominantly carry the emotion characteristics.

Excitation source-related parameters such as the abruptness of glottal closure, the strength of excitation, and the energy of excitation, extracted around the glottal closure instants (GCIs), are examined for emotion classification. These parameters are also observed to be speaker-dependent, and hence they are expressed relative to the characteristics of the same speaker's neutral speech for emotion classification.
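
As a minimal sketch of what expressing parameters relative to neutral speech could look like, the snippet below computes the relative deviation of each parameter from the speaker's neutral mean. The function name, the dictionary layout, and the choice of relative deviation are illustrative assumptions; the abstract does not specify the normalization actually used.

```python
from statistics import mean


def relative_to_neutral(emotional, neutral):
    """Express each parameter as a relative deviation from the neutral mean.

    Both arguments map a parameter name (hypothetical, e.g. "strength")
    to a list of values measured around GCIs for one speaker.
    """
    result = {}
    for name, values in emotional.items():
        base = mean(neutral[name])
        if base == 0:
            raise ValueError(f"neutral mean of {name!r} is zero")
        # Assumed normalization: (value - neutral mean) / neutral mean.
        result[name] = [(v - base) / base for v in values]
    return result
```

With this form, a value of 0 means the parameter matches the speaker's neutral average, which removes some of the speaker dependence before classification.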

For extraction of features which are independent of speaker and sound units, a hierarchical approach is considered. Voice qualities of emotional speech, such as arousal and rhythm, differ across emotions. In this context, the following studies are identified: discrimination between modal speech and falsetto speech, identification of high arousal speech segments in modal speech, and discrimination between anger and happiness in the high arousal case. The excitation source information related to the abruptness of glottal closure at the subsegmental (1-3 ms) level discriminates modal and falsetto speech. It is observed that the excitation source information of the entire glottal cycle is useful for identification of high arousal speech segments. For discrimination between anger and happiness, the excitation source information at the suprasegmental (>100 ms) level appears to be useful.
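
The hierarchical approach described above can be sketched as a cascade of three binary decisions, each using features from a different time scale. This is only an illustration of the ordering of the stages: the function name, the feature keys, the output labels, and the three decision callables are hypothetical, and the abstract does not say how each decision is implemented.

```python
def classify_hierarchical(features, is_falsetto, is_high_arousal, is_anger):
    """Run the three-stage cascade on one speech segment.

    features: dict with hypothetical keys "subsegmental" (1-3 ms level),
              "glottal_cycle" (entire glottal cycle) and
              "suprasegmental" (>100 ms level).
    The three callables are placeholder binary decisions returning bool.
    """
    # Stage 1: modal vs. falsetto, from abruptness of glottal closure.
    if is_falsetto(features["subsegmental"]):
        return "falsetto"
    # Stage 2: within modal speech, detect high arousal segments.
    if not is_high_arousal(features["glottal_cycle"]):
        return "modal, not high arousal"
    # Stage 3: within high arousal speech, anger vs. happiness.
    return "anger" if is_anger(features["suprasegmental"]) else "happiness"
```

Each stage only sees segments that passed the previous one, so the anger/happiness decision never has to handle falsetto or low arousal speech.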