Diverse Multilingual and Mixed lingual Emotion Recognition using Perception based Speech Analysis

Abstract

This thesis explores and analyses perception-based speech features for emotion recognition in diverse and mixed-language environments across discrete and dimensional emotion spaces. The majority of existing approaches in the literature on multilingual speech emotion recognition (SER) have revolved around exploring new speech features and expanding existing speech feature vectors for effective emotion recognition. Subtle emotions such as disgust and boredom, which have fewer samples across the majority of databases, are usually recognized less well. Besides, cross-corpus SER systems are usually associated with various preprocessing techniques, large speech feature vectors and feature selection mechanisms. For these systems to be applicable in countries like India, where the population communicates in a mix of diverse languages, they must be further enhanced, as existing cross-corpus SER works have mostly dealt with samples from only 2 to 3 languages at a time during the training-testing process. Also, most emotion recognition works have targeted either the discrete or the dimensional emotion space. The thesis aims to address the aforementioned shortcomings and limitations of the prevailing works. The main focus of SER system design in this work is identifying a vital, compact set of features through speech analysis for efficient emotion recognition. From the exhaustive literature survey and initial SER studies performed by the author, it is found that human emotions are better perceived through cepstral feature analysis. In this thesis, the initial research work began with a search for an effective cepstral speech feature combination for a monolingual SER system. Through the experiments performed, it was found that cepstral features derived from the Mel and Bark scales were quite significant for emotion discrimination across both emotion spaces.
Artificial Neural Networks (ANN) and Deep Neural Networks (DNN) were chosen for classification. Next, the proposed monolingual SER system...
