Dialect Classification and Multi Dialect Speech Recognition
Abstract
Keywords: dialect classification; zero-time windowing; single frequency filtering; frequency domain
linear prediction; convolutional neural network; ECAPA-TDNN; DeepSpeech; multi-dialect automatic speech
recognition; Indian English ASR
The major goal of this thesis is to study dialectal variations and to improve the performance of speech
recognition using embeddings derived from an improved dialect classification system. Initial studies focused
on improving the dialect classification system for three major dialects of English (AU: Australian,
UK: British, and US: American).
To improve the performance of the dialect classification system, and based on an analysis of dialectal
variations, advanced signal processing approaches were investigated for dialect classification
with a traditional i-vector system. Features that provide high spectral resolution help to capture
subtle differences between dialects. This thesis therefore proposes single frequency filtering (SFF) and
zero-time windowing (ZTW) based features, which provide high spectral resolution without compromising
temporal resolution. Along with frame-level spectral resolution, longer temporal context also contributes
to dialect classification. Approaches that enhance the temporal context of the proposed features (SFF and
ZTW), such as delta and double-delta coefficients (Δ+ΔΔ) and shifted delta coefficients (SDCs),
were therefore evaluated. The dialect classification system gave promising performance with
the proposed features when temporal context was provided by Δ+ΔΔ and SDCs. Further, signal processing
approaches that provide long temporal summarization, such as frequency domain linear prediction
(FDLP), are proposed for dialect classification. Experiments with FDLP-based features show
that the long temporal summarization they provide is advantageous for discriminating
dialects. So, both the signal processing approaches that provide high spectral resolution (SFF and ZTW) and
long temporal sum