Intelligibility Assessment of Dysarthric Speech
Abstract
Dysarthria is a neurogenic speech disorder, and speakers with dysarthria
produce less intelligible speech. Assessing intelligibility is essential
for tracking the progression of the disease and for helping speech-language
pathologists plan therapy. Because subjective tests are perceptual in nature,
they are inconsistent, biased, time-consuming and costly. In contrast,
automatic objective assessment methods are repeatable, less time-consuming
and relatively cheap. Hence, it is important to automate the process.
Automation involves identifying features that can distinguish the
intelligibility levels, followed by classification using machine learning
techniques. The types of features depend on the signal representation. This
work investigates different signal representations, features and machine
learning techniques for automatic intelligibility assessment of dysarthric
speech.
Features that distinguish the intelligibility levels are identified and
computed from the corresponding signal representations. These features are
extracted at the frame level and at the utterance level, and are known as
local and global descriptors respectively. To obtain a fixed-length feature
vector, local descriptors are converted to global descriptors using temporal
and Fisher vector encoding techniques. The global descriptors are given to
probabilistic linear discriminant analysis (PLDA) and artificial neural
network (ANN) classifiers for intelligibility assessment.
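The conversion from local to global descriptors can be illustrated with a minimal sketch. It assumes simple statistics pooling (per-dimension mean and standard deviation over frames), which is only a stand-in for the idea of a fixed-length utterance vector; it does not reproduce the temporal or Fisher vector encodings used in this work. The function name `pool_local_descriptors` and the toy frame values are hypothetical.

```python
from statistics import fmean, pstdev


def pool_local_descriptors(frames):
    """Collapse a variable-length list of frame-level vectors into one
    fixed-length vector: per-dimension means followed by per-dimension
    population standard deviations.

    Assumption: plain statistics pooling, used here only for illustration.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Two toy "utterances" with different numbers of frames (hypothetical values).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]

g_short = pool_local_descriptors(short_utt)
g_long = pool_local_descriptors(long_utt)
print(len(g_short), len(g_long))
```

Both utterances map to a vector of length 2 x (feature dimension), regardless of how many frames they contain, which is the property a classifier with a fixed input size needs.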
In addition, different time-frequency representations with fixed and
variable time resolutions are derived. The short-time Fourier transform
(STFT) and single frequency filtering (SFF) are time-frequency
representations with fixed time resolution, whereas the constant-Q transform
(CQT) has variable time resolution. These time-frequency representations are
used with a convolutional neural network (CNN) classifier. Further,
region-based prediction is employed to improve performance. In the
assessment task, two standard databases of American English are used.
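The difference between fixed and variable time resolution noted above can be made concrete with a short sketch. It uses the standard constant-Q relation, in which the analysis window for a bin centred at f_k spans about Q * fs / f_k samples with Q = 1 / (2^(1/b) - 1) for b bins per octave. The sampling rate, minimum frequency, bin counts and STFT window length are illustrative assumptions, not values taken from this work.

```python
import math


def cqt_window_lengths(fs, f_min, bins_per_octave, n_bins):
    """Return (centre frequency in Hz, window length in samples) for each
    constant-Q bin, using N_k = ceil(Q * fs / f_k)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    lengths = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        lengths.append((f_k, math.ceil(q * fs / f_k)))
    return lengths


FS = 16000      # assumed sampling rate in Hz
STFT_WIN = 400  # assumed STFT window: 25 ms at 16 kHz, same for every bin

cqt = cqt_window_lengths(FS, f_min=100.0, bins_per_octave=12, n_bins=48)

# Print one bin per octave: the CQT window shrinks as frequency rises,
# while the STFT window stays constant.
for f_k, n_k in cqt[::12]:
    print(f"{f_k:8.1f} Hz  CQT window {1000 * n_k / FS:7.2f} ms  "
          f"STFT window {1000 * STFT_WIN / FS:.2f} ms")
```

Under these assumptions the CQT window roughly halves with each octave, giving finer time resolution at high frequencies, whereas the STFT applies one window length across all frequencies.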
The speech data from the UA (words only) and TORGO (both words and
sentences) databases are used. The performance of the extracted fe