Design and Development of an Algorithm to Identify Apraxia of Speech Using Automatic Speech Recognition Methods for Real Time Applications



Abstract

Childhood Apraxia of Speech (CAS) is a common childhood speech disorder attributed to abnormal cell development. Its symptoms affect motor skills and cognitive function. Early detection is crucial: it improves the prognosis, allows corrective training to begin, and can help prevent dysfluency from worsening over time. Research on CAS has traditionally relied on MRI scans, patient gestures, and video recordings, but this approach has limitations. Our work explores speech signal analysis for CAS classification, an avenue rarely explored in the existing literature. We use Deep Neural Networks (DNNs) to tackle CAS classification, in line with recent advances in speech recognition technology. This approach holds considerable potential for improving early detection and the lives of children with CAS.

This research uses a deep learning framework with different architectures to automatically recognize childhood speech apraxia. The architecture consists of four stages: computing features such as peaks and fundamental frequency, detecting voice activity using Short Term Energy (STE) and Zero Crossing Rate (ZCR), extracting MFCC coefficients, and classifying speech apraxia with STZCR and Teager Energy Operator (TEO) features as input. The deep convolutional neural network outperformed the machine learning algorithms it was compared against. However, diagnosing speech apraxia with a convolutional neural network alone proved challenging, and its accuracy was not sufficient for medical applications. We therefore aimed to enhance accuracy by pursuing Objective 2.

Our next exploration utilized DenseNet-121 and ResNet-18 for feature learning, but these models struggled with overfitting and underfitting in speech disorder classification. We pivoted to a more robust approach.
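As a rough illustration of the voice-activity stage described above, STE, ZCR, and TEO can each be computed per frame with plain NumPy. This is a minimal sketch only; the frame length, hop size, and thresholds below are illustrative assumptions, not the settings used in the thesis:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_term_energy(frames):
    """Short Term Energy (STE): mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Zero Crossing Rate (ZCR): fraction of adjacent samples whose sign changes."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def teager_energy(frames):
    """Teager Energy Operator (TEO): psi[x(n)] = x(n)^2 - x(n-1)*x(n+1), averaged per frame."""
    psi = frames[:, 1:-1] ** 2 - frames[:, :-2] * frames[:, 2:]
    return np.mean(psi, axis=1)

def simple_vad(x, energy_thresh=1e-4, zcr_thresh=0.5):
    """Flag a frame as voiced when its energy is high and its ZCR is low."""
    frames = frame_signal(x)
    return (short_term_energy(frames) > energy_thresh) & (zero_crossing_rate(frames) < zcr_thresh)
```

The intuition this sketch captures: voiced speech is high-energy and low-ZCR, while silence and unvoiced fricatives are the opposite, so a joint threshold on the two separates speech from non-speech frames.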
Mel-frequency cepstral coefficients (MFCCs) were extracted after pre-processing the audio into a consistent format. Additionally, spectral features such as centroid, pitch, and roll-off were obtained from the spectrogram. This combined feature set captured
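The spectral centroid and roll-off named above can be computed directly from a frame's magnitude spectrum. A minimal NumPy sketch; the frame length, sample rate, and 85% roll-off fraction are illustrative assumptions, not parameters taken from the thesis:

```python
import numpy as np

def spectral_centroid(frame, sr=16000):
    """Magnitude-weighted mean frequency of the frame's spectrum (Hz)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def spectral_rolloff(frame, sr=16000, pct=0.85):
    """Frequency (Hz) below which pct of the frame's spectral energy lies."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    cum = np.cumsum(power)
    idx = np.searchsorted(cum, pct * cum[-1])
    return freqs[min(idx, len(freqs) - 1)]
```

For a pure 1 kHz tone both features sit near 1 kHz; for real speech the centroid tracks the distribution of energy across formants and the roll-off tracks the high-frequency extent, which is why they complement the MFCCs as classifier input.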

