Design and Development of an Algorithm to Identify Apraxia of Speech Using Automatic Speech Recognition Methods for Real Time Applications
Abstract
Childhood Apraxia of Speech (CAS) is a motor speech disorder in children attributed to abnormal neural development. CAS symptoms affect motor skills and cognitive function. Early detection is crucial for a favourable prognosis and allows corrective training to begin sooner. Research on CAS has traditionally relied on MRI scans, patient gestures, and video recordings, but this approach has limitations. Early detection can help prevent dysfluency over time and improve treatment outcomes. Our work explores speech signal analysis for CAS classification, an approach rarely examined in the existing literature. We use Deep Neural Networks (DNNs) for CAS classification, in line with recent advances in speech recognition technology. This approach holds considerable potential for improving early detection and the lives of children with CAS.
This research uses a deep learning framework with different architectures to automatically recognize childhood apraxia of speech. The pipeline consists of four stages: computing features such as peaks and fundamental frequency; detecting voice activity using Short-Term Energy (STE) and Zero-Crossing Rate (ZCR); extracting MFCC coefficients; and classifying apraxic speech using STZCR and Teager Energy Operator (TEO) features as input. The deep convolutional neural network outperformed the classical machine learning algorithms it was compared against. However, the research found that diagnosing apraxia of speech with a convolutional neural network alone was challenging and, with respect to accuracy, not suitable for medical applications. We therefore aimed to enhance accuracy by pursuing Objective 2.
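The voice-activity-detection stage above (Short-Term Energy plus Zero-Crossing Rate) can be sketched in plain NumPy. The frame length, hop size, and thresholds below are illustrative assumptions, not values taken from this thesis:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def short_term_energy(frames):
    # Sum of squared samples per frame.
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    # Fraction of sample-to-sample sign changes per frame.
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return np.mean(np.abs(np.diff(signs, axis=1)) / 2, axis=1)

def detect_voice_activity(x, frame_len=400, hop=160,
                          energy_ratio=0.1, zcr_max=0.25):
    # A frame counts as voiced when its energy is high relative to the
    # loudest frame and its zero-crossing rate is low (voiced speech has
    # far fewer crossings than fricatives or silence-floor noise).
    frames = frame_signal(x, frame_len, hop)
    ste = short_term_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (ste > energy_ratio * ste.max()) & (zcr < zcr_max)
```

On a 16 kHz recording, the returned boolean mask marks the frames whose samples are worth passing on to the MFCC-extraction stage; the two thresholds would normally be tuned on held-out data rather than fixed as here.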
Our next exploration utilized DenseNet-121 and ResNet-18 for feature learning, but these models struggled with overfitting and underfitting in speech disorder classification, so we pivoted to a more robust approach. Mel-frequency cepstral coefficients (MFCCs) were extracted after pre-processing the audio into a consistent format. Additionally, spectral features such as centroid, pitch, and roll-off were obtained from the spectrogram. This combined feature set captured