Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes

dc.contributor.guide: Suresh K
dc.coverage.spatial:
dc.creator.researcher: Aswathy Madhu
dc.date.accessioned: 2023-12-05T12:30:11Z
dc.date.available: 2023-12-05T12:30:11Z
dc.date.awarded: 2023
dc.date.completed: 2023
dc.date.registered: 2017
dc.description.abstract: Computer audition has garnered attention from the audio and acoustic signal processing community over the past decade. This growing interest stems from its attractive applications in audio surveillance and healthcare. Two fundamental problems in computer audition are automatic Environmental Sound Classification (ESC) and Acoustic Scene Classification (ASC). Despite promising application prospects, they are overshadowed by popular research areas such as Automatic Speech Recognition and Music Information Retrieval. This is due to the challenges posed by environmental sounds and acoustic scenes: their complex nature, the lack of the high-level structures usually observed in speech and music, and the large degree of intra-class and inter-class variability. Recently, deep learning approaches have been gaining popularity for both ESC and ASC. However, the robustness of deep learning approaches depends mainly on the amount of available data and on the audio signal representation. Moreover, ASC designers have shifted their focus from improving accuracy to incorporating real-world considerations.

Therefore, the research work embodied in this thesis aims to develop robust deep learning frameworks to identify environmental sounds and acoustic scenes. First, the influence of data augmentation in the context of ESC using a deep Convolutional Neural Network (CNN) is studied. Then, the possibility of using Generative Adversarial Networks (GAN) for data augmentation is investigated, and audio data augmentation is implemented using an existing GAN. Next, a GAN framework (EnvGAN) is implemented to generate sounds similar to those in three benchmark datasets. In addition, a quantitative similarity metric based on a Siamese Neural Network is presented to evaluate the perceptual similarity of synthetic samples generated by EnvGAN. Finally, two efficient signal representation techniques that address the variability present in acoustic scenes are proposed to obtain a robust ASC framework.
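The abstract refers to audio data augmentation for ESC. The thesis's own augmentation pipeline is not detailed in this record, but a minimal sketch of two augmentations commonly applied to environmental-sound waveforms (random time shift and additive noise at a target SNR) might look like the following; the function name, parameters, and defaults are illustrative assumptions, not the author's method.

```python
import numpy as np

def augment_waveform(x, rng, shift_frac=0.1, noise_snr_db=20.0):
    """Illustrative audio augmentation: random circular time shift
    plus additive Gaussian noise at a target SNR (in dB).
    Hypothetical helper, not taken from the thesis."""
    n = len(x)
    max_shift = int(shift_frac * n)
    shift = rng.integers(-max_shift, max_shift + 1)
    shifted = np.roll(x, shift)                      # random time shift
    sig_power = np.mean(shifted ** 2)
    noise_power = sig_power / (10 ** (noise_snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), n)  # noise at target SNR
    return shifted + noise

# Example: augment a 1-second, 440 Hz tone sampled at 16 kHz
rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
aug = augment_waveform(clip, rng)
```

Augmentations like these expand a small training set without new recordings, which is the motivation the abstract gives for studying data augmentation before turning to GAN-based generation.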
dc.description.note:
dc.format.accompanyingmaterial: DVD
dc.format.dimensions:
dc.format.extent:
dc.identifier.uri: http://hdl.handle.net/10603/528343
dc.language: English
dc.publisher.institution: College of Engineering Trivandrum
dc.publisher.place: Thiruvananthapuram
dc.publisher.university: APJ Abdul Kalam Technological University, Thiruvananthapuram
dc.relation:
dc.rights: university
dc.source.university: University
dc.subject.keyword: Engineering
dc.subject.keyword: Engineering and Technology
dc.subject.keyword: Engineering Electrical and Electronic
dc.title: Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes
dc.title.alternative:
dc.type.degree: Ph.D.

Files

Original bundle (showing 1-5 of 15)

- 01_title.pdf (447.18 KB, Adobe Portable Document Format)
- 02_preliminary pages.pdf (469.62 KB, Adobe Portable Document Format)
- 03_contents.pdf (104.89 KB, Adobe Portable Document Format)
- 04_abstract.pdf (73.51 KB, Adobe Portable Document Format)
- 05_chapter 1.pdf (119.51 KB, Adobe Portable Document Format)

License bundle (showing 1 of 1)

- license.txt (1.79 KB, Plain Text)