Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes

dc.contributor.guide: Suresh K
dc.coverage.spatial:
dc.creator.researcher: Aswathy Madhu
dc.date.accessioned: 2023-12-05T12:30:11Z
dc.date.available: 2023-12-05T12:30:11Z
dc.date.awarded: 2023
dc.date.completed: 2023
dc.date.registered: 2017
dc.description.abstract: Computer audition has garnered attention from the audio and acoustic signal processing community over the past decade. This growing interest stems from its attractive applications in audio surveillance and healthcare. Two fundamental problems in computer audition are automatic Environmental Sound Classification (ESC) and Acoustic Scene Classification (ASC). Despite promising application prospects, they are overshadowed by popular research areas such as Automatic Speech Recognition and Music Information Retrieval. This is due to the challenges posed by environmental sounds and acoustic scenes: their complex nature, the lack of the high-level structures usually observed in speech and music, and the large degree of intra-class and inter-class variability. Recently, deep learning approaches have been gaining popularity for both ESC and ASC. However, the robustness of deep learning approaches depends mainly on the amount of available data and on the audio signal representation. Moreover, ASC designers have shifted their focus from improving accuracy to incorporating real-world considerations.

Therefore, the research work embodied in this thesis aims to develop robust deep learning frameworks to identify environmental sounds and acoustic scenes. First, the influence of data augmentation in the context of ESC using a deep Convolutional Neural Network (CNN) is studied. Then, the possibility of using Generative Adversarial Networks (GAN) for data augmentation is investigated, and audio data augmentation is implemented using an existing GAN. Next, a GAN framework (EnvGAN) is implemented to generate sounds similar to those in three benchmark datasets. In addition, a quantitative similarity metric based on a Siamese Neural Network is presented to evaluate the perceptual similarity of synthetic samples generated by EnvGAN. Finally, two efficient signal representation techniques that address the variability present in acoustic scenes are proposed to obtain a robust ASC framework.
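The abstract refers to audio data augmentation for ESC. The thesis's own augmentation pipeline is not detailed in this record, but a minimal sketch of two augmentations commonly applied to environmental-sound waveforms (random time shift and additive noise at a target SNR) might look like the following; the function name, parameters, and defaults are illustrative assumptions, not the author's method.

```python
import numpy as np

def augment_waveform(x, rng, shift_frac=0.1, noise_snr_db=20.0):
    """Illustrative audio augmentation: random circular time shift
    plus additive Gaussian noise at a target SNR (in dB).
    Hypothetical helper, not taken from the thesis."""
    n = len(x)
    max_shift = int(shift_frac * n)
    shift = rng.integers(-max_shift, max_shift + 1)
    shifted = np.roll(x, shift)                      # random time shift
    sig_power = np.mean(shifted ** 2)
    noise_power = sig_power / (10 ** (noise_snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), n)  # noise at target SNR
    return shifted + noise

# Example: augment a 1-second, 440 Hz tone sampled at 16 kHz
rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
aug = augment_waveform(clip, rng)
```

Augmentations like these expand a small training set without new recordings, which is the motivation the abstract gives for studying data augmentation before turning to GAN-based generation.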
dc.description.note:
dc.format.accompanyingmaterial: DVD
dc.format.dimensions:
dc.format.extent:
dc.identifier.uri: http://hdl.handle.net/10603/528343
dc.language: English
dc.publisher.institution: College of Engineering Trivandrum
dc.publisher.place: Thiruvananthapuram
dc.publisher.university: APJ Abdul Kalam Technological University, Thiruvananthapuram
dc.relation:
dc.rights: university
dc.source.university: University
dc.subject.keyword: Engineering
dc.subject.keyword: Engineering and Technology
dc.subject.keyword: Engineering Electrical and Electronic
dc.title: Deep Learning Methods for Automatic Identification of Environmental Sounds and Acoustic Scenes
dc.title.alternative:
dc.type.degree: Ph.D.

Files

Original bundle (showing 1-5 of 15)

- 01_title.pdf (447.18 KB, Adobe Portable Document Format)
- 02_preliminary pages.pdf (469.62 KB, Adobe Portable Document Format)
- 03_contents.pdf (104.89 KB, Adobe Portable Document Format)
- 04_abstract.pdf (73.51 KB, Adobe Portable Document Format)
- 05_chapter 1.pdf (119.51 KB, Adobe Portable Document Format)

License bundle (showing 1 of 1)

- license.txt (1.79 KB, Plain Text)