Performance analysis of different deep learning architectures for hand action recognition

Abstract

Recognizing hand actions in an unconstrained context is a challenging computer vision task. Computational cost, rapid movement, illumination changes, self-occlusion, uncertain environments, varying viewpoints, varying hand shape and size, and a high number of degrees of freedom (DOF) all affect the performance of a hand action recognition system. To address these challenges, two deep Convolutional Neural Network (CNN) based approaches, namely a multi-stage CNN and a single-stage CNN, are proposed and reported in this thesis. Existing standard hand action datasets do not cover most of the complexities listed above; hence, a hand action dataset suitable for real-time hand action recognition was collected and named MITI-HD. All of the contributions mentioned below are evaluated on two standard datasets (NUSHP-II and Senz-3D) and the custom-developed dataset (MITI-HD). Each model is trained with different stochastic gradient descent optimizers (Adam, Momentum, and RMSprop). Faster R-CNN Inception-V2 is the multi-stage CNN approach used for real-time hand action recognition, with Inception-V2 serving as the backbone feature extraction network. On the MITI-HD dataset, the proposed model trained with the Adam optimizer outperforms the other optimizers (Average Precision (AP) = 99.10%, Average Recall (AR) = 96.78%, F1-score = 97.98%, prediction time = 140 ms). Six single-stage CNN based deep learning models are also evaluated for real-time hand action recognition.
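The F1-score reported above is the harmonic mean of precision and recall. As a minimal sketch (not code from the thesis), the relation can be checked against the reported AP and AR figures; plugging them in yields a value close to the reported 97.98%:

```python
# Sketch: F1-score as the harmonic mean of precision and recall.
# The AP/AR values below are the reported MITI-HD results for
# Faster R-CNN Inception-V2 with the Adam optimizer.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ap, ar = 0.9910, 0.9678
print(f"F1 = {f1_score(ap, ar):.4f}")  # about 0.9793
```

Any small gap between this back-of-the-envelope value and the reported 97.98% presumably comes from rounding in the published AP/AR figures.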

