Part of speech of Nyishi language
Abstract
Part-of-speech (POS) tagging is a fundamental task in natural language processing that supports applications such as machine translation, syntactic analysis, and information retrieval. Developing POS taggers for low-resourced languages remains challenging due to the lack of annotated corpora and digital tools. Nyishi, a Tibeto-Burman language spoken in Arunachal Pradesh, India, is one such underrepresented language. This research aims to address this gap by developing and evaluating POS tagging models specifically for Nyishi, contributing to its computational processing, documentation, and digital preservation.
The study focuses on the creation of essential linguistic resources, including a manually annotated Nyishi POS corpus based on a 14-tag grammatical tagset and an English-Nyishi bilingual dictionary containing over 45,000 entries. Using these resources, three POS tagging approaches were implemented and compared: Hidden Markov Models (HMM), Conditional Random Fields (CRF), and a deep learning-based BiLSTM-CRF model. The models were trained and evaluated to assess their effectiveness in capturing the linguistic patterns of Nyishi.
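As an illustration of the simplest of the three approaches, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the implementation used in the study: the function names (`train_hmm`, `log_prob`, `viterbi`), the add-one smoothing, and the two toy English sentences are all assumptions chosen for brevity, and no Nyishi data or tags from the 14-tag tagset are reproduced here.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log-probability (an assumed smoothing choice)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for a non-empty word list."""
    tags = list(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log-score of the best path ending in tag t, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy training data (hypothetical, English placeholders only).
corpus = [
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("cats", "NOUN"), ("sleep", "VERB")],
]
trans, emit, vocab = train_hmm(corpus)
predicted = viterbi(["dogs", "sleep"], trans, emit, vocab)
print(predicted)
```

A CRF or BiLSTM-CRF replaces these count-based probabilities with learned feature weights or neural representations, while typically still decoding the best tag sequence with a Viterbi-style search.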
Experimental results show that while the HMM achieved high recall, it suffered from low precision. The CRF model improved tagging consistency, but the BiLSTM-CRF model performed best overall, achieving an accuracy of 93% and superior precision and recall. The findings demonstrate the effectiveness of deep learning models for low-resourced languages and provide a practical framework for future NLP research on Nyishi and similar languages.
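To make the reported metrics concrete, the sketch below shows one common way to compute token-level accuracy and per-tag precision and recall from gold and predicted tag sequences. The function name `tagging_metrics` and the toy sequences are hypothetical, and the abstract does not state how the study averaged its scores, so this should be read as an illustration of the definitions rather than the study's evaluation procedure.

```python
from collections import Counter

def tagging_metrics(gold, pred):
    """Return overall accuracy and a dict of tag -> (precision, recall)."""
    assert len(gold) == len(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the correct tag but was missed
    accuracy = sum(tp.values()) / len(gold)
    per_tag = {}
    for tag in set(gold) | set(pred):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        per_tag[tag] = (precision, recall)
    return accuracy, per_tag

# Hypothetical tagger output that over-predicts NOUN.
gold = ["NOUN", "VERB", "NOUN", "ADJ", "VERB", "NOUN"]
pred = ["NOUN", "NOUN", "NOUN", "NOUN", "VERB", "NOUN"]
accuracy, per_tag = tagging_metrics(gold, pred)
print(accuracy, per_tag["NOUN"])
```

In this toy case every gold NOUN is found, so recall for NOUN is perfect, but two of the five NOUN predictions are wrong, so its precision is lower. A tagger that over-predicts a tag in this way shows the high-recall, low-precision pattern for that tag.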