Developing a pilot Hindi Treebank based on Computational Paninian Grammar

Files

01_titlepage.pdf (548.06 KB)

02_chapter1.pdf (138.93 KB)

03_chapter2.pdf (305.1 KB)

04_chapter3.pdf (304.16 KB)

05_chapter4.pdf (386.2 KB)

Abstract

The Penn Treebank has demonstrated the importance of treebanks as a linguistic resource for NLP. This research presents an effort to develop a pilot treebank for Hindi that could serve as the basis for a large-scale Hindi treebank. Building a treebank requires a computational grammar framework, an annotation scheme based on the chosen grammar, guidelines for annotating the various types of constructions in the language concerned, and related resources such as verb frames. Since Hindi has relatively free word order, a dependency grammar formalism is well suited to it. The Computational Paninian Grammar framework [36] was therefore chosen; Panini's grammar is a dependency grammar [99, 162]. The scheme for annotating treebanks for Indian languages was accordingly developed on this framework. As part of this study, a pilot treebank for Hindi (HyDT, the Hyderabad Dependency Treebank for Hindi) [21] was developed and released for ICON-2009 (International Conference on Natural Language Processing, 2009) [86]. The scheme [21] and the guidelines for Hindi treebank annotation developed during this study were subsequently modified and are being used for a multi-layered and multi-representational treebank for Hindi and Urdu [39, 42, 188], a collaborative project between various universities.

Along with creating the Hindi treebank (HyDT), I also created a supplementary resource of verb frames for 687 Hindi verbs. I present this work on verb frames [22], describing the methodology used in preparing the frames and the criteria followed for classifying Hindi verbs. The main goal of this work is to create a linguistic resource that will prove indispensable for various NLP applications. I have also worked on the mapping between PropBank annotation and dependency annotation based on the Paninian grammatical framework [21, 36].

I also discuss the use of HyDT data [21] in various experiments.

URI

http://hdl.handle.net/10603/184865

Collections

Computational Linguistics
