Rule based dependency parser for Telugu
Abstract
Parsing natural languages has gained popularity in recent years and has
attracted the interest of Natural Language Processing (NLP) researchers around
the world. The task is challenging when the language under study is a
morphologically rich, free-word-order language like Telugu, a South-Central
Dravidian language. Parsing refers to the process of syntactic analysis of text
in a specific language. A parser is an automated tool that dissects sentences to
provide a syntactic or syntactico-semantic analysis of the relations between
words in a sentence. Parsing is useful in downstream NLP analysis and
applications such as machine translation, document classification, and dialogue
modelling.
This study adopts a knowledge-driven approach, i.e. a rule-based technique for
building a parser for Telugu using linguistic cues as rules. The present research
adopts the Indian grammatical tradition, i.e. Pāṇini's Grammatical (PG) tradition,
as the dependency model to parse sentences. A detailed description of mapping
semantic relations to vibhaktis (case suffixes and postpositions) using linguistic
cues in Telugu is presented.
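As a rough illustration of what mapping vibhaktis to dependency relations can look like, the sketch below attaches each noun in a single clause to the final verb and picks a default kāraka label from the noun's case marker. It is not the parser described in this study: the table `VIBHAKTI_TO_KARAKA`, the function `attach_to_verb`, the ASCII transliterations, and the label set (k1, k2, k3, k4, k5, k7) are all illustrative assumptions, and real rules would need further linguistic cues to resolve ambiguous markers.

```python
# Toy sketch only: hypothetical default mapping from a vibhakti marker to a
# Paninian karaka label. Markers are in rough ASCII transliteration and the
# defaults are assumptions for illustration, not the rules used in the thesis.
VIBHAKTI_TO_KARAKA = {
    "0": "k1",      # unmarked (nominative) noun: karta by default
    "ni": "k2",     # accusative: karma
    "nu": "k2",
    "to": "k3",     # instrumental: karana (the marker is also sociative)
    "ki": "k4",     # dative: sampradana
    "ku": "k4",
    "nunci": "k5",  # ablative: apadana
    "lo": "k7",     # locative: adhikarana
}


def attach_to_verb(chunks):
    """Attach every noun to the last verb of a single clause.

    chunks: list of (word, pos, vibhakti) tuples, with pos "N" or "V".
    Returns a list of (dependent, relation, head) arcs.
    """
    verbs = [word for word, pos, _ in chunks if pos == "V"]
    if not verbs:
        return []
    head = verbs[-1]  # assumption: one clause, verb-final word order
    arcs = []
    for word, pos, vibhakti in chunks:
        if pos == "N":
            # Unknown markers get a placeholder label instead of a guess.
            relation = VIBHAKTI_TO_KARAKA.get(vibhakti, "unk")
            arcs.append((word, relation, head))
    return arcs


# "ramudu sitani cusadu" (Ramu saw Sita), pre-segmented by hand.
sentence = [("ramudu", "N", "0"), ("sita", "N", "ni"), ("cusadu", "V", "")]
for arc in attach_to_verb(sentence):
    print(arc)
```

A lookup table like this breaks down as soon as a marker is ambiguous or an object is left unmarked, which is the kind of case where additional linguistic cues become necessary.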
An enhanced annotation scheme for Telugu dependency relations is introduced.
Challenges faced in parsing ambiguous structures are elaborated, and enhanced
tags are provided to handle them. The study describes the parsing algorithm and
the linguistic knowledge employed in developing the parser. The research further
provides results which suggest that enriching the current parser with linguistic
inputs can increase accuracy and tackle ambiguity better than existing
data-driven methods. The results are encouraging, and the parser proves to be
efficient for languages like Telugu; it can later be extended to other
morphologically rich languages.