Tamil Part-of-Speech tagger based on SVMTool | Health & Environmental Research Online (HERO)

HERO ID

7687059

Reference Type

Journal Article

Title

Tamil Part-of-Speech tagger based on SVMTool

Author(s)

Dhanalakshmi, V; Anandkumar, M; Vijaya, MS; Loganathan, R; Soman, KP; Rajendran, S

Year

2008

Publisher

COLIPS PUBL

Location

SINGAPORE

Page Numbers

59-+

Web of Science Id

WOS:000262873600010

Abstract

This paper presents the intricacies involved in developing a POS tagger generator for the Tamil language using SVMTool. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. SVMTool was developed by Jesús Giménez and Lluís Màrquez for POS tagging. The tagset used here (the Amrita Tagset) was developed based on our experience in creating an annotated corpus for Tamil; the present version consists of 32 tags. We trained SVMTool on our corpus of two hundred and twenty-five thousand words, tuning the parameters and feature patterns for the Tamil language. We also used other tools, such as WEKA, MBT and TNT, to compare our results. The results are very encouraging.

Editor(s)

Lua, KT; Ji, DH; Dong, M; Smavatkul, D

ISBN

978-981-08-1609-4

Conference Name

International Conference on Asian Language Processing

Conference Location

Chiang Mai, THAILAND