Health & Environmental Research Online (HERO)
Citation
HERO ID
7687059
Reference Type
Journal Article
Title
Tamil Part-of-Speech tagger based on SVMTool
Author(s)
Dhanalakshmi, V; Anandkumar, M; Vijaya, MS; Loganathan, R; Soman, KP; Rajendran, S
Year
2008
Publisher
COLIPS PUBL
Location
SINGAPORE
Page Numbers
59-+
Web of Science Id
WOS:000262873600010
Abstract
This paper presents the intricacies involved in developing a POS tagger generator for the Tamil language using SVMTool. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. SVMTool was developed by Jesús Giménez and Lluís Màrquez for POS tagging. The tagset used here (the Amrita Tagset) was developed based on our experience in creating an annotated corpus for Tamil; the present version consists of 32 tags. We trained SVMTool on our corpus of 225,000 words, tuning the parameters and feature patterns for the Tamil language. We also used other tools, such as WEKA, MBT and TNT, to compare our results. The results are very encouraging.
Editor(s)
Lua, KT; Ji, DH; Dong, M; Smavatkul, D
ISBN
978-981-08-1609-4
Conference Name
International Conference on Asian Language Processing
Conference Location
Chiang Mai, THAILAND