Health & Environmental Research Online (HERO)


7687059 
Journal Article 
Tamil Part-of-Speech tagger based on SVMTool 
Dhanalakshmi, V; Anandkumar, M; Vijaya, MS; Loganathan, R; Soman, KP; Rajendran, S
2008 
COLIPS PUBL 
SINGAPORE 
59 ff.
This paper presents the intricacies involved in developing a POS tagger for the Tamil language using the SVMTool tagger generator. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. SVMTool was developed by Jesús Giménez and Lluís Màrquez for POS tagging. The tagset used here (the Amrita Tagset) was developed from our experience in creating an annotated corpus for Tamil; the current version consists of 32 tags. We trained SVMTool on our corpus of 225,000 words, tuning its parameters and feature patterns for Tamil. We also used other tools, namely WEKA, MBT and TnT, to compare our results. The results are very encouraging.
Lua, KT; Ji, DH; Dong, M; Smavatkul, D
978-981-08-1609-4 
International Conference on Asian Language Processing 
Chiang Mai, THAILAND
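The abstract mentions tuning SVMTool's feature patterns for Tamil. As an illustration only, the sketch below shows how window-based features of that general kind could be assembled for one token before being passed to an SVM classifier. It assumes a 5-token window (offsets -2 to +2), word features across the window, tag features only for the already-tagged left context, and 2- and 3-character suffixes. The function name `window_features`, the feature keys, the `<pad>` symbol, the transliterated example words and the placeholder tag `PRP` are all hypothetical. This is not SVMTool's actual configuration or the authors' feature set.

```python
from typing import Dict, List, Optional

PAD = "<pad>"  # hypothetical symbol for positions outside the sentence


def window_features(words: List[str], tags: List[Optional[str]],
                    i: int, size: int = 2) -> Dict[str, str]:
    """Build a hypothetical feature dict for token i over a +/- size window.

    Only left-context tags are used: in left-to-right tagging those tags
    have already been predicted, while right-context tags are unknown.
    """
    feats: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        in_sentence = 0 <= j < len(words)
        feats[f"w[{offset}]"] = words[j] if in_sentence else PAD
        if offset < 0:
            known = in_sentence and tags[j] is not None
            feats[f"t[{offset}]"] = tags[j] if known else PAD
    current = words[i]
    # Suffix features: Tamil is agglutinative, so word endings carry
    # much of the inflectional information useful for tagging.
    for k in (2, 3):
        feats[f"suf{k}"] = current[-k:] if len(current) >= k else current
    return feats


if __name__ == "__main__":
    # Transliterated example sentence; "PRP" is a placeholder tag.
    words = ["naan", "veettukku", "poneen"]
    tags: List[Optional[str]] = ["PRP", None, None]
    print(window_features(words, tags, 1))
```

Restricting tag features to the left context reflects a one-pass, left-to-right tagging order; a right-to-left or two-pass scheme would make different context tags available.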