InterBEST 2009



Details for Thai Word Segmentation using Character-level Information
Name: Thai Word Segmentation using Character-level Information

This paper describes a Thai word segmentation approach in which Conditional Random Fields (CRFs) classify each character of the text string to be segmented into classes defined by the character's position in its underlying word. Characters of the Thai writing system are annotated with the character functions proposed in this work. N-grams of these character functions are used together with character N-grams in the feature templates of the CRF models, so that the models can locate characters likely to indicate word boundaries. The proposed method yields a best F-measure of 95.53%, which is better than the scores obtained with word trigrams. It is also shown that character-level constraints make the results more robust when segmenting unseen words.
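To make the character-level setup concrete, here is a minimal sketch of how training labels and feature templates of this kind could be built. It is not the paper's implementation: the two-class B/I position labels, the coarse `char_function` categories (C, L, V, T, D, O), the window size, and all function names are assumptions chosen for illustration, since the abstract does not list the actual position classes or character functions.

```python
from typing import Dict, List

LEADING_VOWELS = "\u0e40\u0e41\u0e42\u0e43\u0e44"  # vowels written before the consonant
PAD = "<PAD>"


def char_function(ch: str) -> str:
    """Hypothetical coarse character function (not the paper's inventory)."""
    if "\u0e01" <= ch <= "\u0e2e":
        return "C"  # Thai consonant
    if ch in LEADING_VOWELS:
        return "L"  # leading vowel
    if "\u0e48" <= ch <= "\u0e4b":
        return "T"  # tone mark
    if "\u0e30" <= ch <= "\u0e3a" or ch in "\u0e45\u0e47":
        return "V"  # other vowel sign
    if ch.isdigit():
        return "D"  # digit (Thai or Arabic)
    return "O"      # anything else


def labels_from_words(words: List[str]) -> List[str]:
    """Assumed labelling: B = first character of a word, I = any other character."""
    labels: List[str] = []
    for word in words:
        if word:  # skip empty strings so no stray label is produced
            labels.extend(["B"] + ["I"] * (len(word) - 1))
    return labels


def char_features(text: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character and character-function unigrams in a window, plus adjacent bigrams."""
    def ch_at(j: int) -> str:
        return text[j] if 0 <= j < len(text) else PAD

    def fn_at(j: int) -> str:
        return char_function(text[j]) if 0 <= j < len(text) else PAD

    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        feats[f"ch[{off}]"] = ch_at(i + off)
        feats[f"fn[{off}]"] = fn_at(i + off)
    # Bigrams around the current position, for both characters and functions.
    feats["ch[-1:0]"] = ch_at(i - 1) + "|" + ch_at(i)
    feats["ch[0:1]"] = ch_at(i) + "|" + ch_at(i + 1)
    feats["fn[-1:0]"] = fn_at(i - 1) + "|" + fn_at(i)
    feats["fn[0:1]"] = fn_at(i) + "|" + fn_at(i + 1)
    return feats


def sentence_features(text: str, window: int = 2) -> List[Dict[str, str]]:
    """One feature dictionary per character of the unsegmented string."""
    return [char_features(text, i, window) for i in range(len(text))]
```

For a segmented training sentence such as `["ไป", "กิน", "ข้าว"]`, `labels_from_words` gives one label per character and `sentence_features("ไปกินข้าว")` gives the matching feature dictionaries; sequences of this shape are what a CRF toolkit that accepts per-token feature dictionaries would be trained on.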


Filesize: 222.09 kB
Filetype: pdf (Mime Type: application/pdf)
Created On: 12/02/2009 21:24
Hits: 159
Last updated on 12/02/2009 21:26
MD5 Checksum