Property | Value |
Name | Thai Word Segmentation using Character-level Information |
Description | This paper describes a Thai word segmentation approach in which Conditional Random Fields (CRFs) classify each character of the text string to be segmented into classes defined by the character's position in the underlying word. Each character in the Thai writing system is tagged with a character function proposed in this work. N-grams of these character functions are used together with character N-grams in the feature templates of the CRF models so that the models can locate characters likely to indicate word boundaries. The proposed method yields a best F-measure of 95.53%, which is better than the scores obtained with word trigrams. It is also shown that character-level constraints make segmentation more robust to unseen words. |
Filename | InterBEST_4.pdf |
Filesize | 222.09 kB |
Filetype | pdf (Mime Type: application/pdf) |
Creator | admin |
Created On | 12/02/2009 21:24 |
Hits | 159 |
Last updated on | 12/02/2009 21:26 |
MD5 Checksum |
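
The character-level labeling summarized in the description can be illustrated with a small Python sketch. It is not the paper's implementation. The two-label scheme (`B` for a word-initial character, `I` for any other), the coarse classes returned by `char_function`, and the feature names are all assumptions made for illustration; the paper's actual character functions and feature templates are not listed on this page. The sketch only shows how character N-grams and character-function N-grams could be assembled into per-character features for a CRF toolkit.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # placeholder for positions outside the string


def char_function(ch: str) -> str:
    """Map a character to a coarse class (hypothetical stand-in for the
    paper's character functions, based only on Unicode ranges)."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "CONS"            # Thai consonants
    if 0x0E40 <= cp <= 0x0E44:
        return "LEAD_V"          # vowels written before the consonant
    if 0x0E48 <= cp <= 0x0E4B:
        return "TONE"            # tone marks
    if 0x0E30 <= cp <= 0x0E3A or cp in (0x0E45, 0x0E47, 0x0E4C, 0x0E4D):
        return "VOWEL_OR_SIGN"   # other vowels and diacritic signs
    if ch.isdigit():
        return "DIGIT"
    return "OTHER"


def label_chars(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, label) training pairs."""
    pairs = []
    for word in words:
        for i, ch in enumerate(word):
            pairs.append((ch, "B" if i == 0 else "I"))
    return pairs


def features(chars: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Character and character-function unigrams and bigrams around position i."""
    def ch_at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else PAD

    def fn_at(j: int) -> str:
        return char_function(chars[j]) if 0 <= j < len(chars) else PAD

    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        feats[f"ch[{off}]"] = ch_at(i + off)
        feats[f"fn[{off}]"] = fn_at(i + off)
    for off in range(-window, window):
        feats[f"ch[{off}:{off + 1}]"] = ch_at(i + off) + ch_at(i + off + 1)
        feats[f"fn[{off}:{off + 1}]"] = fn_at(i + off) + "+" + fn_at(i + off + 1)
    return feats


def labels_to_words(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from per-character labels (e.g. a CRF's predictions)."""
    words: List[str] = []
    for ch, lab in zip(chars, labels):
        if lab == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    pairs = label_chars(["ฉัน", "กิน", "ข้าว"])
    chars = [c for c, _ in pairs]
    labels = [l for _, l in pairs]
    print(features(chars, 3))
    print(labels_to_words(chars, labels))
```

In a real pipeline, the dictionaries produced by `features` would be passed with the `B`/`I` labels to a CRF trainer, and `labels_to_words` would be applied to the predicted label sequence.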