Property | Value |
Name | A Statistical-Machine-Translation Approach to Word Boundary Identification |
Description | Word segmentation is a fundamental step in processing a number of Asian languages, such as Chinese, Japanese, and Thai. This paper presents a statistical phrase-based machine translation framework for Thai word segmentation. The segmentation task can be viewed as translating an unsegmented sentence into a segmented one. We formulate the problem as a mapping from individual characters (unsegmented text) to groups of characters (segmented text). Language and translation models built from the training data are used to search for the best segmentation. We also provide a simple post-processing step to correct segmentation errors on unknown words. The evaluation shows a promising average F-measure of 92.39%. |
Filename | InterBEST_2.pdf |
Filesize | 412.81 kB |
Filetype | pdf (Mime Type: application/pdf) |
Creator | admin |
Created On | 12/02/2009 21:27 |
Hits | 134 |
Last updated on | 12/02/2009 21:29 |
MD5 Checksum |
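
The description frames segmentation as a search for the best grouping of characters, using models estimated from segmented training data. The sketch below illustrates only that search idea and is not the paper's system: it swaps the phrase-based translation and language models for a plain unigram word model and finds the best grouping by dynamic programming. The function names, the `max_len` limit, and the `unk_logp` penalty for unknown single characters are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(segmented_sentences):
    """Count words in segmented training sentences (each a list of words)."""
    counts = Counter(w for sent in segmented_sentences for w in sent)
    return counts, sum(counts.values())


def segment(text, counts, total, max_len=10, unk_logp=-20.0):
    """Return the highest-scoring grouping of characters under a unigram model.

    Illustrative stand-in for a phrase-based decoder: known words are scored
    by relative frequency, and an unknown single character gets the fixed
    (assumed) log-probability `unk_logp`.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-score of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in counts:
                score = math.log(counts[piece] / total)
            elif len(piece) == 1:
                score = unk_logp    # fallback for unseen characters
            else:
                continue            # unseen multi-character piece: skip
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

Given toy training data such as `[["ฉัน", "รัก", "แมว"], ["ฉัน", "รัก", "หมา"]]`, calling `segment("ฉันรักแมว", *train(data))` groups the nine characters into the three training words. A real system along the lines of the description would also score word sequences with a language model and handle unknown words in a post-processing step, which this sketch omits.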