InterBEST 2009



Details for A Statistical-Machine-Translation Approach to Word Boundary Identification
Name: A Statistical-Machine-Translation Approach to Word Boundary Identification

Word segmentation is a fundamental step in processing a number of Asian languages such as Chinese, Japanese, and Thai. This paper presents a Statistical Phrase-based Machine Translation framework for Thai word segmentation. The segmentation task can be viewed as a translation process from an unsegmented sentence to a segmented sentence. We formulate the problem as a mapping from individual characters (unsegmented text) to groups of characters (segmented text). The language and translation models, both constructed from the training data, are applied to search for the best segmentation result. We also provide a simple post-processing system to correct segmentation errors on unknown words. The evaluation shows a promising average F-measure of 92.39%.
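As a rough illustration of the formulation in the abstract, the sketch below builds one character-to-word training pair from a gold-segmented sentence and scores a predicted segmentation with a word-level F-measure. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`to_parallel_pair`, `word_spans`, `f_measure`), the space-separated token format, and the choice to count a word as correct only when both of its boundaries match are all assumptions made here, since the abstract does not specify them.

```python
def to_parallel_pair(words):
    """Build a (source, target) pair from a gold-segmented sentence.

    Assumed format: tokens on each side are separated by spaces.
    """
    # Source side: each character is its own token (unsegmented text).
    source = " ".join(ch for word in words for ch in word)
    # Target side: each word (a group of characters) is one token.
    target = " ".join(words)
    return source, target


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_measure(gold, predicted):
    """Word-level F-measure: a word counts only if both boundaries match."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the paper's data.
    gold = ["แมว", "กิน", "ปลา"]
    source, target = to_parallel_pair(gold)
    print(source)
    print(target)
    print(f_measure(gold, ["แมว", "กินปลา"]))
```

Pairs in this shape could be fed to a phrase-based translation toolkit to train the translation model, with the target side alone used for the language model; which toolkit and settings the authors used is not stated in the abstract.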

Filesize: 412.81 kB
Filetype: pdf (Mime Type: application/pdf)
Created On: 12/02/2009 21:27
Hits: 134
Last updated on 12/02/2009 21:29
MD5 Checksum