Document Details | A Statistical-Machine-Translation Approach to Word Boundary Identification

Details for A Statistical-Machine-Translation Approach to Word Boundary Identification
Property	Value
Name	A Statistical-Machine-Translation Approach to Word Boundary Identification
Description	Word segmentation is a fundamental step in processing a number of Asian languages such as Chinese, Japanese, and Thai. This paper presents a statistical phrase-based machine translation framework for Thai word segmentation. The segmentation task is treated as a translation process from an unsegmented sentence to a segmented sentence. We formulate the problem as mapping individual characters (unsegmented text) to groups of characters (segmented text). Language and translation models built from the training data are used to search for the best segmentation. We also provide a simple post-processing step to correct segmentation errors on unknown words. The evaluation shows a promising average F-measure of 92.39%.
Filename	InterBEST_2.pdf
Filesize	412.81 kB
Filetype	pdf (Mime Type: application/pdf)
Creator	admin
Created On	12/02/2009 21:27
Hits	134
Last updated on	12/02/2009 21:29
MD5 Checksum
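
The description above frames segmentation as translation from individual characters to groups of characters and reports a word-level F-measure. The sketch below illustrates one way such data and scoring could look. It is not the paper's implementation: the function names (`to_parallel_pair`, `word_spans`, `f_measure`), the space-separated token format, and the span-based scoring are all assumptions made for illustration.

```python
def to_parallel_pair(words):
    """Turn one gold-segmented sentence into a (source, target) training pair.

    Assumed format: tokens on each side are separated by single spaces.
    """
    # Source side: every character is its own token (the unsegmented text).
    source = " ".join(ch for word in words for ch in word)
    # Target side: every word is one token (the segmented text).
    target = " ".join(words)
    return source, target


def word_spans(words):
    """Return the set of (start, end) character offsets, one per word."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_measure(gold_words, predicted_words):
    """Harmonic mean of precision and recall over exactly matching word spans."""
    gold = word_spans(gold_words)
    pred = word_spans(predicted_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    source, target = to_parallel_pair(["ภาษา", "ไทย"])
    print(source)
    print(target)
    print(f_measure(["ab", "cde"], ["ab", "c", "de"]))
```

A translation system trained on such pairs would see the character sequence as the source sentence and the word sequence as the target sentence. Scoring on character spans rather than on word strings means a predicted word counts as correct only when both of its boundaries match the reference.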
