中国科技术语 (China Terminology), 2010, Vol. 12, Issue 4: 19-23. doi: 10.3969/j.issn.1673-8578.2010.04.004

• Terminology Research •

Impact of the Full- and Half-width Form of Lettered Words upon Chinese Word Segmentation and Treatment Measures
(Original Chinese title: 字母词的全/半角形式对中文分词的影响及对策初探)

HU Fengguo (胡凤国)

  1. Institute of Applied Linguistics, Communication University of China, Beijing 100024
  • Received: 2010-05-20 Online: 2010-08-25 Published: 2010-08-25
  • About the author: HU Fengguo (born 1976), male, from Caoxian County, Shandong Province; Ph.D.; main research interest: computational linguistics. Contact: bushiwoshishui@cuc.edu.cn.

Abstract: Word segmentation is a key step in the automatic extraction of Chinese scientific and technical terms. This paper first discusses the coexistence of full-width and half-width forms of lettered words in Chinese corpora. It then examines how this coexistence affects the consistency and accuracy of lettered words in automatic word segmentation results, and proposes measures to improve both. Finally, it describes the work done in this area in the word segmentation system of Communication University of China.

Key words: lettered words, scientific and technical terms, term extraction, word segmentation, full- and half-width
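
To make the full-/half-width issue concrete, the following is a minimal Python sketch of one possible preprocessing step: folding full-width Latin letters and digits to their half-width forms before segmentation, so that "ＣＴ机" and "CT机" reach the segmenter as the same string. This is an illustration only and is not the method described in the paper. The function name `to_halfwidth_alnum` and the decision to leave full-width punctuation untouched are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the paper.
# Full-width ASCII variants occupy U+FF01..U+FF5E and sit exactly
# 0xFEE0 above their half-width ASCII counterparts (U+0021..U+007E).
FULLWIDTH_START = 0xFF01
FULLWIDTH_END = 0xFF5E
OFFSET = 0xFEE0


def to_halfwidth_alnum(text: str) -> str:
    """Fold full-width Latin letters and digits to half-width.

    Assumption: full-width punctuation (e.g. the Chinese comma) is
    kept as-is, since it is the normal form in Chinese running text.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if FULLWIDTH_START <= code <= FULLWIDTH_END:
            half = chr(code - OFFSET)
            # Only letters and digits are folded; punctuation is left alone.
            if half.isalnum():
                out.append(half)
                continue
        out.append(ch)
    return "".join(out)


# Both spellings of the same lettered word normalize to one form.
full_form = to_halfwidth_alnum("ＣＴ机")
half_form = to_halfwidth_alnum("CT机")
```

With a step like this applied first, a segmenter's dictionary needs only one entry per lettered word, which is one way the consistency problem discussed in the abstract could be reduced.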
