An Extraction Method for Chinese Terminology Based on Statistical Technology

China Terminology, 2014, Vol. 16, Issue 5: 10-14. doi: 10.3969/j.issn.1673-8578.2014.05.002

LIU Jian¹,², TANG Huifeng¹, LIU Wuying¹

1. PLA University of Foreign Languages, Luoyang, Henan 471003;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Received: 2014-03-11  Online: 2014-10-25  Published: 2020-07-01
About the author: LIU Jian (born 1979), male, Han ethnicity, is a lecturer at the PLA University of Foreign Languages and a doctoral candidate at the Institute of Computing Technology, Chinese Academy of Sciences. His main research interests include data mining and knowledge engineering. Contact: liujian_public@sina.com.

Abstract: Chinese term recognition and extraction is fundamental to Chinese text information processing, and it matters for improving the accuracy of tasks such as Chinese text indexing and retrieval, text mining, ontology construction, and latent semantic analysis. Building on mutual information and information entropy theory, this paper proposes a semi-automatic, statistics-based method for extracting Chinese terms and verifies it experimentally on internet news topic data. The results show that the proposed method can effectively support Chinese term extraction.

Key words: mutual information, information entropy, Chinese terminology extraction
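
The abstract names mutual information and information entropy as the basis of the method but gives no formulas. The Python sketch below shows one common way such statistics are computed for candidate strings. It assumes, without confirmation from this page, that mutual information is used to measure how strongly adjacent characters cohere and that entropy is used to measure how varied a candidate's neighbouring characters are. The function names `pmi` and `boundary_entropy` and the toy corpus are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every overlapping character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pmi(text: str, left: str, right: str) -> float:
    """Pointwise mutual information (bits) of `left` immediately followed by `right`.

    Probabilities are plain relative frequencies of character n-grams;
    this estimator is an assumption, not the paper's stated formula.
    """
    def prob(s: str) -> float:
        total = len(text) - len(s) + 1
        return ngram_counts(text, len(s))[s] / total if total > 0 else 0.0

    joint = prob(left + right)
    if joint == 0:
        return float("-inf")  # the pair never occurs in the text
    return math.log2(joint / (prob(left) * prob(right)))


def boundary_entropy(text: str, cand: str, side: str) -> float:
    """Shannon entropy (bits) of the characters adjacent to `cand`.

    side="left" looks at the preceding character, anything else at the
    following character. Occurrences at the text edge contribute nothing.
    """
    neighbours = Counter()
    start = text.find(cand)
    while start != -1:
        idx = start - 1 if side == "left" else start + len(cand)
        if 0 <= idx < len(text):
            neighbours[text[idx]] += 1
        start = text.find(cand, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


if __name__ == "__main__":
    corpus = "甲术语乙术语丙术语"  # toy corpus, purely illustrative
    print(pmi(corpus, "术", "语"))
    print(boundary_entropy(corpus, "术语", "left"))
    print(boundary_entropy(corpus, "术语", "right"))
```

Under these assumptions, a candidate with high mutual information and high entropy on both sides would be a plausible term to pass on for manual review in a semi-automatic workflow. Any thresholds would need to be tuned on real data.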

Cite this article: LIU Jian, TANG Huifeng, LIU Wuying. An Extraction Method for Chinese Terminology Based on Statistical Technology[J]. China Terminology, 2014, 16(5): 10-14.
