China Terminology (中国科技术语), 2018, Vol. 20, Issue 2: 12-17. doi: 10.3969/j.issn.1673-8578.2018.02.002

• Terminology Research •

Automatic Extraction of Term Chunks Based on Parallel Corpora of English and Chinese

YANG Fuyi (杨福义)

  1. Anshan Normal University, Anshan, Liaoning 114006
  • Received: 2017-11-01 Online: 2018-04-25 Published: 2018-04-25
  • About the author: YANG Fuyi (born 1945), male, senior engineer at Anshan Normal University. His current research covers corpora, terminology databases, and knowledge organization systems. Contact: yangfuyi@sina.com.

Abstract:

Building bilingual parallel corpora as a data resource is the front end of language engineering, and such corpora contain a large amount of terminology and translation knowledge. In-depth research on and development of bilingual corpora is therefore of great significance to terminology translation. This article describes the deep-processing workflow for parallel corpora and the automated annotation of Chinese text. A "grammar symbol language" is used to build a grammatical image of the text, from which a phrase chunk library is generated. Term translation chunks are then extracted automatically according to phrase structure rules using artificial intelligence methods, and a term chunk dictionary and word list are generated automatically. The article also gives examples of term chunk queries and of tracing a chunk back to the bilingual example sentences it came from.

Key words: computational terminology, corpus, knowledge extraction, term component, chunk
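
The rule-based chunk extraction and reverse tracing summarized in the abstract can be illustrated with a minimal sketch. It is not the author's "grammar symbol language" or system: the one-letter tag alphabet, the rule `NP_RULE`, the helper names `extract_chunks` and `build_index`, and the sample sentence pair are all assumptions made for illustration. Each sentence is reduced to a string of tags, a phrase structure rule is matched against that string, and every candidate English/Chinese chunk pair is indexed by the aligned sentence it came from.

```python
import re
from collections import defaultdict

# Hypothetical one-letter tags: a = adjective, n = noun, v = verb, x = other.
# Hypothetical phrase structure rule: one or more modifiers (a or n)
# followed by a head noun, so every match is at least two words long.
NP_RULE = re.compile(r"[an]+n")


def extract_chunks(tagged):
    """Return a tuple of words for every span whose tag string matches NP_RULE."""
    # One character per token, so string offsets equal token offsets.
    tags = "".join(tag for _, tag in tagged)
    return [
        tuple(word for word, _ in tagged[m.start():m.end()])
        for m in NP_RULE.finditer(tags)
    ]


def build_index(corpus):
    """Map each candidate (English chunk, Chinese chunk) pair to its sentence ids.

    Pairing every English chunk with every Chinese chunk of the same aligned
    sentence is a deliberately crude candidate generator; a real system would
    filter these pairs, for example by co-occurrence statistics.
    """
    index = defaultdict(list)
    for sent_id, (en_tagged, zh_tagged) in enumerate(corpus):
        for en in extract_chunks(en_tagged):
            for zh in extract_chunks(zh_tagged):
                index[(" ".join(en), "".join(zh))].append(sent_id)
    return index


# Invented sample data: one aligned, already tagged sentence pair.
CORPUS = [
    (
        [("the", "x"), ("parallel", "a"), ("corpus", "n"),
         ("contains", "v"), ("terms", "n")],
        [("平行", "a"), ("语料库", "n"), ("包含", "v"), ("术语", "n")],
    ),
]

INDEX = build_index(CORPUS)
for (en_chunk, zh_chunk), sentence_ids in INDEX.items():
    print(en_chunk, zh_chunk, sentence_ids)
```

Keeping the sentence ids alongside each chunk pair is what makes reverse tracing possible: a dictionary entry can be followed back to the bilingual example sentences that produced it.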

CLC number: