China Terminology ›› 2024, Vol. 26 ›› Issue (1): 11-18. doi: 10.12339/j.issn.1673-8578.2024.01.002

Corpus-Based Term Extraction in the Field of Teaching Chinese as a Foreign Language

LU Yixin (卢一鑫)

  1. School of Foreign Languages, Henan University of Economics and Law, Zhengzhou, Henan 450046
  • Received: 2023-07-09  Revised: 2023-08-25  Published: 2024-01-05  Online: 2023-11-16
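  • About the author:

    LU Yixin (born 1989), female, Ph.D., is a lecturer at Henan University of Economics and Law. Her main research interests are applied linguistics and Chinese-Russian contrastive linguistics. She has participated in research projects including "Construction of a Foreign-Chinese Multilingual Dictionary Database" and "A Historical Study of the Translation and Dissemination of Traditional Chinese Philosophy in Russia", and has published nearly 10 papers at academic conferences and in journals in China and abroad.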

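  • Funding:
    China Foreign Language Education Fund project "An Exploration of Corpus-Based Methods for Compiling a Chinese-Russian Terminology Dictionary of Language Teaching for Foreigners" (ZGWYJYJJ11A102)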

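Abstract:

This article presents a method for automatically extracting terms in the field of teaching Chinese as a foreign language. Taking texts from this field as the target texts, a specialized corpus was built according to the principles of topic orientation, scientific corpus design, sample representativeness, and limited scale, and was then processed with word segmentation and part-of-speech tagging. Combining statistics with linguistic rules and using the C-value method to compute termhood, the study explores a “hybrid solution” for discovering, identifying, and extracting terms of different lengths in this field. The result is a term set for teaching Chinese as a foreign language that contains 238 single-word terms, 375 two-word terms, 121 three-word terms, and 50 long terms of 4 to 6 words.

Keywords: specialized corpus, term extraction, teaching Chinese as a foreign language, term set for teaching Chinese as a foreign language, C-value algorithm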
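For reference, the C-value measure named above is usually defined as follows, after Frantzi, Ananiadou and Mima. The abstract does not say which variant the article uses, so this is the standard form, not necessarily the author's exact formula. Here |a| is the length of candidate a in words, f(a) is its corpus frequency, T_a is the set of longer extracted candidates that contain a, and P(T_a) is the number of such candidates.

```latex
\mathrm{C\text{-}value}(a) =
\begin{cases}
  \log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested in a longer candidate},\\[4pt]
  \log_2 |a| \left( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \right), & \text{otherwise.}
\end{cases}
```

Because \log_2 1 = 0, this standard form gives every single-word candidate a score of zero. Since the article also reports 238 single-word terms, it presumably scores them with some adjustment that the abstract does not specify.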

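The "hybrid solution" pairs a linguistic filter with a statistical score. The Python sketch below illustrates that pairing under stated assumptions and is not the article's implementation. It assumes the corpus has already been segmented and POS-tagged into lists of (word, tag) pairs. It also uses a hypothetical PKU-style tag set (`n`, `vn`, `a`) and a hypothetical candidate pattern: nouns or adjectives ending in a noun, 1 to 6 words long. Candidates are then scored with the closed-form C-value formula. The `length_offset` parameter is likewise hypothetical. A value of 0 gives the standard log2|a| weight, which is zero for single words, and a value of 1 uses log2(|a|+1) so that single words also receive a score.

```python
import math
from collections import Counter

# Hypothetical PKU-style tags: "n" noun, "vn" verbal noun, "a" adjective.
HEAD_TAGS = {"n", "vn"}            # a candidate term must end in one of these
ALLOWED_TAGS = {"n", "vn", "a"}    # every word in a candidate must carry one of these
MAX_LEN = 6                        # the abstract reports terms of up to 6 words


def extract_candidates(tagged_sentences, max_len=MAX_LEN):
    """Linguistic filter: count every n-gram (1..max_len words) made only of
    allowed tags and ending in a head tag."""
    counts = Counter()
    for sent in tagged_sentences:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + max_len, len(sent)) + 1):
                gram = sent[i:j]
                # Earlier words were checked on previous iterations; test the new one.
                if gram[-1][1] not in ALLOWED_TAGS:
                    break  # any longer n-gram starting at i would contain it too
                if gram[-1][1] in HEAD_TAGS:
                    counts[tuple(word for word, _ in gram)] += 1
    return counts


def is_nested(short, longer):
    """True if `short` occurs as a contiguous word sequence inside `longer`."""
    n = len(short)
    return any(longer[k:k + n] == short for k in range(len(longer) - n + 1))


def c_value(freqs, length_offset=0):
    """Closed-form C-value for each candidate.

    length_offset=0 gives the standard log2|a| weight (zero for single
    words); length_offset=1 is a hypothetical variant using log2(|a|+1).
    """
    scores = {}
    for a, f_a in freqs.items():
        weight = math.log2(len(a) + length_offset)
        # T_a: the longer candidates that contain a
        t_a = [b for b in freqs if len(b) > len(a) and is_nested(a, b)]
        if t_a:
            mean_longer = sum(freqs[b] for b in t_a) / len(t_a)
            scores[a] = weight * (f_a - mean_longer)
        else:
            scores[a] = weight * f_a
    return scores


if __name__ == "__main__":
    # Toy, hand-made input; not data from the article's corpus.
    sentences = [
        [("汉语", "n"), ("教学", "vn"), ("方法", "n")],
        [("汉语", "n"), ("教学", "vn"), ("是", "v"), ("重要", "a"), ("的", "u")],
        [("教学", "vn"), ("方法", "n")],
    ]
    freqs = extract_candidates(sentences)
    scores = c_value(freqs, length_offset=1)
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print("".join(term), freqs[term], round(score, 3))
```

On this toy input, the three-word candidate 汉语教学方法 ranks first because it is never nested in a longer candidate. Computing T_a by scanning every candidate is quadratic in the number of candidates. The original C-value algorithm instead processes candidates from longest to shortest and keeps running nested counts, which scales better on a real corpus.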