China Terminology (中国科技术语), 2024, Vol. 26, Issue 3: 84-92. doi: 10.12339/j.issn.1673-8578.2024.03.010

• Practice and Application •

Corpus-Driven Construction of a General Chinese Academic Vocabulary List

GAO Song (高松)1, QIAN Long (钱隆)2, DING Qian (丁芊)3

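    1 School of Chinese Studies, Dalian University of Foreign Languages, Dalian, Liaoning 116044
    2 Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083
    3 School of General Education and Foreign Languages, Anhui Institute of Information Technology, Wuhu, Anhui 241199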
  • Received: 2023-10-18  Revised: 2024-04-28  Online: 2024-07-05  Published: 2024-07-05
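  • About the authors:

    GAO Song (b. 1982), female, PhD, is an associate professor and master's supervisor at the School of Chinese Studies, Dalian University of Foreign Languages. Her research interests are computational linguistics and teaching Chinese as a foreign language. She has led five projects, including ones funded by the National Social Science Fund of China and the Humanities and Social Sciences Fund of the Ministry of Education, has published one academic monograph and more than 20 papers, and has received three provincial-level awards.

    QIAN Long (b. 1996), male, is a doctoral candidate at the Research Institute of International Chinese Language Education, Beijing Language and Culture University. His research interests are corpus linguistics and international Chinese language education.

    DING Qian (b. 1997), female, is a teaching assistant at the School of General Education and Foreign Languages, Anhui Institute of Information Technology. Her research interests are discourse analysis and functional linguistics.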

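  • Funding:
    Ministry of Education Humanities and Social Sciences Research Youth Fund Project (2020), "A Corpus-Based Quantitative Study of the Diachronic Evolution of Modern Written Chinese" (20YJC740010); Beijing Language and Culture University Graduate Innovation Fund Project (2023; Fundamental Research Funds for the Central Universities), "A Quantitative Study of the Developmental Features of L2 Written Chinese among British and American Learners of Chinese Based on a Dependency Treebank" (23YCX162); Ministry of Education Industry-University Cooperative Education Project (2022), "Construction of an Interactive Platform for Smart Teaching Practice in College English" (220600273075052)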

Developing and Validating a Chinese Academic Vocabulary List (CAVL): A Corpus-Driven Approach

GAO Song1, QIAN Long2, DING Qian3

  • Received: 2023-10-18  Revised: 2024-04-28  Online: 2024-07-05  Published: 2024-07-05

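Abstract (translated from the Chinese):

Developing academic vocabulary lists is an important topic in research on academic Chinese. This study built a Chinese academic corpus from 1,450 academic journal articles and, drawing on the methods used to create the English Academic Word List (AWL) and Academic Vocabulary List (AVL), developed a general Chinese academic vocabulary list of 1,368 word types and tested its validity. The results show that the list covers 25.88% of the text in the Chinese academic corpus, a relatively high level of coverage. Its coverage of the academic and science-and-technology sub-corpora of the BCC and LCMC corpora is 18.85% and 23.86%, respectively, whereas its coverage of the literature and microblog sub-corpora is below 3%; this difference indicates that the academic vocabulary in the list is reasonably representative. Its coverage of every sub-corpus of the Chinese academic corpus exceeds 17%, so it can serve academic Chinese teaching and learning across disciplines well. The list sets out learning targets for Chinese academic vocabulary and provides a reference for academic vocabulary teaching and learning and for compiling textbooks of Chinese for specific purposes.

Keywords: vocabulary list, academic vocabulary, corpus, academic Chinese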

Abstract:

The development of Chinese academic vocabulary lists is a significant topic in the study of Academic Chinese. This research constructed a Chinese academic corpus (CAC) from 1,450 academic journal articles. Drawing on the methodologies used to create the Academic Word List (AWL) and the Academic Vocabulary List (AVL) for English, we developed the Chinese Academic Vocabulary List (CAVL), which contains 1,368 word types. An assessment of CAVL's validity showed that it covers 25.88% of the CAC, a high level of coverage. The list accounts for 18.85% and 23.86% of the academic and science-and-technology sub-corpora of the BCC and LCMC corpora, respectively, whereas its coverage of the literature and microblog sub-corpora is below 3%; this difference suggests that the academic vocabulary in the list is reasonably representative. CAVL covers more than 17% of every sub-corpus of the CAC, a balanced representation across academic disciplines, and can therefore serve as a valuable resource for Academic Chinese teaching and learning in different fields. CAVL delineates learning objectives for Chinese academic vocabulary and offers a reference for vocabulary instruction, learning, and the development of textbooks of Chinese for specific purposes.

Key words: vocabulary list, Chinese academic vocabulary, corpus, Chinese for academic purposes
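
The coverage figures reported in the abstract can be read as the share of running words in a corpus that belong to the list. The sketch below illustrates that reading only. It assumes token-level coverage over text that has already been word-segmented; the abstract does not state the segmentation tool or the exact counting rules, and the function name `coverage` and the toy data are hypothetical.

```python
from collections import Counter


def coverage(tokens, word_list):
    """Return the fraction of running tokens that appear in word_list.

    Assumption: `tokens` is an already segmented text (a list of words)
    and `word_list` is a set of word types, such as an academic word list.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum the frequencies of every word type that is on the list.
    covered = sum(n for word, n in counts.items() if word in word_list)
    return covered / len(tokens)


# Toy, hand-segmented sentence and a four-item list (illustrative only).
toy_tokens = ["研究", "表明", "该", "方法", "具有", "显著", "效果", "的"]
toy_list = {"研究", "表明", "方法", "显著"}

print(f"{coverage(toy_tokens, toy_list):.2%}")  # 4 of 8 tokens -> 50.00%
```

Running the same function over an academic sub-corpus and over a literature or microblog sub-corpus, then comparing the two fractions, mirrors the kind of contrast the abstract reports.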