China Terminology (中国科技术语), 2025, Vol. 27, Issue (2): 132-136. doi: 10.12339/j.issn.1673-8578.2025.02.019

• Data Technology •

Automatic Term Extraction and Analysis of Military Terms Based on Pre-trained Language Model

XIANG Yin (向音)

  1. Wuhan Donghu University, Wuhan, Hubei 430212, China
  • Received: 2024-10-30 Revised: 2024-11-26 Online: 2025-03-05 Published: 2025-03-06
  • About the author:

    XIANG Yin (b. 1971), female, is a professor at Wuhan Donghu University. Her research interests include terminology, sociolinguistics, and natural language processing.

  • Funding:
    General Project of the China National Committee for Terminology in Science and Technology, "Corpus-Based Automatic Extraction and Analysis of Military Terms" (YB2020002)

Abstract:

This paper takes military terminology as its research object and uses a pre-trained language model, a class of models behind recent breakthroughs in natural language processing, to extract military terms automatically from military corpora. The groundwork comprises building a large-scale military corpus and designing the extraction model. With ChatGLM as the base model, new military terms are extracted in two phases, "pre-training" and "fine-tuning". In the pre-training phase, the model is trained on a large-scale unlabeled military corpus so that it acquires the military terminology information the corpus contains.

Key words: military terminology, pre-trained language model, term extraction
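
The abstract names a fine-tuning phase but does not describe its data format. As a rough illustration only, the sketch below shows one way labeled sentences might be turned into prompt/response pairs for fine-tuning a generative model on term extraction, and how a generated answer could be parsed back into a term list. The prompt template, the `"; "` separator, the `"none"` marker, and all function names are assumptions made for this example; none of them come from the paper or from ChatGLM.

```python
import json

# Hypothetical instruction template; the paper does not specify one.
PROMPT_TEMPLATE = (
    "Extract all military terms from the following sentence.\n"
    "Sentence: {sentence}\n"
    "Terms:"
)
SEPARATOR = "; "  # assumed delimiter between terms in the target text


def build_sample(sentence, terms):
    """Turn one annotated sentence into a prompt/response pair."""
    # Keep only terms that literally occur in the sentence,
    # dropping duplicates while preserving annotation order.
    kept = []
    for term in terms:
        if term in sentence and term not in kept:
            kept.append(term)
    return {
        "prompt": PROMPT_TEMPLATE.format(sentence=sentence),
        "response": SEPARATOR.join(kept) if kept else "none",
    }


def parse_response(response):
    """Recover a list of terms from generated text in the assumed format."""
    text = response.strip()
    if not text or text.lower() == "none":
        return []
    return [t.strip() for t in text.split(";") if t.strip()]


def to_jsonl(samples):
    """Serialize samples as one JSON object per line."""
    return "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)


# Illustrative sentence and annotations, not taken from the paper's corpus.
sample = build_sample(
    "The destroyer launched an anti-ship missile during the exercise.",
    ["destroyer", "anti-ship missile", "submarine"],
)
print(to_jsonl([sample]))
print(parse_response(sample["response"]))
```

Filtering out annotated terms that do not occur in the sentence keeps the training target extractive, so the model is never asked to produce a term it cannot copy from its input. Whether the authors applied any such constraint is not stated.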