
China Terminology, 2021, Vol. 23, Issue 2: 20-26. doi: 10.3969/j.issn.1673-8578.2021.02.003


Research on Automatic Extraction of Scientific Terminology from Texts Based on Self-Attention

ZHAO Songge1, ZHANG Hao2, CHANG Baobao1

  • Received: 2020-12-16  Online: 2021-04-25  Published: 2021-04-07

Abstract:

Scientific terminology uses specific words to represent particular scientific concepts. Terminology extraction is an important part of the automatic processing of scientific terminology, and it matters for downstream tasks such as machine translation, information retrieval, and question answering. Traditional extraction of scientific terminology is labor-intensive. An automatic alternative is to cast terminology extraction as a tagging problem and train the tagging model with supervised learning, but this approach is held back by the lack of a large-scale annotated scientific terminology corpus. This paper uses distant supervision to generate a large-scale annotated training corpus and proposes a Bi-LSTM model architecture based on the self-attention mechanism to improve the extraction of scientific terminology. We found that the new model is far superior to a traditional machine learning model (CRF) at discovering new scientific terminology.
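To make the idea of distant supervision for a tagging task concrete, the following is a minimal sketch and not the authors' procedure, which the abstract does not detail. It assumes that an existing term dictionary is matched against raw tokenized text (longest match first) to produce BIO labels automatically. The function name `distant_bio_tags`, the `max_len` parameter, and the `B-TERM`/`I-TERM`/`O` label names are all hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple


def distant_bio_tags(
    tokens: List[str],
    term_dict: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[str]:
    """Label tokens with BIO tags by greedy longest-match dictionary lookup.

    Assumption: `term_dict` holds known terms as tuples of lowercased tokens.
    Labels produced this way are noisy; they stand in for manual annotation.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in term_dict:
                matched = n
                break
        if matched:
            tags[i] = "B-TERM"
            for j in range(i + 1, i + matched):
                tags[j] = "I-TERM"
            i += matched  # skip past the matched term
        else:
            i += 1
    return tags


if __name__ == "__main__":
    # Hypothetical dictionary and sentence, for illustration only.
    terms = {("self-attention", "mechanism"), ("machine", "translation")}
    sentence = "The self-attention mechanism improves machine translation".split()
    for token, tag in zip(sentence, distant_bio_tags(sentence, terms)):
        print(f"{token}\t{tag}")
```

Token and tag sequences generated this way could then serve as training data for a supervised tagger such as a CRF or a Bi-LSTM. Terms missing from the dictionary are labeled `O`, which is one reason a model's ability to discover new terminology is worth evaluating separately.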

Key words: scientific terminology extraction, distant supervision, self-attention
