Frontiers of Data and Computing ›› 2022, Vol. 4 ›› Issue (2): 50-62.

doi: 10.11871/jfdc.issn.2096-742X.2022.02.005

• Special Issue: Advanced Intelligent Computing Platforms and Applications •

Educational Resource Search Strategy Based on ElasticSearch and Semantic Similarity Matching

TAO Lei, SU Chenyang, LI Zhengdan*, ZHU Jingwen, ZHANG Yuzhi

  1. College of Software, Nankai University, Tianjin 300450, China
  • Received: 2022-02-06 Online: 2022-04-20 Published: 2022-04-30
  • Corresponding author: LI Zhengdan
  • About the authors:
    TAO Lei is a master's student at Nankai University. His research interests include computer vision, natural language processing, and software engineering. In this article, he is responsible for writing the parts on the semantic similarity model. E-mail: stonebegin@sina.com
    SU Chenyang is a master's student at Nankai University. His research interests include natural language processing and software engineering. In this article, he is responsible for writing the parts on ElasticSearch. E-mail: 15731471310@163.com
    LI Zhengdan is an assistant experimentalist at Nankai University and holds a master's degree. Her research interests include artificial intelligence and software engineering. She leads application development for the educational resource platform of Nankai University. In this article, she is mainly responsible for the revision and review of the paper. E-mail: lzd@nankai.edu.cn
    ZHU Jingwen is an experimentalist at Nankai University and holds a master's degree. Her research interests include software engineering, software security, and artificial intelligence. In this article, she is mainly responsible for the revision and review of the paper. E-mail: zhujingwen@nankai.edu.cn
    ZHANG Yuzhi is a chair professor at Nankai University and Dean of the College of Software. His research interests include artificial intelligence, pattern recognition, and natural language processing. He is the project leader of the educational resource platform of Nankai University. In this article, he is mainly responsible for the revision and review of the paper. E-mail: zyz@nankai.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB0204304)

Abstract:

[Objective] This paper integrates multiple types of educational resources and, in that setting, designs and implements an efficient and accurate search strategy that helps users obtain rich teaching content. [Context] Educational resources are numerous and varied, and users increasingly demand accurate retrieval; searching with ElasticSearch alone gives unsatisfactory results. [Methods] The user's query is preprocessed and segmented into words, and the ER-BERT semantic similarity model then matches n similar queries from the query library. These are submitted to ElasticSearch, a relevance scoring formula is constructed, and the matched results are finally ranked by their overall score. [Results] Knowledge graph technology is used to integrate the complex educational resources, and on this basis an educational resource search strategy based on ElasticSearch and semantic similarity matching is implemented. The strategy retrieves results according to the semantic information of the user's query while maintaining retrieval speed. [Conclusions] Experimental results show that the strategy increases the number of retrieved results, improves their accuracy while maintaining retrieval speed, and significantly improves the user's search experience.

Key words: ElasticSearch, text similarity, search strategy, knowledge graph
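
The pipeline outlined under [Methods] can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: whitespace tokenization stands in for preprocessing and word segmentation, Jaccard token overlap stands in for the ER-BERT similarity model, a simple term-coverage score stands in for ElasticSearch relevance, and the rule that combines the scores (similarity weight times keyword score, maximum over expanded queries) is hypothetical, since the abstract does not give the paper's actual formula.

```python
def tokenize(text):
    """Placeholder for preprocessing and word segmentation."""
    return text.lower().split()


def similarity(a, b):
    """Jaccard token overlap; a stand-in for the ER-BERT similarity score."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_n_similar(query, query_library, n):
    """Return up to n (similarity, stored_query) pairs with nonzero similarity."""
    scored = [(similarity(query, q), q) for q in query_library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, q) for s, q in scored[:n] if s > 0]


def keyword_score(query, doc):
    """Fraction of query tokens found in the document; a stand-in for
    the relevance score a search engine would return."""
    q_tokens = tokenize(query)
    d_tokens = set(tokenize(doc))
    if not q_tokens:
        return 0.0
    return sum(t in d_tokens for t in q_tokens) / len(q_tokens)


def search(query, query_library, documents, n=2):
    """Expand the query with similar stored queries, score every document,
    and rank by the combined score (hypothetical combination rule)."""
    # The original query carries weight 1.0; each expansion is weighted
    # by its similarity to the original query.
    expanded = [(1.0, query)] + top_n_similar(query, query_library, n)
    results = []
    for doc in documents:
        score = max(w * keyword_score(q, doc) for w, q in expanded)
        if score > 0:
            results.append((score, doc))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


if __name__ == "__main__":
    library = ["machine learning introduction course", "organic chemistry lab"]
    docs = [
        "machine learning introduction slides",
        "organic chemistry lab manual",
        "course on ml intro",
    ]
    for score, doc in search("ml intro course", library, docs):
        print(f"{score:.3f}  {doc}")
```

In this toy run the document "machine learning introduction slides" shares no token with the raw query and is reached only through the expanded query, which mirrors the abstract's claim that semantic matching increases the number of results. A real deployment would replace `similarity` with a trained model and `keyword_score` with scores returned by ElasticSearch.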