Frontiers of Data & Computing (数据与计算发展前沿), 2020, Vol. 2, Issue (5): 13-22.

doi: 10.11871/jfdc.issn.2096-742X.2020.05.002

• Special Issue: Key Technologies and Applications of the Modern Service Industry •

Scientific and Technology Resource Clustering Based on Domain Ontology

Ge Yinchi1, Zhang Hui1, Song Wenyan2,*, Wang Xuan1

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. School of Economics and Management, Beihang University, Beijing 100191, China
  • Received: 2020-07-17 Online: 2020-10-20 Published: 2020-10-30
  • Corresponding author: Song Wenyan
  • About the authors:
    Ge Yinchi is a Ph.D. student in the School of Computer Science and Engineering, Beihang University. His research areas are data mining, knowledge graphs, and natural language processing.
    In this paper, he is responsible for the method design, the experiments, and most of the writing.
    E-mail: geyinchi@buaa.edu.cn
    Zhang Hui, Ph.D., is a professor and doctoral supervisor at the School of Computer Science and Engineering, Beihang University. He also serves as deputy director of the National Science and Technology Resource Sharing Service Engineering Research Center, where he is responsible for technology research and development on the integration, management, and sharing services of scientific and technological resources. He has published more than 70 related academic papers and holds 6 patents. His current research areas are Internet information retrieval, big data management and mining, and knowledge discovery and management.
    In this paper, he provided guidance on the overall framework of the article.
    E-mail: hzhang@buaa.edu.cn
    Song Wenyan, Ph.D., is a tenured associate professor and doctoral supervisor at Beihang University. He received his doctorate in engineering from Shanghai Jiao Tong University and was a postdoctoral researcher at the Technical University of Munich (Technische Universität München). He has long been engaged in theoretical and applied research on complex product/service systems, large-scale personalized customization, modular collaborative development, and sustainable operations. He has published nearly 60 academic papers, more than 50 of them in international SCI/SSCI journals such as IEEE T. Reliab., Int. J. Prod. Res., and CIRP Ann.- Manuf. Techn. He has independently published one English-language monograph with Springer and 2 books and textbooks with China Machine Press and other publishers, holds several granted or published national invention patents, and was selected for the "Beihang Young Top Talent Support Program".
    In this paper, he provided guidance on the overall framework of the article.
    E-mail: songwenyan@buaa.edu.cn
    Wang Xuan is a master's student in the School of Computer Science and Engineering, Beihang University. His research areas are knowledge graphs, natural language processing, and big data analysis.
    In this paper, he wrote the research background section.
    E-mail: 837909408@qq.com
  • Funding:
    National Key R&D Program of China, "Research on Distributed Science and Technology Resource System and Service Evaluation Technology" (2017YFB1400200); National Natural Science Foundation of China, General Program (71971012); National Science and Technology Major Project (2017-Ⅰ-0011-0012)

Abstract:

[Objective] Scientific and technological resources are scattered and heterogeneous. Clustering can gather scattered but related or similar resources into a multi-type resource pool, which makes resource discovery and utilization more efficient. This paper proposes a clustering method for massive high-dimensional scientific and technological resources based on domain ontology. [Methods] The method constructs a domain ontology tree and a concept semantic relationship matrix for scientific and technological resources, applies Principal Component Analysis (PCA) to reduce the dimensionality and build the resource vector space, and finally applies the K-means clustering algorithm to that vector space to obtain the clustering result. Compared with traditional methods, this method is better suited to processing multi-source heterogeneous scientific and technological resource data. [Results] Resource data from a national biological germplasm resource bank were used as the resource set, and the proposed method produced reasonable clustering results. [Conclusions] The proposed clustering method has three characteristics: first, it uses ontology concept semantic relations for dimensionality reduction, which effectively reduces the computational complexity; second, it retains important resource feature information well; and third, the resulting resource vector space and clusters are relatively accurate. To a certain extent, the method addresses the difficulty of feature representation and the poor clustering performance encountered with multi-source heterogeneous scientific and technological resource data.

Key words: scientific and technological resources, heterogeneity, domain ontology, semantic relationship, clustering
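
The pipeline outlined in the abstract (ontology tree, concept semantic relationship matrix, PCA, K-means) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The toy ontology, the resource annotations, the choice of Wu-Palmer similarity as the semantic relationship measure, the mean-of-rows resource embedding, the farthest-point seeding, and all function names are assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy ontology tree (child -> parent); not taken from the paper.
PARENT = {
    "resource": None,
    "plant": "resource", "microbe": "resource",
    "rice": "plant", "wheat": "plant",
    "yeast": "microbe", "bacillus": "microbe",
}
CONCEPTS = sorted(PARENT)


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the tree, with the root at depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity (an assumed stand-in for the paper's measure)."""
    path_b = set(ancestors(b))
    lca = next(c for c in ancestors(a) if c in path_b)  # lowest common ancestor
    return 2.0 * depth(lca) / (depth(a) + depth(b))


def semantic_matrix():
    """Concept-by-concept semantic relationship matrix."""
    return np.array([[wu_palmer(a, b) for b in CONCEPTS] for a in CONCEPTS])


def resource_vectors(annotations, sim):
    """Embed each resource as the mean similarity row of its concepts."""
    index = {c: i for i, c in enumerate(CONCEPTS)}
    return np.array([sim[[index[c] for c in cs]].mean(axis=0) for cs in annotations])


def pca(x, n_components):
    """Project centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [x[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(x[:, None] - centers[None], axis=2), axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical resources, each annotated with concepts from the toy ontology.
ANNOTATIONS = [
    ["rice"], ["wheat"], ["rice", "wheat"],
    ["yeast"], ["bacillus"], ["yeast", "bacillus"],
]


def cluster_resources(annotations=ANNOTATIONS, n_components=2, k=2):
    """Ontology similarity -> resource vectors -> PCA -> K-means labels."""
    vectors = resource_vectors(annotations, semantic_matrix())
    return kmeans(pca(vectors, n_components), k)
```

Calling `cluster_resources()` returns one cluster label per resource. In this toy setting the plant-annotated and microbe-annotated resources should fall into separate clusters, because concepts under the same parent share a deeper common ancestor and therefore have more similar rows in the semantic matrix.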